├── .gitattributes
├── .gitignore
├── README.md
├── notebooks
    ├── Combining DataFrames.ipynb
    ├── Group Operations.ipynb
    ├── Indexing and Selecting.ipynb
    ├── Misc Functions.ipynb
    ├── Pandas Intro to Data Structures.ipynb
    └── Row-Column Transformations.ipynb
└── requirements.txt


/.gitattributes:
--------------------------------------------------------------------------------
1 | # Auto detect text files and perform LF normalization
2 | * text=auto
3 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
  1 | .DS_STORE
  2 | 
  3 | # Byte-compiled / optimized / DLL files
  4 | __pycache__/
  5 | *.py[cod]
  6 | *$py.class
  7 | 
  8 | # C extensions
  9 | *.so
 10 | 
 11 | # Distribution / packaging
 12 | .Python
 13 | build/
 14 | develop-eggs/
 15 | dist/
 16 | downloads/
 17 | eggs/
 18 | .eggs/
 19 | lib/
 20 | lib64/
 21 | parts/
 22 | sdist/
 23 | var/
 24 | wheels/
 25 | *.egg-info/
 26 | .installed.cfg
 27 | *.egg
 28 | MANIFEST
 29 | 
 30 | # PyInstaller
 31 | #  Usually these files are written by a python script from a template
 32 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 33 | *.manifest
 34 | *.spec
 35 | 
 36 | # Installer logs
 37 | pip-log.txt
 38 | pip-delete-this-directory.txt
 39 | 
 40 | # Unit test / coverage reports
 41 | htmlcov/
 42 | .tox/
 43 | .nox/
 44 | .coverage
 45 | .coverage.*
 46 | .cache
 47 | nosetests.xml
 48 | coverage.xml
 49 | *.cover
 50 | .hypothesis/
 51 | .pytest_cache/
 52 | 
 53 | # Translations
 54 | *.mo
 55 | *.pot
 56 | 
 57 | # Django stuff:
 58 | *.log
 59 | local_settings.py
 60 | db.sqlite3
 61 | 
 62 | # Flask stuff:
 63 | instance/
 64 | .webassets-cache
 65 | 
 66 | # Scrapy stuff:
 67 | .scrapy
 68 | 
 69 | # Sphinx documentation
 70 | docs/_build/
 71 | 
 72 | # PyBuilder
 73 | target/
 74 | 
 75 | # Jupyter Notebook
 76 | .ipynb_checkpoints
 77 | 
 78 | # IPython
 79 | profile_default/
 80 | ipython_config.py
 81 | 
 82 | # pyenv
 83 | .python-version
 84 | 
 85 | # celery beat schedule file
 86 | celerybeat-schedule
 87 | 
 88 | # SageMath parsed files
 89 | *.sage.py
 90 | 
 91 | # Environments
 92 | .env
 93 | .venv
 94 | env/
 95 | venv/
 96 | ENV/
 97 | env.bak/
 98 | venv.bak/
 99 | 
100 | # Spyder project settings
101 | .spyderproject
102 | .spyproject
103 | 
104 | # Rope project settings
105 | .ropeproject
106 | 
107 | # mkdocs documentation
108 | /site
109 | 
110 | # mypy
111 | .mypy_cache/
112 | .dmypy.json
113 | dmypy.json
114 | 
115 | # Pyre type checker
116 | .pyre/
117 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # An Opinionated Guide to Pandas
 2 | 
 3 | ## Getting started with this tutorial
 4 | 
 5 | I made this after quite a lot of thought. There are a ton of pandas tutorials out there and the maintainers of pandas themselves have tutorials. But I think these tutorials have one of two flavors:
 6 | 
 7 | 1. Intro: you barely get into details. 
 8 | 2. Reference: just the details
 9 | 
10 | I wanted to show people which important and advanced pandas functions a data scientist uses on a day-to-day basis, and I could not find a tutorial that did.
11 | 
12 | Thus this. 
13 | 
14 | This tutorial is an opinionated guide to pandas. I'll let you know which functions I think are not worth learning and which are. This is not an intro to pandas. This is pandas for data scientists, and I hope you enjoy.
15 | 
16 | ## Installing Virtualenv
17 | 
18 | The first step to get running with these tutorials is to install virtualenv. Fortunately there is a [great tutorial](https://docs.python-guide.org/dev/virtualenvs/#lower-level-virtualenv) in The Hitchhiker's Guide to Python. Please follow the steps in the guide.
19 | 
20 | Once you have installed virtualenv, let's make an environment with the following command:
21 | 
22 | `virtualenv -p python3 env`
23 | 
24 | Notice that we are using Python 3. Pandas has announced that they will stop supporting Python 2 after 2019. You will then need to activate your env. Again, the tutorial is a great resource that shows you how to do this on both Windows and Mac:
25 | 
26 | [Activate your env](https://docs.python-guide.org/dev/virtualenvs/#lower-level-virtualenv)
27 | 
28 | The next step is that we will need to install all the requirements:
29 | 
30 | `pip install -r requirements.txt`
31 | 
32 | The last step is to run an IPython notebook from within the env, and then we are ready to go:
33 | 
34 | `ipython notebook` 
35 | 
36 | Pandas itself has some good resources on installation, which you can find [here](https://pandas.pydata.org/pandas-docs/stable/install.html).
37 | 
38 | 
39 | ## Order of the Notebooks
40 | 
41 | The recommended order is:
42 | 
43 | 1. [Pandas Intro to Data Structures](https://github.com/knathanieltucker/pandas-tutorial/blob/master/notebooks/Pandas%20Intro%20to%20Data%20Structures.ipynb)
44 | 2. [Indexing and Selecting](https://github.com/knathanieltucker/pandas-tutorial/blob/master/notebooks/Indexing%20and%20Selecting.ipynb)
45 | 3. [Group Operations](https://github.com/knathanieltucker/pandas-tutorial/blob/master/notebooks/Group%20Operations.ipynb)
46 | 4. [Row-Column Transformations](https://github.com/knathanieltucker/pandas-tutorial/blob/master/notebooks/Row-Column%20Transformations.ipynb)
47 | 5. [Combining DataFrames](https://github.com/knathanieltucker/pandas-tutorial/blob/master/notebooks/Combining%20DataFrames.ipynb)
48 | 6. [Misc Functions](https://github.com/knathanieltucker/pandas-tutorial/blob/master/notebooks/Misc%20Functions.ipynb)
49 | 
50 | 
51 | ## Exercises
52 | 
53 | If you are like me, you will find it useful to practice these techniques in exercises. Fortunately, pandas has some great [exercises listed on their site](https://pandas.pydata.org/pandas-docs/stable/tutorials.html#exercises-for-new-users). If y'all would like, and these tutorials/videos get enough support, I'd be happy to video the solutions to those exercises as well.
54 | 


--------------------------------------------------------------------------------
/notebooks/Combining DataFrames.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "code",
   5 |    "execution_count": 1,
   6 |    "metadata": {},
   7 |    "outputs": [],
   8 |    "source": [
   9 |     "import seaborn as sns\n",
  10 |     "import pandas as pd\n",
  11 |     "import numpy as np"
  12 |    ]
  13 |   },
  14 |   {
  15 |    "cell_type": "markdown",
  16 |    "metadata": {},
  17 |    "source": [
  18 |     "# Pandas Combining DataFrames\n",
  19 |     "\n",
  20 |     "In pandas there are 4 ways (plus a few special cases) to combine data from different frames:\n",
  21 |     "\n",
  22 |     "* Merging\n",
  23 |     "* Joining\n",
  24 |     "* Concatenating \n",
  25 |     "* Appending\n",
  26 |     "\n",
  27 |     "Merging and joining are basically redundant with each other, as are concatenating and appending. \n",
  28 |     "\n",
  29 |     "So today we will be going over Merging and Concatenating in pandas. \n",
  30 |     "\n",
  31 |     "Check out the full documentation [here](http://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html), but be warned it is a bit long :)\n",
  32 |     "\n",
  33 |     "\n",
  34 |     "Okay let's get started."
  35 |    ]
  36 |   },
  37 |   {
  38 |    "cell_type": "code",
  39 |    "execution_count": 2,
  40 |    "metadata": {},
  41 |    "outputs": [
  42 |     {
  43 |      "data": {
  44 |       "text/html": [
  45 |        "<div>\n",
  46 |        "<style scoped>\n",
  47 |        "    .dataframe tbody tr th:only-of-type {\n",
  48 |        "        vertical-align: middle;\n",
  49 |        "    }\n",
  50 |        "\n",
  51 |        "    .dataframe tbody tr th {\n",
  52 |        "        vertical-align: top;\n",
  53 |        "    }\n",
  54 |        "\n",
  55 |        "    .dataframe thead th {\n",
  56 |        "        text-align: right;\n",
  57 |        "    }\n",
  58 |        "</style>\n",
  59 |        "<table border=\"1\" class=\"dataframe\">\n",
  60 |        "  <thead>\n",
  61 |        "    <tr style=\"text-align: right;\">\n",
  62 |        "      <th></th>\n",
  63 |        "      <th>total_bill</th>\n",
  64 |        "      <th>tip</th>\n",
  65 |        "      <th>sex</th>\n",
  66 |        "      <th>smoker</th>\n",
  67 |        "      <th>day</th>\n",
  68 |        "      <th>time</th>\n",
  69 |        "      <th>size</th>\n",
  70 |        "    </tr>\n",
  71 |        "  </thead>\n",
  72 |        "  <tbody>\n",
  73 |        "    <tr>\n",
  74 |        "      <th>0</th>\n",
  75 |        "      <td>16.99</td>\n",
  76 |        "      <td>1.01</td>\n",
  77 |        "      <td>Female</td>\n",
  78 |        "      <td>No</td>\n",
  79 |        "      <td>Sun</td>\n",
  80 |        "      <td>Dinner</td>\n",
  81 |        "      <td>2</td>\n",
  82 |        "    </tr>\n",
  83 |        "    <tr>\n",
  84 |        "      <th>1</th>\n",
  85 |        "      <td>10.34</td>\n",
  86 |        "      <td>1.66</td>\n",
  87 |        "      <td>Male</td>\n",
  88 |        "      <td>No</td>\n",
  89 |        "      <td>Sun</td>\n",
  90 |        "      <td>Dinner</td>\n",
  91 |        "      <td>3</td>\n",
  92 |        "    </tr>\n",
  93 |        "    <tr>\n",
  94 |        "      <th>2</th>\n",
  95 |        "      <td>21.01</td>\n",
  96 |        "      <td>3.50</td>\n",
  97 |        "      <td>Male</td>\n",
  98 |        "      <td>No</td>\n",
  99 |        "      <td>Sun</td>\n",
 100 |        "      <td>Dinner</td>\n",
 101 |        "      <td>3</td>\n",
 102 |        "    </tr>\n",
 103 |        "  </tbody>\n",
 104 |        "</table>\n",
 105 |        "</div>"
 106 |       ],
 107 |       "text/plain": [
 108 |        "   total_bill   tip     sex smoker  day    time  size\n",
 109 |        "0       16.99  1.01  Female     No  Sun  Dinner     2\n",
 110 |        "1       10.34  1.66    Male     No  Sun  Dinner     3\n",
 111 |        "2       21.01  3.50    Male     No  Sun  Dinner     3"
 112 |       ]
 113 |      },
 114 |      "execution_count": 2,
 115 |      "metadata": {},
 116 |      "output_type": "execute_result"
 117 |     }
 118 |    ],
 119 |    "source": [
 120 |     "tips = sns.load_dataset('tips')\n",
 121 |     "tips.head(3)"
 122 |    ]
 123 |   },
 124 |   {
 125 |    "cell_type": "markdown",
 126 |    "metadata": {},
 127 |    "source": [
 128 |     "## Merge\n",
 129 |     "\n",
 130 |     "Merging is for doing complex column-wise combinations of dataframes in a SQL-like way. If you don't know SQL joins then check out this resource [sql joins](https://www.w3schools.com/sql/sql_join.asp) and comment below (and maybe I'll make a video). \n",
 131 |     "\n",
 132 |     "To merge we need two dataframes, so let's make them below:"
 133 |    ]
 134 |   },
 135 |   {
 136 |    "cell_type": "code",
 137 |    "execution_count": 3,
 138 |    "metadata": {},
 139 |    "outputs": [],
 140 |    "source": [
 141 |     "tips_bill = tips.groupby(['sex', 'smoker'])[['total_bill', 'tip']].sum()\n",
 142 |     "tips_tip = tips.groupby(['sex', 'smoker'])[['total_bill', 'tip']].sum()\n",
 143 |     "\n",
 144 |     "del tips_bill['tip']\n",
 145 |     "del tips_tip['total_bill']"
 146 |    ]
 147 |   },
 148 |   {
 149 |    "cell_type": "code",
 150 |    "execution_count": 4,
 151 |    "metadata": {},
 152 |    "outputs": [
 153 |     {
 154 |      "data": {
 155 |       "text/html": [
 156 |        "<div>\n",
 157 |        "<style scoped>\n",
 158 |        "    .dataframe tbody tr th:only-of-type {\n",
 159 |        "        vertical-align: middle;\n",
 160 |        "    }\n",
 161 |        "\n",
 162 |        "    .dataframe tbody tr th {\n",
 163 |        "        vertical-align: top;\n",
 164 |        "    }\n",
 165 |        "\n",
 166 |        "    .dataframe thead th {\n",
 167 |        "        text-align: right;\n",
 168 |        "    }\n",
 169 |        "</style>\n",
 170 |        "<table border=\"1\" class=\"dataframe\">\n",
 171 |        "  <thead>\n",
 172 |        "    <tr style=\"text-align: right;\">\n",
 173 |        "      <th></th>\n",
 174 |        "      <th></th>\n",
 175 |        "      <th>total_bill</th>\n",
 176 |        "    </tr>\n",
 177 |        "    <tr>\n",
 178 |        "      <th>sex</th>\n",
 179 |        "      <th>smoker</th>\n",
 180 |        "      <th></th>\n",
 181 |        "    </tr>\n",
 182 |        "  </thead>\n",
 183 |        "  <tbody>\n",
 184 |        "    <tr>\n",
 185 |        "      <th rowspan=\"2\" valign=\"top\">Male</th>\n",
 186 |        "      <th>Yes</th>\n",
 187 |        "      <td>1337.07</td>\n",
 188 |        "    </tr>\n",
 189 |        "    <tr>\n",
 190 |        "      <th>No</th>\n",
 191 |        "      <td>1919.75</td>\n",
 192 |        "    </tr>\n",
 193 |        "    <tr>\n",
 194 |        "      <th rowspan=\"2\" valign=\"top\">Female</th>\n",
 195 |        "      <th>Yes</th>\n",
 196 |        "      <td>593.27</td>\n",
 197 |        "    </tr>\n",
 198 |        "    <tr>\n",
 199 |        "      <th>No</th>\n",
 200 |        "      <td>977.68</td>\n",
 201 |        "    </tr>\n",
 202 |        "  </tbody>\n",
 203 |        "</table>\n",
 204 |        "</div>"
 205 |       ],
 206 |       "text/plain": [
 207 |        "               total_bill\n",
 208 |        "sex    smoker            \n",
 209 |        "Male   Yes        1337.07\n",
 210 |        "       No         1919.75\n",
 211 |        "Female Yes         593.27\n",
 212 |        "       No          977.68"
 213 |       ]
 214 |      },
 215 |      "execution_count": 4,
 216 |      "metadata": {},
 217 |      "output_type": "execute_result"
 218 |     }
 219 |    ],
 220 |    "source": [
 221 |     "tips_bill"
 222 |    ]
 223 |   },
 224 |   {
 225 |    "cell_type": "code",
 226 |    "execution_count": 18,
 227 |    "metadata": {},
 228 |    "outputs": [
 229 |     {
 230 |      "data": {
 231 |       "text/html": [
 232 |        "<div>\n",
 233 |        "<style scoped>\n",
 234 |        "    .dataframe tbody tr th:only-of-type {\n",
 235 |        "        vertical-align: middle;\n",
 236 |        "    }\n",
 237 |        "\n",
 238 |        "    .dataframe tbody tr th {\n",
 239 |        "        vertical-align: top;\n",
 240 |        "    }\n",
 241 |        "\n",
 242 |        "    .dataframe thead th {\n",
 243 |        "        text-align: right;\n",
 244 |        "    }\n",
 245 |        "</style>\n",
 246 |        "<table border=\"1\" class=\"dataframe\">\n",
 247 |        "  <thead>\n",
 248 |        "    <tr style=\"text-align: right;\">\n",
 249 |        "      <th></th>\n",
 250 |        "      <th></th>\n",
 251 |        "      <th>tip</th>\n",
 252 |        "    </tr>\n",
 253 |        "    <tr>\n",
 254 |        "      <th>sex</th>\n",
 255 |        "      <th>smoker</th>\n",
 256 |        "      <th></th>\n",
 257 |        "    </tr>\n",
 258 |        "  </thead>\n",
 259 |        "  <tbody>\n",
 260 |        "    <tr>\n",
 261 |        "      <th rowspan=\"2\" valign=\"top\">Male</th>\n",
 262 |        "      <th>Yes</th>\n",
 263 |        "      <td>183.07</td>\n",
 264 |        "    </tr>\n",
 265 |        "    <tr>\n",
 266 |        "      <th>No</th>\n",
 267 |        "      <td>302.00</td>\n",
 268 |        "    </tr>\n",
 269 |        "    <tr>\n",
 270 |        "      <th rowspan=\"2\" valign=\"top\">Female</th>\n",
 271 |        "      <th>Yes</th>\n",
 272 |        "      <td>96.74</td>\n",
 273 |        "    </tr>\n",
 274 |        "    <tr>\n",
 275 |        "      <th>No</th>\n",
 276 |        "      <td>149.77</td>\n",
 277 |        "    </tr>\n",
 278 |        "  </tbody>\n",
 279 |        "</table>\n",
 280 |        "</div>"
 281 |       ],
 282 |       "text/plain": [
 283 |        "                  tip\n",
 284 |        "sex    smoker        \n",
 285 |        "Male   Yes     183.07\n",
 286 |        "       No      302.00\n",
 287 |        "Female Yes      96.74\n",
 288 |        "       No      149.77"
 289 |       ]
 290 |      },
 291 |      "execution_count": 18,
 292 |      "metadata": {},
 293 |      "output_type": "execute_result"
 294 |     }
 295 |    ],
 296 |    "source": [
 297 |     "tips_tip"
 298 |    ]
 299 |   },
 300 |   {
 301 |    "cell_type": "markdown",
 302 |    "metadata": {},
 303 |    "source": [
 304 |     "Now that we have two datasets that we want to combine (that is, take the tips and combine them with the total bill), how do we do it? We merge!"
 305 |    ]
 306 |   },
 307 |   {
 308 |    "cell_type": "code",
 309 |    "execution_count": 20,
 310 |    "metadata": {},
 311 |    "outputs": [],
 312 |    "source": [
 313 |     "pd.merge?"
 314 |    ]
 315 |   },
 316 |   {
 317 |    "cell_type": "markdown",
 318 |    "metadata": {},
 319 |    "source": [
 320 |     "Notice that there are a ton of options:"
 321 |    ]
 322 |   },
 323 |   {
 324 |    "cell_type": "code",
 325 |    "execution_count": 22,
 326 |    "metadata": {},
 327 |    "outputs": [
 328 |     {
 329 |      "data": {
 330 |       "text/html": [
 331 |        "<div>\n",
 332 |        "<style scoped>\n",
 333 |        "    .dataframe tbody tr th:only-of-type {\n",
 334 |        "        vertical-align: middle;\n",
 335 |        "    }\n",
 336 |        "\n",
 337 |        "    .dataframe tbody tr th {\n",
 338 |        "        vertical-align: top;\n",
 339 |        "    }\n",
 340 |        "\n",
 341 |        "    .dataframe thead th {\n",
 342 |        "        text-align: right;\n",
 343 |        "    }\n",
 344 |        "</style>\n",
 345 |        "<table border=\"1\" class=\"dataframe\">\n",
 346 |        "  <thead>\n",
 347 |        "    <tr style=\"text-align: right;\">\n",
 348 |        "      <th></th>\n",
 349 |        "      <th></th>\n",
 350 |        "      <th>total_bill</th>\n",
 351 |        "      <th>tip</th>\n",
 352 |        "    </tr>\n",
 353 |        "    <tr>\n",
 354 |        "      <th>sex</th>\n",
 355 |        "      <th>smoker</th>\n",
 356 |        "      <th></th>\n",
 357 |        "      <th></th>\n",
 358 |        "    </tr>\n",
 359 |        "  </thead>\n",
 360 |        "  <tbody>\n",
 361 |        "    <tr>\n",
 362 |        "      <th rowspan=\"2\" valign=\"top\">Male</th>\n",
 363 |        "      <th>Yes</th>\n",
 364 |        "      <td>1337.07</td>\n",
 365 |        "      <td>183.07</td>\n",
 366 |        "    </tr>\n",
 367 |        "    <tr>\n",
 368 |        "      <th>No</th>\n",
 369 |        "      <td>1919.75</td>\n",
 370 |        "      <td>302.00</td>\n",
 371 |        "    </tr>\n",
 372 |        "    <tr>\n",
 373 |        "      <th rowspan=\"2\" valign=\"top\">Female</th>\n",
 374 |        "      <th>Yes</th>\n",
 375 |        "      <td>593.27</td>\n",
 376 |        "      <td>96.74</td>\n",
 377 |        "    </tr>\n",
 378 |        "    <tr>\n",
 379 |        "      <th>No</th>\n",
 380 |        "      <td>977.68</td>\n",
 381 |        "      <td>149.77</td>\n",
 382 |        "    </tr>\n",
 383 |        "  </tbody>\n",
 384 |        "</table>\n",
 385 |        "</div>"
 386 |       ],
 387 |       "text/plain": [
 388 |        "               total_bill     tip\n",
 389 |        "sex    smoker                    \n",
 390 |        "Male   Yes        1337.07  183.07\n",
 391 |        "       No         1919.75  302.00\n",
 392 |        "Female Yes         593.27   96.74\n",
 393 |        "       No          977.68  149.77"
 394 |       ]
 395 |      },
 396 |      "execution_count": 22,
 397 |      "metadata": {},
 398 |      "output_type": "execute_result"
 399 |     }
 400 |    ],
 401 |    "source": [
 402 |     "# we can merge on the indexes\n",
 403 |     "pd.merge(tips_bill, tips_tip, \n",
 404 |     "         right_index=True, left_index=True)"
 405 |    ]
 406 |   },
 407 |   {
 408 |    "cell_type": "code",
 409 |    "execution_count": 24,
 410 |    "metadata": {},
 411 |    "outputs": [
 412 |     {
 413 |      "data": {
 414 |       "text/html": [
 415 |        "<div>\n",
 416 |        "<style scoped>\n",
 417 |        "    .dataframe tbody tr th:only-of-type {\n",
 418 |        "        vertical-align: middle;\n",
 419 |        "    }\n",
 420 |        "\n",
 421 |        "    .dataframe tbody tr th {\n",
 422 |        "        vertical-align: top;\n",
 423 |        "    }\n",
 424 |        "\n",
 425 |        "    .dataframe thead th {\n",
 426 |        "        text-align: right;\n",
 427 |        "    }\n",
 428 |        "</style>\n",
 429 |        "<table border=\"1\" class=\"dataframe\">\n",
 430 |        "  <thead>\n",
 431 |        "    <tr style=\"text-align: right;\">\n",
 432 |        "      <th></th>\n",
 433 |        "      <th>sex</th>\n",
 434 |        "      <th>smoker</th>\n",
 435 |        "      <th>total_bill</th>\n",
 436 |        "      <th>tip</th>\n",
 437 |        "    </tr>\n",
 438 |        "  </thead>\n",
 439 |        "  <tbody>\n",
 440 |        "    <tr>\n",
 441 |        "      <th>0</th>\n",
 442 |        "      <td>Male</td>\n",
 443 |        "      <td>Yes</td>\n",
 444 |        "      <td>1337.07</td>\n",
 445 |        "      <td>183.07</td>\n",
 446 |        "    </tr>\n",
 447 |        "    <tr>\n",
 448 |        "      <th>1</th>\n",
 449 |        "      <td>Male</td>\n",
 450 |        "      <td>No</td>\n",
 451 |        "      <td>1919.75</td>\n",
 452 |        "      <td>302.00</td>\n",
 453 |        "    </tr>\n",
 454 |        "    <tr>\n",
 455 |        "      <th>2</th>\n",
 456 |        "      <td>Female</td>\n",
 457 |        "      <td>Yes</td>\n",
 458 |        "      <td>593.27</td>\n",
 459 |        "      <td>96.74</td>\n",
 460 |        "    </tr>\n",
 461 |        "    <tr>\n",
 462 |        "      <th>3</th>\n",
 463 |        "      <td>Female</td>\n",
 464 |        "      <td>No</td>\n",
 465 |        "      <td>977.68</td>\n",
 466 |        "      <td>149.77</td>\n",
 467 |        "    </tr>\n",
 468 |        "  </tbody>\n",
 469 |        "</table>\n",
 470 |        "</div>"
 471 |       ],
 472 |       "text/plain": [
 473 |        "      sex smoker  total_bill     tip\n",
 474 |        "0    Male    Yes     1337.07  183.07\n",
 475 |        "1    Male     No     1919.75  302.00\n",
 476 |        "2  Female    Yes      593.27   96.74\n",
 477 |        "3  Female     No      977.68  149.77"
 478 |       ]
 479 |      },
 480 |      "execution_count": 24,
 481 |      "metadata": {},
 482 |      "output_type": "execute_result"
 483 |     }
 484 |    ],
 485 |    "source": [
 486 |     "# we can reset indexes and then merge on the columns - perhaps the easiest way\n",
 487 |     "pd.merge(\n",
 488 |     "    tips_bill.reset_index(), \n",
 489 |     "    tips_tip.reset_index(),\n",
 490 |     "    on=['sex', 'smoker']\n",
 491 |     ")"
 492 |    ]
 493 |   },
 494 |   {
 495 |    "cell_type": "code",
 496 |    "execution_count": 25,
 497 |    "metadata": {},
 498 |    "outputs": [
 499 |     {
 500 |      "data": {
 501 |       "text/html": [
 502 |        "<div>\n",
 503 |        "<style scoped>\n",
 504 |        "    .dataframe tbody tr th:only-of-type {\n",
 505 |        "        vertical-align: middle;\n",
 506 |        "    }\n",
 507 |        "\n",
 508 |        "    .dataframe tbody tr th {\n",
 509 |        "        vertical-align: top;\n",
 510 |        "    }\n",
 511 |        "\n",
 512 |        "    .dataframe thead th {\n",
 513 |        "        text-align: right;\n",
 514 |        "    }\n",
 515 |        "</style>\n",
 516 |        "<table border=\"1\" class=\"dataframe\">\n",
 517 |        "  <thead>\n",
 518 |        "    <tr style=\"text-align: right;\">\n",
 519 |        "      <th></th>\n",
 520 |        "      <th>sex</th>\n",
 521 |        "      <th>smoker</th>\n",
 522 |        "      <th>total_bill</th>\n",
 523 |        "      <th>tip</th>\n",
 524 |        "    </tr>\n",
 525 |        "  </thead>\n",
 526 |        "  <tbody>\n",
 527 |        "    <tr>\n",
 528 |        "      <th>0</th>\n",
 529 |        "      <td>Male</td>\n",
 530 |        "      <td>Yes</td>\n",
 531 |        "      <td>1337.07</td>\n",
 532 |        "      <td>183.07</td>\n",
 533 |        "    </tr>\n",
 534 |        "    <tr>\n",
 535 |        "      <th>1</th>\n",
 536 |        "      <td>Male</td>\n",
 537 |        "      <td>No</td>\n",
 538 |        "      <td>1919.75</td>\n",
 539 |        "      <td>302.00</td>\n",
 540 |        "    </tr>\n",
 541 |        "    <tr>\n",
 542 |        "      <th>2</th>\n",
 543 |        "      <td>Female</td>\n",
 544 |        "      <td>Yes</td>\n",
 545 |        "      <td>593.27</td>\n",
 546 |        "      <td>96.74</td>\n",
 547 |        "    </tr>\n",
 548 |        "    <tr>\n",
 549 |        "      <th>3</th>\n",
 550 |        "      <td>Female</td>\n",
 551 |        "      <td>No</td>\n",
 552 |        "      <td>977.68</td>\n",
 553 |        "      <td>149.77</td>\n",
 554 |        "    </tr>\n",
 555 |        "  </tbody>\n",
 556 |        "</table>\n",
 557 |        "</div>"
 558 |       ],
 559 |       "text/plain": [
 560 |        "      sex smoker  total_bill     tip\n",
 561 |        "0    Male    Yes     1337.07  183.07\n",
 562 |        "1    Male     No     1919.75  302.00\n",
 563 |        "2  Female    Yes      593.27   96.74\n",
 564 |        "3  Female     No      977.68  149.77"
 565 |       ]
 566 |      },
 567 |      "execution_count": 25,
 568 |      "metadata": {},
 569 |      "output_type": "execute_result"
 570 |     }
 571 |    ],
 572 |    "source": [
 573 |     "# merge can infer the keys from the column names both frames share - but be very careful with this\n",
 574 |     "pd.merge(\n",
 575 |     "    tips_bill.reset_index(), \n",
 576 |     "    tips_tip.reset_index()\n",
 577 |     ")"
 578 |    ]
 579 |   },
 580 |   {
 581 |    "cell_type": "code",
 582 |    "execution_count": 27,
 583 |    "metadata": {},
 584 |    "outputs": [
 585 |     {
 586 |      "data": {
 587 |       "text/html": [
 588 |        "<div>\n",
 589 |        "<style scoped>\n",
 590 |        "    .dataframe tbody tr th:only-of-type {\n",
 591 |        "        vertical-align: middle;\n",
 592 |        "    }\n",
 593 |        "\n",
 594 |        "    .dataframe tbody tr th {\n",
 595 |        "        vertical-align: top;\n",
 596 |        "    }\n",
 597 |        "\n",
 598 |        "    .dataframe thead th {\n",
 599 |        "        text-align: right;\n",
 600 |        "    }\n",
 601 |        "</style>\n",
 602 |        "<table border=\"1\" class=\"dataframe\">\n",
 603 |        "  <thead>\n",
 604 |        "    <tr style=\"text-align: right;\">\n",
 605 |        "      <th></th>\n",
 606 |        "      <th>sex</th>\n",
 607 |        "      <th>smoker</th>\n",
 608 |        "      <th>total_bill</th>\n",
 609 |        "      <th>tip</th>\n",
 610 |        "    </tr>\n",
 611 |        "  </thead>\n",
 612 |        "  <tbody>\n",
 613 |        "    <tr>\n",
 614 |        "      <th>0</th>\n",
 615 |        "      <td>Male</td>\n",
 616 |        "      <td>Yes</td>\n",
 617 |        "      <td>1337.07</td>\n",
 618 |        "      <td>183.07</td>\n",
 619 |        "    </tr>\n",
 620 |        "    <tr>\n",
 621 |        "      <th>1</th>\n",
 622 |        "      <td>Male</td>\n",
 623 |        "      <td>No</td>\n",
 624 |        "      <td>1919.75</td>\n",
 625 |        "      <td>302.00</td>\n",
 626 |        "    </tr>\n",
 627 |        "    <tr>\n",
 628 |        "      <th>2</th>\n",
 629 |        "      <td>Female</td>\n",
 630 |        "      <td>Yes</td>\n",
 631 |        "      <td>593.27</td>\n",
 632 |        "      <td>96.74</td>\n",
 633 |        "    </tr>\n",
 634 |        "    <tr>\n",
 635 |        "      <th>3</th>\n",
 636 |        "      <td>Female</td>\n",
 637 |        "      <td>No</td>\n",
 638 |        "      <td>977.68</td>\n",
 639 |        "      <td>149.77</td>\n",
 640 |        "    </tr>\n",
 641 |        "  </tbody>\n",
 642 |        "</table>\n",
 643 |        "</div>"
 644 |       ],
 645 |       "text/plain": [
 646 |        "      sex smoker  total_bill     tip\n",
 647 |        "0    Male    Yes     1337.07  183.07\n",
 648 |        "1    Male     No     1919.75  302.00\n",
 649 |        "2  Female    Yes      593.27   96.74\n",
 650 |        "3  Female     No      977.68  149.77"
 651 |       ]
 652 |      },
 653 |      "execution_count": 27,
 654 |      "metadata": {},
 655 |      "output_type": "execute_result"
 656 |     }
 657 |    ],
 658 |    "source": [
 659 |     "# it can merge columns of one frame against the index of the other\n",
 660 |     "pd.merge(\n",
 661 |     "    tips_bill.reset_index(), \n",
 662 |     "    tips_tip,\n",
 663 |     "    left_on=['sex', 'smoker'],\n",
 664 |     "    right_index=True\n",
 665 |     ")"
 666 |    ]
 667 |   },
 668 |   {
 669 |    "cell_type": "code",
 670 |    "execution_count": 5,
 671 |    "metadata": {},
 672 |    "outputs": [
 673 |     {
 674 |      "data": {
 675 |       "text/html": [
 676 |        "<div>\n",
 677 |        "<style scoped>\n",
 678 |        "    .dataframe tbody tr th:only-of-type {\n",
 679 |        "        vertical-align: middle;\n",
 680 |        "    }\n",
 681 |        "\n",
 682 |        "    .dataframe tbody tr th {\n",
 683 |        "        vertical-align: top;\n",
 684 |        "    }\n",
 685 |        "\n",
 686 |        "    .dataframe thead th {\n",
 687 |        "        text-align: right;\n",
 688 |        "    }\n",
 689 |        "</style>\n",
 690 |        "<table border=\"1\" class=\"dataframe\">\n",
 691 |        "  <thead>\n",
 692 |        "    <tr style=\"text-align: right;\">\n",
 693 |        "      <th></th>\n",
 694 |        "      <th>sex</th>\n",
 695 |        "      <th>total_bill</th>\n",
 696 |        "    </tr>\n",
 697 |        "    <tr>\n",
 698 |        "      <th>smoker</th>\n",
 699 |        "      <th></th>\n",
 700 |        "      <th></th>\n",
 701 |        "    </tr>\n",
 702 |        "  </thead>\n",
 703 |        "  <tbody>\n",
 704 |        "    <tr>\n",
 705 |        "      <th>Yes</th>\n",
 706 |        "      <td>Male</td>\n",
 707 |        "      <td>1337.07</td>\n",
 708 |        "    </tr>\n",
 709 |        "    <tr>\n",
 710 |        "      <th>No</th>\n",
 711 |        "      <td>Male</td>\n",
 712 |        "      <td>1919.75</td>\n",
 713 |        "    </tr>\n",
 714 |        "    <tr>\n",
 715 |        "      <th>Yes</th>\n",
 716 |        "      <td>Female</td>\n",
 717 |        "      <td>593.27</td>\n",
 718 |        "    </tr>\n",
 719 |        "    <tr>\n",
 720 |        "      <th>No</th>\n",
 721 |        "      <td>Female</td>\n",
 722 |        "      <td>977.68</td>\n",
 723 |        "    </tr>\n",
 724 |        "  </tbody>\n",
 725 |        "</table>\n",
 726 |        "</div>"
 727 |       ],
 728 |       "text/plain": [
 729 |        "           sex  total_bill\n",
 730 |        "smoker                    \n",
 731 |        "Yes       Male     1337.07\n",
 732 |        "No        Male     1919.75\n",
 733 |        "Yes     Female      593.27\n",
 734 |        "No      Female      977.68"
 735 |       ]
 736 |      },
 737 |      "execution_count": 5,
 738 |      "metadata": {},
 739 |      "output_type": "execute_result"
 740 |     }
 741 |    ],
 742 |    "source": [
 743 |     "# it can do interesting combinations\n",
 744 |     "tips_bill_strange = tips_bill.reset_index(level=0)\n",
 745 |     "tips_bill_strange"
 746 |    ]
 747 |   },
 748 |   {
 749 |    "cell_type": "code",
 750 |    "execution_count": 7,
 751 |    "metadata": {},
 752 |    "outputs": [
 753 |     {
 754 |      "data": {
 755 |       "text/html": [
 756 |        "<div>\n",
 757 |        "<style scoped>\n",
 758 |        "    .dataframe tbody tr th:only-of-type {\n",
 759 |        "        vertical-align: middle;\n",
 760 |        "    }\n",
 761 |        "\n",
 762 |        "    .dataframe tbody tr th {\n",
 763 |        "        vertical-align: top;\n",
 764 |        "    }\n",
 765 |        "\n",
 766 |        "    .dataframe thead th {\n",
 767 |        "        text-align: right;\n",
 768 |        "    }\n",
 769 |        "</style>\n",
 770 |        "<table border=\"1\" class=\"dataframe\">\n",
 771 |        "  <thead>\n",
 772 |        "    <tr style=\"text-align: right;\">\n",
 773 |        "      <th></th>\n",
 774 |        "      <th>sex</th>\n",
 775 |        "      <th>smoker</th>\n",
 776 |        "      <th>tip</th>\n",
 777 |        "      <th>total_bill</th>\n",
 778 |        "    </tr>\n",
 779 |        "  </thead>\n",
 780 |        "  <tbody>\n",
 781 |        "    <tr>\n",
 782 |        "      <th>0</th>\n",
 783 |        "      <td>Male</td>\n",
 784 |        "      <td>Yes</td>\n",
 785 |        "      <td>183.07</td>\n",
 786 |        "      <td>1337.07</td>\n",
 787 |        "    </tr>\n",
 788 |        "    <tr>\n",
 789 |        "      <th>1</th>\n",
 790 |        "      <td>Male</td>\n",
 791 |        "      <td>No</td>\n",
 792 |        "      <td>302.00</td>\n",
 793 |        "      <td>1919.75</td>\n",
 794 |        "    </tr>\n",
 795 |        "    <tr>\n",
 796 |        "      <th>2</th>\n",
 797 |        "      <td>Female</td>\n",
 798 |        "      <td>Yes</td>\n",
 799 |        "      <td>96.74</td>\n",
 800 |        "      <td>593.27</td>\n",
 801 |        "    </tr>\n",
 802 |        "    <tr>\n",
 803 |        "      <th>3</th>\n",
 804 |        "      <td>Female</td>\n",
 805 |        "      <td>No</td>\n",
 806 |        "      <td>149.77</td>\n",
 807 |        "      <td>977.68</td>\n",
 808 |        "    </tr>\n",
 809 |        "  </tbody>\n",
 810 |        "</table>\n",
 811 |        "</div>"
 812 |       ],
 813 |       "text/plain": [
 814 |        "      sex smoker     tip  total_bill\n",
 815 |        "0    Male    Yes  183.07     1337.07\n",
 816 |        "1    Male     No  302.00     1919.75\n",
 817 |        "2  Female    Yes   96.74      593.27\n",
 818 |        "3  Female     No  149.77      977.68"
 819 |       ]
 820 |      },
 821 |      "execution_count": 7,
 822 |      "metadata": {},
 823 |      "output_type": "execute_result"
 824 |     }
 825 |    ],
 826 |    "source": [
 827 |     "pd.merge(\n",
 828 |     "    tips_tip.reset_index(), \n",
 829 |     "    tips_bill_strange,\n",
 830 |     "    on=['sex', 'smoker']\n",
 831 |     ")"
 832 |    ]
 833 |   },
 834 |   {
 835 |    "cell_type": "code",
 836 |    "execution_count": 31,
 837 |    "metadata": {},
 838 |    "outputs": [
 839 |     {
 840 |      "data": {
 841 |       "text/html": [
 842 |        "<div>\n",
 843 |        "<style scoped>\n",
 844 |        "    .dataframe tbody tr th:only-of-type {\n",
 845 |        "        vertical-align: middle;\n",
 846 |        "    }\n",
 847 |        "\n",
 848 |        "    .dataframe tbody tr th {\n",
 849 |        "        vertical-align: top;\n",
 850 |        "    }\n",
 851 |        "\n",
 852 |        "    .dataframe thead th {\n",
 853 |        "        text-align: right;\n",
 854 |        "    }\n",
 855 |        "</style>\n",
 856 |        "<table border=\"1\" class=\"dataframe\">\n",
 857 |        "  <thead>\n",
 858 |        "    <tr style=\"text-align: right;\">\n",
 859 |        "      <th></th>\n",
 860 |        "      <th>sex</th>\n",
 861 |        "      <th>smoker</th>\n",
 862 |        "      <th>total_bill</th>\n",
 863 |        "      <th>tip</th>\n",
 864 |        "    </tr>\n",
 865 |        "  </thead>\n",
 866 |        "  <tbody>\n",
 867 |        "    <tr>\n",
 868 |        "      <th>0</th>\n",
 869 |        "      <td>Male</td>\n",
 870 |        "      <td>Yes</td>\n",
 871 |        "      <td>1337.07</td>\n",
 872 |        "      <td>183.07</td>\n",
 873 |        "    </tr>\n",
 874 |        "    <tr>\n",
 875 |        "      <th>1</th>\n",
 876 |        "      <td>Male</td>\n",
 877 |        "      <td>No</td>\n",
 878 |        "      <td>1919.75</td>\n",
 879 |        "      <td>302.00</td>\n",
 880 |        "    </tr>\n",
 881 |        "    <tr>\n",
 882 |        "      <th>2</th>\n",
 883 |        "      <td>Female</td>\n",
 884 |        "      <td>Yes</td>\n",
 885 |        "      <td>593.27</td>\n",
 886 |        "      <td>NaN</td>\n",
 887 |        "    </tr>\n",
 888 |        "    <tr>\n",
 889 |        "      <th>3</th>\n",
 890 |        "      <td>Female</td>\n",
 891 |        "      <td>No</td>\n",
 892 |        "      <td>977.68</td>\n",
 893 |        "      <td>NaN</td>\n",
 894 |        "    </tr>\n",
 895 |        "  </tbody>\n",
 896 |        "</table>\n",
 897 |        "</div>"
 898 |       ],
 899 |       "text/plain": [
 900 |        "      sex smoker  total_bill     tip\n",
 901 |        "0    Male    Yes     1337.07  183.07\n",
 902 |        "1    Male     No     1919.75  302.00\n",
 903 |        "2  Female    Yes      593.27     NaN\n",
 904 |        "3  Female     No      977.68     NaN"
 905 |       ]
 906 |      },
 907 |      "execution_count": 31,
 908 |      "metadata": {},
 909 |      "output_type": "execute_result"
 910 |     }
 911 |    ],
 912 |    "source": [
 913 |     "# SQL-style joins: how='left' keeps every row of the left frame and fills unmatched right-hand values with NaN\n",
 914 |     "pd.merge(\n",
 915 |     "    tips_bill.reset_index(), \n",
 916 |     "    tips_tip.reset_index().head(2),\n",
 917 |     "    how='left'\n",
 918 |     ")"
 919 |    ]
 920 |   },
 921 |   {
 922 |    "cell_type": "code",
 923 |    "execution_count": 32,
 924 |    "metadata": {},
 925 |    "outputs": [
 926 |     {
 927 |      "data": {
 928 |       "text/html": [
 929 |        "<div>\n",
 930 |        "<style scoped>\n",
 931 |        "    .dataframe tbody tr th:only-of-type {\n",
 932 |        "        vertical-align: middle;\n",
 933 |        "    }\n",
 934 |        "\n",
 935 |        "    .dataframe tbody tr th {\n",
 936 |        "        vertical-align: top;\n",
 937 |        "    }\n",
 938 |        "\n",
 939 |        "    .dataframe thead th {\n",
 940 |        "        text-align: right;\n",
 941 |        "    }\n",
 942 |        "</style>\n",
 943 |        "<table border=\"1\" class=\"dataframe\">\n",
 944 |        "  <thead>\n",
 945 |        "    <tr style=\"text-align: right;\">\n",
 946 |        "      <th></th>\n",
 947 |        "      <th>sex</th>\n",
 948 |        "      <th>smoker</th>\n",
 949 |        "      <th>total_bill</th>\n",
 950 |        "      <th>tip</th>\n",
 951 |        "    </tr>\n",
 952 |        "  </thead>\n",
 953 |        "  <tbody>\n",
 954 |        "    <tr>\n",
 955 |        "      <th>0</th>\n",
 956 |        "      <td>Male</td>\n",
 957 |        "      <td>Yes</td>\n",
 958 |        "      <td>1337.07</td>\n",
 959 |        "      <td>183.07</td>\n",
 960 |        "    </tr>\n",
 961 |        "    <tr>\n",
 962 |        "      <th>1</th>\n",
 963 |        "      <td>Male</td>\n",
 964 |        "      <td>No</td>\n",
 965 |        "      <td>1919.75</td>\n",
 966 |        "      <td>302.00</td>\n",
 967 |        "    </tr>\n",
 968 |        "  </tbody>\n",
 969 |        "</table>\n",
 970 |        "</div>"
 971 |       ],
 972 |       "text/plain": [
 973 |        "    sex smoker  total_bill     tip\n",
 974 |        "0  Male    Yes     1337.07  183.07\n",
 975 |        "1  Male     No     1919.75  302.00"
 976 |       ]
 977 |      },
 978 |      "execution_count": 32,
 979 |      "metadata": {},
 980 |      "output_type": "execute_result"
 981 |     }
 982 |    ],
 983 |    "source": [
 984 |     "pd.merge(\n",
 985 |     "    tips_bill.reset_index(), \n",
 986 |     "    tips_tip.reset_index().head(2),\n",
 987 |     "    how='inner'\n",
 988 |     ")"
 989 |    ]
 990 |   },
 991 |   {
 992 |    "cell_type": "code",
 993 |    "execution_count": 36,
 994 |    "metadata": {},
 995 |    "outputs": [
 996 |     {
 997 |      "data": {
 998 |       "text/html": [
 999 |        "<div>\n",
1000 |        "<style scoped>\n",
1001 |        "    .dataframe tbody tr th:only-of-type {\n",
1002 |        "        vertical-align: middle;\n",
1003 |        "    }\n",
1004 |        "\n",
1005 |        "    .dataframe tbody tr th {\n",
1006 |        "        vertical-align: top;\n",
1007 |        "    }\n",
1008 |        "\n",
1009 |        "    .dataframe thead th {\n",
1010 |        "        text-align: right;\n",
1011 |        "    }\n",
1012 |        "</style>\n",
1013 |        "<table border=\"1\" class=\"dataframe\">\n",
1014 |        "  <thead>\n",
1015 |        "    <tr style=\"text-align: right;\">\n",
1016 |        "      <th></th>\n",
1017 |        "      <th>sex</th>\n",
1018 |        "      <th>smoker</th>\n",
1019 |        "      <th>total_bill</th>\n",
1020 |        "      <th>tip</th>\n",
1021 |        "      <th>_merge</th>\n",
1022 |        "    </tr>\n",
1023 |        "  </thead>\n",
1024 |        "  <tbody>\n",
1025 |        "    <tr>\n",
1026 |        "      <th>0</th>\n",
1027 |        "      <td>Male</td>\n",
1028 |        "      <td>No</td>\n",
1029 |        "      <td>1919.75</td>\n",
1030 |        "      <td>302.00</td>\n",
1031 |        "      <td>both</td>\n",
1032 |        "    </tr>\n",
1033 |        "    <tr>\n",
1034 |        "      <th>1</th>\n",
1035 |        "      <td>Female</td>\n",
1036 |        "      <td>Yes</td>\n",
1037 |        "      <td>593.27</td>\n",
1038 |        "      <td>96.74</td>\n",
1039 |        "      <td>both</td>\n",
1040 |        "    </tr>\n",
1041 |        "    <tr>\n",
1042 |        "      <th>2</th>\n",
1043 |        "      <td>Female</td>\n",
1044 |        "      <td>No</td>\n",
1045 |        "      <td>977.68</td>\n",
1046 |        "      <td>NaN</td>\n",
1047 |        "      <td>left_only</td>\n",
1048 |        "    </tr>\n",
1049 |        "    <tr>\n",
1050 |        "      <th>3</th>\n",
1051 |        "      <td>Male</td>\n",
1052 |        "      <td>Yes</td>\n",
1053 |        "      <td>NaN</td>\n",
1054 |        "      <td>183.07</td>\n",
1055 |        "      <td>right_only</td>\n",
1056 |        "    </tr>\n",
1057 |        "  </tbody>\n",
1058 |        "</table>\n",
1059 |        "</div>"
1060 |       ],
1061 |       "text/plain": [
1062 |        "      sex smoker  total_bill     tip      _merge\n",
1063 |        "0    Male     No     1919.75  302.00        both\n",
1064 |        "1  Female    Yes      593.27   96.74        both\n",
1065 |        "2  Female     No      977.68     NaN   left_only\n",
1066 |        "3    Male    Yes         NaN  183.07  right_only"
1067 |       ]
1068 |      },
1069 |      "execution_count": 36,
1070 |      "metadata": {},
1071 |      "output_type": "execute_result"
1072 |     }
1073 |    ],
1074 |    "source": [
1075 |     "# indicator=True adds a _merge column showing whether each row came from the left frame, the right frame, or both\n",
1076 |     "pd.merge(\n",
1077 |     "    tips_bill.reset_index().tail(3), \n",
1078 |     "    tips_tip.reset_index().head(3),\n",
1079 |     "    how='outer',\n",
1080 |     "    indicator=True\n",
1081 |     ")"
1082 |    ]
1083 |   },
1084 |   {
1085 |    "cell_type": "code",
1086 |    "execution_count": 35,
1087 |    "metadata": {},
1088 |    "outputs": [
1089 |     {
1090 |      "data": {
1091 |       "text/html": [
1092 |        "<div>\n",
1093 |        "<style scoped>\n",
1094 |        "    .dataframe tbody tr th:only-of-type {\n",
1095 |        "        vertical-align: middle;\n",
1096 |        "    }\n",
1097 |        "\n",
1098 |        "    .dataframe tbody tr th {\n",
1099 |        "        vertical-align: top;\n",
1100 |        "    }\n",
1101 |        "\n",
1102 |        "    .dataframe thead th {\n",
1103 |        "        text-align: right;\n",
1104 |        "    }\n",
1105 |        "</style>\n",
1106 |        "<table border=\"1\" class=\"dataframe\">\n",
1107 |        "  <thead>\n",
1108 |        "    <tr style=\"text-align: right;\">\n",
1109 |        "      <th></th>\n",
1110 |        "      <th></th>\n",
1111 |        "      <th>total_bill_left</th>\n",
1112 |        "      <th>total_bill_right</th>\n",
1113 |        "    </tr>\n",
1114 |        "    <tr>\n",
1115 |        "      <th>sex</th>\n",
1116 |        "      <th>smoker</th>\n",
1117 |        "      <th></th>\n",
1118 |        "      <th></th>\n",
1119 |        "    </tr>\n",
1120 |        "  </thead>\n",
1121 |        "  <tbody>\n",
1122 |        "    <tr>\n",
1123 |        "      <th rowspan=\"2\" valign=\"top\">Male</th>\n",
1124 |        "      <th>Yes</th>\n",
1125 |        "      <td>1337.07</td>\n",
1126 |        "      <td>1337.07</td>\n",
1127 |        "    </tr>\n",
1128 |        "    <tr>\n",
1129 |        "      <th>No</th>\n",
1130 |        "      <td>1919.75</td>\n",
1131 |        "      <td>1919.75</td>\n",
1132 |        "    </tr>\n",
1133 |        "    <tr>\n",
1134 |        "      <th rowspan=\"2\" valign=\"top\">Female</th>\n",
1135 |        "      <th>Yes</th>\n",
1136 |        "      <td>593.27</td>\n",
1137 |        "      <td>593.27</td>\n",
1138 |        "    </tr>\n",
1139 |        "    <tr>\n",
1140 |        "      <th>No</th>\n",
1141 |        "      <td>977.68</td>\n",
1142 |        "      <td>977.68</td>\n",
1143 |        "    </tr>\n",
1144 |        "  </tbody>\n",
1145 |        "</table>\n",
1146 |        "</div>"
1147 |       ],
1148 |       "text/plain": [
1149 |        "               total_bill_left  total_bill_right\n",
1150 |        "sex    smoker                                   \n",
1151 |        "Male   Yes             1337.07           1337.07\n",
1152 |        "       No              1919.75           1919.75\n",
1153 |        "Female Yes              593.27            593.27\n",
1154 |        "       No               977.68            977.68"
1155 |       ]
1156 |      },
1157 |      "execution_count": 35,
1158 |      "metadata": {},
1159 |      "output_type": "execute_result"
1160 |     }
1161 |    ],
1162 |    "source": [
1163 |     "# it can handle columns with the same name\n",
1164 |     "pd.merge(tips_bill, \n",
1165 |     "         tips_bill, \n",
1166 |     "         right_index=True, \n",
1167 |     "         left_index=True,\n",
1168 |     "         suffixes=('_left', '_right')\n",
1169 |     ")"
1170 |    ]
1171 |   },
1172 |   {
1173 |    "cell_type": "markdown",
1174 |    "metadata": {},
1175 |    "source": [
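1176 |     "This is one of the most complex parts of pandas, but it is very important to master, so please do check out the exercises below!\n",
1177 |     "\n",
1178 |     "One thing to be careful with here is the data type of the key columns: merge keys need to have the same dtype in both dataframes. Strings are not equal to ints, so the string '1' in one frame will not line up with the int 1 in the other!"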
1179 |    ]
1180 |   },
1181 |   {
1182 |    "cell_type": "markdown",
1183 |    "metadata": {},
1184 |    "source": [
1185 |     "# Concatenation\n",
1186 |     "\n",
1187 |     "Concatenation combines two or more dataframes, either row-wise or column-wise. The drawback of concatenate is that the combinations it allows are rather simple: it only stacks frames or lines them up side by side. That's why we need merge. \n",
1188 |     "\n",
1189 |     "Concatenate can take as many dataframes as you want, but it requires that they are constructed to fit together. It aligns the dataframes on their index (or on their column names when stacking rows), so the dataframes you pass in should share the same index structure. You can no longer join on arbitrary columns the way you can with merge. \n",
1190 |     "\n",
1191 |     "Let's check out basic use below:"
1192 |    ]
1193 |   },
1194 |   {
1195 |    "cell_type": "code",
1196 |    "execution_count": 8,
1197 |    "metadata": {},
1198 |    "outputs": [
1199 |     {
1200 |      "data": {
1201 |       "text/html": [
1202 |        "<div>\n",
1203 |        "<style scoped>\n",
1204 |        "    .dataframe tbody tr th:only-of-type {\n",
1205 |        "        vertical-align: middle;\n",
1206 |        "    }\n",
1207 |        "\n",
1208 |        "    .dataframe tbody tr th {\n",
1209 |        "        vertical-align: top;\n",
1210 |        "    }\n",
1211 |        "\n",
1212 |        "    .dataframe thead th {\n",
1213 |        "        text-align: right;\n",
1214 |        "    }\n",
1215 |        "</style>\n",
1216 |        "<table border=\"1\" class=\"dataframe\">\n",
1217 |        "  <thead>\n",
1218 |        "    <tr style=\"text-align: right;\">\n",
1219 |        "      <th></th>\n",
1220 |        "      <th></th>\n",
1221 |        "      <th>total_bill</th>\n",
1222 |        "      <th>tip</th>\n",
1223 |        "    </tr>\n",
1224 |        "    <tr>\n",
1225 |        "      <th>sex</th>\n",
1226 |        "      <th>smoker</th>\n",
1227 |        "      <th></th>\n",
1228 |        "      <th></th>\n",
1229 |        "    </tr>\n",
1230 |        "  </thead>\n",
1231 |        "  <tbody>\n",
1232 |        "    <tr>\n",
1233 |        "      <th rowspan=\"2\" valign=\"top\">Male</th>\n",
1234 |        "      <th>Yes</th>\n",
1235 |        "      <td>1337.07</td>\n",
1236 |        "      <td>NaN</td>\n",
1237 |        "    </tr>\n",
1238 |        "    <tr>\n",
1239 |        "      <th>No</th>\n",
1240 |        "      <td>1919.75</td>\n",
1241 |        "      <td>NaN</td>\n",
1242 |        "    </tr>\n",
1243 |        "    <tr>\n",
1244 |        "      <th rowspan=\"2\" valign=\"top\">Female</th>\n",
1245 |        "      <th>Yes</th>\n",
1246 |        "      <td>593.27</td>\n",
1247 |        "      <td>NaN</td>\n",
1248 |        "    </tr>\n",
1249 |        "    <tr>\n",
1250 |        "      <th>No</th>\n",
1251 |        "      <td>977.68</td>\n",
1252 |        "      <td>NaN</td>\n",
1253 |        "    </tr>\n",
1254 |        "    <tr>\n",
1255 |        "      <th rowspan=\"2\" valign=\"top\">Male</th>\n",
1256 |        "      <th>Yes</th>\n",
1257 |        "      <td>1337.07</td>\n",
1258 |        "      <td>NaN</td>\n",
1259 |        "    </tr>\n",
1260 |        "    <tr>\n",
1261 |        "      <th>No</th>\n",
1262 |        "      <td>1919.75</td>\n",
1263 |        "      <td>NaN</td>\n",
1264 |        "    </tr>\n",
1265 |        "    <tr>\n",
1266 |        "      <th rowspan=\"2\" valign=\"top\">Female</th>\n",
1267 |        "      <th>Yes</th>\n",
1268 |        "      <td>593.27</td>\n",
1269 |        "      <td>NaN</td>\n",
1270 |        "    </tr>\n",
1271 |        "    <tr>\n",
1272 |        "      <th>No</th>\n",
1273 |        "      <td>977.68</td>\n",
1274 |        "      <td>NaN</td>\n",
1275 |        "    </tr>\n",
1276 |        "    <tr>\n",
1277 |        "      <th rowspan=\"2\" valign=\"top\">Male</th>\n",
1278 |        "      <th>Yes</th>\n",
1279 |        "      <td>NaN</td>\n",
1280 |        "      <td>183.07</td>\n",
1281 |        "    </tr>\n",
1282 |        "    <tr>\n",
1283 |        "      <th>No</th>\n",
1284 |        "      <td>NaN</td>\n",
1285 |        "      <td>302.00</td>\n",
1286 |        "    </tr>\n",
1287 |        "    <tr>\n",
1288 |        "      <th rowspan=\"2\" valign=\"top\">Female</th>\n",
1289 |        "      <th>Yes</th>\n",
1290 |        "      <td>NaN</td>\n",
1291 |        "      <td>96.74</td>\n",
1292 |        "    </tr>\n",
1293 |        "    <tr>\n",
1294 |        "      <th>No</th>\n",
1295 |        "      <td>NaN</td>\n",
1296 |        "      <td>149.77</td>\n",
1297 |        "    </tr>\n",
1298 |        "  </tbody>\n",
1299 |        "</table>\n",
1300 |        "</div>"
1301 |       ],
1302 |       "text/plain": [
1303 |        "               total_bill     tip\n",
1304 |        "sex    smoker                    \n",
1305 |        "Male   Yes        1337.07     NaN\n",
1306 |        "       No         1919.75     NaN\n",
1307 |        "Female Yes         593.27     NaN\n",
1308 |        "       No          977.68     NaN\n",
1309 |        "Male   Yes        1337.07     NaN\n",
1310 |        "       No         1919.75     NaN\n",
1311 |        "Female Yes         593.27     NaN\n",
1312 |        "       No          977.68     NaN\n",
1313 |        "Male   Yes            NaN  183.07\n",
1314 |        "       No             NaN  302.00\n",
1315 |        "Female Yes            NaN   96.74\n",
1316 |        "       No             NaN  149.77"
1317 |       ]
1318 |      },
1319 |      "execution_count": 8,
1320 |      "metadata": {},
1321 |      "output_type": "execute_result"
1322 |     }
1323 |    ],
1324 |    "source": [
1325 |     "# this stacks the dataframes row-wise; columns missing from a frame are filled with NaN\n",
1326 |     "pd.concat([tips_bill, tips_bill, tips_tip], sort=False)"
1327 |    ]
1328 |   },
1329 |   {
1330 |    "cell_type": "code",
1331 |    "execution_count": 9,
1332 |    "metadata": {},
1333 |    "outputs": [
1334 |     {
1335 |      "data": {
1336 |       "text/html": [
1337 |        "<div>\n",
1338 |        "<style scoped>\n",
1339 |        "    .dataframe tbody tr th:only-of-type {\n",
1340 |        "        vertical-align: middle;\n",
1341 |        "    }\n",
1342 |        "\n",
1343 |        "    .dataframe tbody tr th {\n",
1344 |        "        vertical-align: top;\n",
1345 |        "    }\n",
1346 |        "\n",
1347 |        "    .dataframe thead th {\n",
1348 |        "        text-align: right;\n",
1349 |        "    }\n",
1350 |        "</style>\n",
1351 |        "<table border=\"1\" class=\"dataframe\">\n",
1352 |        "  <thead>\n",
1353 |        "    <tr style=\"text-align: right;\">\n",
1354 |        "      <th></th>\n",
1355 |        "      <th></th>\n",
1356 |        "      <th>total_bill</th>\n",
1357 |        "      <th>tip</th>\n",
1358 |        "    </tr>\n",
1359 |        "    <tr>\n",
1360 |        "      <th>sex</th>\n",
1361 |        "      <th>smoker</th>\n",
1362 |        "      <th></th>\n",
1363 |        "      <th></th>\n",
1364 |        "    </tr>\n",
1365 |        "  </thead>\n",
1366 |        "  <tbody>\n",
1367 |        "    <tr>\n",
1368 |        "      <th rowspan=\"2\" valign=\"top\">Male</th>\n",
1369 |        "      <th>Yes</th>\n",
1370 |        "      <td>1337.07</td>\n",
1371 |        "      <td>183.07</td>\n",
1372 |        "    </tr>\n",
1373 |        "    <tr>\n",
1374 |        "      <th>No</th>\n",
1375 |        "      <td>1919.75</td>\n",
1376 |        "      <td>302.00</td>\n",
1377 |        "    </tr>\n",
1378 |        "    <tr>\n",
1379 |        "      <th rowspan=\"2\" valign=\"top\">Female</th>\n",
1380 |        "      <th>Yes</th>\n",
1381 |        "      <td>593.27</td>\n",
1382 |        "      <td>96.74</td>\n",
1383 |        "    </tr>\n",
1384 |        "    <tr>\n",
1385 |        "      <th>No</th>\n",
1386 |        "      <td>977.68</td>\n",
1387 |        "      <td>149.77</td>\n",
1388 |        "    </tr>\n",
1389 |        "  </tbody>\n",
1390 |        "</table>\n",
1391 |        "</div>"
1392 |       ],
1393 |       "text/plain": [
1394 |        "               total_bill     tip\n",
1395 |        "sex    smoker                    \n",
1396 |        "Male   Yes        1337.07  183.07\n",
1397 |        "       No         1919.75  302.00\n",
1398 |        "Female Yes         593.27   96.74\n",
1399 |        "       No          977.68  149.77"
1400 |       ]
1401 |      },
1402 |      "execution_count": 9,
1403 |      "metadata": {},
1404 |      "output_type": "execute_result"
1405 |     }
1406 |    ],
1407 |    "source": [
1408 |     "# axis=1 places the dataframes side by side (column-wise), aligned on the index\n",
1409 |     "pd.concat([tips_bill, tips_tip], axis=1)"
1410 |    ]
1411 |   },
1412 |   {
1413 |    "cell_type": "code",
1414 |    "execution_count": 10,
1415 |    "metadata": {},
1416 |    "outputs": [
1417 |     {
1418 |      "data": {
1419 |       "text/html": [
1420 |        "<div>\n",
1421 |        "<style scoped>\n",
1422 |        "    .dataframe tbody tr th:only-of-type {\n",
1423 |        "        vertical-align: middle;\n",
1424 |        "    }\n",
1425 |        "\n",
1426 |        "    .dataframe tbody tr th {\n",
1427 |        "        vertical-align: top;\n",
1428 |        "    }\n",
1429 |        "\n",
1430 |        "    .dataframe thead th {\n",
1431 |        "        text-align: right;\n",
1432 |        "    }\n",
1433 |        "</style>\n",
1434 |        "<table border=\"1\" class=\"dataframe\">\n",
1435 |        "  <thead>\n",
1436 |        "    <tr style=\"text-align: right;\">\n",
1437 |        "      <th></th>\n",
1438 |        "      <th></th>\n",
1439 |        "      <th></th>\n",
1440 |        "      <th>total_bill</th>\n",
1441 |        "      <th>tip</th>\n",
1442 |        "    </tr>\n",
1443 |        "    <tr>\n",
1444 |        "      <th></th>\n",
1445 |        "      <th>sex</th>\n",
1446 |        "      <th>smoker</th>\n",
1447 |        "      <th></th>\n",
1448 |        "      <th></th>\n",
1449 |        "    </tr>\n",
1450 |        "  </thead>\n",
1451 |        "  <tbody>\n",
1452 |        "    <tr>\n",
1453 |        "      <th rowspan=\"4\" valign=\"top\">num0</th>\n",
1454 |        "      <th rowspan=\"2\" valign=\"top\">Male</th>\n",
1455 |        "      <th>Yes</th>\n",
1456 |        "      <td>1337.07</td>\n",
1457 |        "      <td>NaN</td>\n",
1458 |        "    </tr>\n",
1459 |        "    <tr>\n",
1460 |        "      <th>No</th>\n",
1461 |        "      <td>1919.75</td>\n",
1462 |        "      <td>NaN</td>\n",
1463 |        "    </tr>\n",
1464 |        "    <tr>\n",
1465 |        "      <th rowspan=\"2\" valign=\"top\">Female</th>\n",
1466 |        "      <th>Yes</th>\n",
1467 |        "      <td>593.27</td>\n",
1468 |        "      <td>NaN</td>\n",
1469 |        "    </tr>\n",
1470 |        "    <tr>\n",
1471 |        "      <th>No</th>\n",
1472 |        "      <td>977.68</td>\n",
1473 |        "      <td>NaN</td>\n",
1474 |        "    </tr>\n",
1475 |        "    <tr>\n",
1476 |        "      <th rowspan=\"4\" valign=\"top\">num1</th>\n",
1477 |        "      <th rowspan=\"2\" valign=\"top\">Male</th>\n",
1478 |        "      <th>Yes</th>\n",
1479 |        "      <td>NaN</td>\n",
1480 |        "      <td>183.07</td>\n",
1481 |        "    </tr>\n",
1482 |        "    <tr>\n",
1483 |        "      <th>No</th>\n",
1484 |        "      <td>NaN</td>\n",
1485 |        "      <td>302.00</td>\n",
1486 |        "    </tr>\n",
1487 |        "    <tr>\n",
1488 |        "      <th rowspan=\"2\" valign=\"top\">Female</th>\n",
1489 |        "      <th>Yes</th>\n",
1490 |        "      <td>NaN</td>\n",
1491 |        "      <td>96.74</td>\n",
1492 |        "    </tr>\n",
1493 |        "    <tr>\n",
1494 |        "      <th>No</th>\n",
1495 |        "      <td>NaN</td>\n",
1496 |        "      <td>149.77</td>\n",
1497 |        "    </tr>\n",
1498 |        "  </tbody>\n",
1499 |        "</table>\n",
1500 |        "</div>"
1501 |       ],
1502 |       "text/plain": [
1503 |        "                    total_bill     tip\n",
1504 |        "     sex    smoker                    \n",
1505 |        "num0 Male   Yes        1337.07     NaN\n",
1506 |        "            No         1919.75     NaN\n",
1507 |        "     Female Yes         593.27     NaN\n",
1508 |        "            No          977.68     NaN\n",
1509 |        "num1 Male   Yes            NaN  183.07\n",
1510 |        "            No             NaN  302.00\n",
1511 |        "     Female Yes            NaN   96.74\n",
1512 |        "            No             NaN  149.77"
1513 |       ]
1514 |      },
1515 |      "execution_count": 10,
1516 |      "metadata": {},
1517 |      "output_type": "execute_result"
1518 |     }
1519 |    ],
1520 |    "source": [
1521 |     "# and finally, keys adds an outer index level recording which dataframe each row came from\n",
1522 |     "pd.concat([tips_bill, tips_tip], sort=False, keys=['num0', 'num1'])"
1523 |    ]
1524 |   },
1525 |   {
1526 |    "cell_type": "markdown",
1527 |    "metadata": {},
1528 |    "source": [
1529 |     "As you can see, there is not a ton of functionality in concat, but it is invaluable if you have more than two dataframes to combine or you want to append the rows of one dataframe onto another."
1530 |    ]
1531 |   },
1532 |   {
1533 |    "cell_type": "markdown",
1534 |    "metadata": {},
1535 |    "source": [
1536 |     "## Conclusion\n",
1537 |     "\n",
1538 |     "There are a few other ways to merge data, but they are pretty niche (and mainly for time series data). If y'all would like me to go over them, then comment below!\n",
1539 |     "\n",
1540 |     "They are:\n",
1541 |     "\n",
1542 |     "* combine_first\n",
1543 |     "* merge_ordered\n",
1544 |     "* merge_asof\n",
1545 |     "\n",
1546 |     "Otherwise you should be fully equipped to do the [exercises](https://github.com/guipsamora/pandas_exercises#merge). These functions require a bit of practice to get used to, so don't be discouraged if it takes some time."
1547 |    ]
1548 |   },
1549 |   {
1550 |    "cell_type": "code",
1551 |    "execution_count": null,
1552 |    "metadata": {},
1553 |    "outputs": [],
1554 |    "source": []
1555 |   }
1556 |  ],
1557 |  "metadata": {
1558 |   "kernelspec": {
1559 |    "display_name": "Python 3",
1560 |    "language": "python",
1561 |    "name": "python3"
1562 |   },
1563 |   "language_info": {
1564 |    "codemirror_mode": {
1565 |     "name": "ipython",
1566 |     "version": 3
1567 |    },
1568 |    "file_extension": ".py",
1569 |    "mimetype": "text/x-python",
1570 |    "name": "python",
1571 |    "nbconvert_exporter": "python",
1572 |    "pygments_lexer": "ipython3",
1573 |    "version": "3.7.3"
1574 |   }
1575 |  },
1576 |  "nbformat": 4,
1577 |  "nbformat_minor": 2
1578 | }
1579 | 


--------------------------------------------------------------------------------
/notebooks/Group Operations.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "code",
   5 |    "execution_count": 3,
   6 |    "metadata": {},
   7 |    "outputs": [],
   8 |    "source": [
   9 |     "import seaborn as sns\n",
  10 |     "import pandas as pd\n",
  11 |     "import numpy as np"
  12 |    ]
  13 |   },
  14 |   {
  15 |    "cell_type": "markdown",
  16 |    "metadata": {},
  17 |    "source": [
  18 |     "# Pandas Group Operations\n",
  19 |     "\n",
  20 |     "Next, let's go over grouped operations with pandas. This section of the pandas library does not have as much feature bloat as other parts, which is nice, and the community is starting to converge on a few operations that are core to grouped work. We'll go over these operations, with particular emphasis on groupby and agg:\n",
  21 |     "\n",
  22 |     "* groupby\n",
  23 |     "* agg\n",
  24 |     "* filter\n",
  25 |     "* transform\n",
  26 |     "\n",
  27 |     "Check out the full documentation [here](http://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html), but be warned it is a bit long :)"
  28 |    ]
  29 |   },
  30 |   {
  31 |    "cell_type": "markdown",
  32 |    "metadata": {},
  33 |    "source": [
  34 |     "Let's start with our good old tips dataset:"
  35 |    ]
  36 |   },
  37 |   {
  38 |    "cell_type": "code",
  39 |    "execution_count": 4,
  40 |    "metadata": {},
  41 |    "outputs": [
  42 |     {
  43 |      "data": {
  44 |       "text/html": [
  45 |        "<div>\n",
  46 |        "<style scoped>\n",
  47 |        "    .dataframe tbody tr th:only-of-type {\n",
  48 |        "        vertical-align: middle;\n",
  49 |        "    }\n",
  50 |        "\n",
  51 |        "    .dataframe tbody tr th {\n",
  52 |        "        vertical-align: top;\n",
  53 |        "    }\n",
  54 |        "\n",
  55 |        "    .dataframe thead th {\n",
  56 |        "        text-align: right;\n",
  57 |        "    }\n",
  58 |        "</style>\n",
  59 |        "<table border=\"1\" class=\"dataframe\">\n",
  60 |        "  <thead>\n",
  61 |        "    <tr style=\"text-align: right;\">\n",
  62 |        "      <th></th>\n",
  63 |        "      <th>total_bill</th>\n",
  64 |        "      <th>tip</th>\n",
  65 |        "      <th>sex</th>\n",
  66 |        "      <th>smoker</th>\n",
  67 |        "      <th>day</th>\n",
  68 |        "      <th>time</th>\n",
  69 |        "      <th>size</th>\n",
  70 |        "    </tr>\n",
  71 |        "  </thead>\n",
  72 |        "  <tbody>\n",
  73 |        "    <tr>\n",
  74 |        "      <th>0</th>\n",
  75 |        "      <td>16.99</td>\n",
  76 |        "      <td>1.01</td>\n",
  77 |        "      <td>Female</td>\n",
  78 |        "      <td>No</td>\n",
  79 |        "      <td>Sun</td>\n",
  80 |        "      <td>Dinner</td>\n",
  81 |        "      <td>2</td>\n",
  82 |        "    </tr>\n",
  83 |        "    <tr>\n",
  84 |        "      <th>1</th>\n",
  85 |        "      <td>10.34</td>\n",
  86 |        "      <td>1.66</td>\n",
  87 |        "      <td>Male</td>\n",
  88 |        "      <td>No</td>\n",
  89 |        "      <td>Sun</td>\n",
  90 |        "      <td>Dinner</td>\n",
  91 |        "      <td>3</td>\n",
  92 |        "    </tr>\n",
  93 |        "    <tr>\n",
  94 |        "      <th>2</th>\n",
  95 |        "      <td>21.01</td>\n",
  96 |        "      <td>3.50</td>\n",
  97 |        "      <td>Male</td>\n",
  98 |        "      <td>No</td>\n",
  99 |        "      <td>Sun</td>\n",
 100 |        "      <td>Dinner</td>\n",
 101 |        "      <td>3</td>\n",
 102 |        "    </tr>\n",
 103 |        "  </tbody>\n",
 104 |        "</table>\n",
 105 |        "</div>"
 106 |       ],
 107 |       "text/plain": [
 108 |        "   total_bill   tip     sex smoker  day    time  size\n",
 109 |        "0       16.99  1.01  Female     No  Sun  Dinner     2\n",
 110 |        "1       10.34  1.66    Male     No  Sun  Dinner     3\n",
 111 |        "2       21.01  3.50    Male     No  Sun  Dinner     3"
 112 |       ]
 113 |      },
 114 |      "execution_count": 4,
 115 |      "metadata": {},
 116 |      "output_type": "execute_result"
 117 |     }
 118 |    ],
 119 |    "source": [
 120 |     "tips = sns.load_dataset('tips')\n",
 121 |     "tips.head(3)"
 122 |    ]
 123 |   },
 124 |   {
 125 |    "cell_type": "markdown",
 126 |    "metadata": {},
 127 |    "source": [
 128 |     "### Groupby\n",
 129 |     "\n",
 130 |     "A grouped operation starts by specifying which groups of data we want to operate over. There are many ways of making groups, but the tool that pandas uses to make groups of data is `groupby`:"
 131 |    ]
 132 |   },
 133 |   {
 134 |    "cell_type": "code",
 135 |    "execution_count": 5,
 136 |    "metadata": {},
 137 |    "outputs": [
 138 |     {
 139 |      "data": {
 140 |       "text/plain": [
 141 |        "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x11ff27d30>"
 142 |       ]
 143 |      },
 144 |      "execution_count": 5,
 145 |      "metadata": {},
 146 |      "output_type": "execute_result"
 147 |     }
 148 |    ],
 149 |    "source": [
 150 |     "tips_gb = tips.groupby(['sex', 'smoker'])\n",
 151 |     "tips_gb"
 152 |    ]
 153 |   },
 154 |   {
 155 |    "cell_type": "markdown",
 156 |    "metadata": {},
 157 |    "source": [
 158 |     "Groupby works by giving pandas one or more columns. Pandas will look in your data and find every unique combination of values in the columns that you specify. Each unique combination is a group. So in this case we will have four groups: male smoker, female smoker, male non-smoker, female non-smoker.\n",
 159 |     "\n",
 160 |     "The groupby object by itself is not super important: it just describes how the rows are split up, and the interesting part is what we do with the groups next."
 161 |    ]
 162 |   },
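If you want to peek inside a groupby object, here is a minimal sketch. It uses a small hand-made DataFrame (the `toy` data below is hypothetical, standing in for the tips dataset so the example runs without seaborn) and the `ngroups`, `size` and `get_group` members of the groupby object:

```python
import pandas as pd

# Hypothetical toy data standing in for the tips dataset
toy = pd.DataFrame({
    'sex': ['Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
    'smoker': ['Yes', 'No', 'Yes', 'No', 'Yes', 'No'],
    'tip': [3.0, 2.5, 1.0, 4.0, 2.0, 3.5],
})

toy_gb = toy.groupby(['sex', 'smoker'])

# One group per unique (sex, smoker) combination found in the data
n_groups = toy_gb.ngroups

# Number of rows in each group, as a Series indexed by (sex, smoker)
group_sizes = toy_gb.size()

# Pull out the rows of a single group by its key
male_smokers = toy_gb.get_group(('Male', 'Yes'))

print(n_groups)
print(group_sizes)
print(male_smokers)
```

This is mostly useful as a sanity check that the groups are what you expect before aggregating.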
 163 |   {
 164 |    "cell_type": "markdown",
 165 |    "metadata": {},
 166 |    "source": [
 167 |     "Once we have these groups (specified in the groupby object), we can do three types of operations on them (with the most important being agg).\n",
 168 |     "\n",
 169 |     "### Agg\n",
 170 |     "\n",
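 171 |     "The aggregate operation reduces all the data in each group to a single value per group. You use a dictionary to specify which values you'd like: the keys are columns and the values are the aggregations to apply. For example, below we ask for the mean and the min of the tip column, the first value of the day column, and the number of rows (via the size of the total_bill column) for each group:"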
 172 |    ]
 173 |   },
 174 |   {
 175 |    "cell_type": "code",
 176 |    "execution_count": 6,
 177 |    "metadata": {},
 178 |    "outputs": [
 179 |     {
 180 |      "data": {
 181 |       "text/html": [
 182 |        "<div>\n",
 183 |        "<style scoped>\n",
 184 |        "    .dataframe tbody tr th:only-of-type {\n",
 185 |        "        vertical-align: middle;\n",
 186 |        "    }\n",
 187 |        "\n",
 188 |        "    .dataframe tbody tr th {\n",
 189 |        "        vertical-align: top;\n",
 190 |        "    }\n",
 191 |        "\n",
 192 |        "    .dataframe thead tr th {\n",
 193 |        "        text-align: left;\n",
 194 |        "    }\n",
 195 |        "\n",
 196 |        "    .dataframe thead tr:last-of-type th {\n",
 197 |        "        text-align: right;\n",
 198 |        "    }\n",
 199 |        "</style>\n",
 200 |        "<table border=\"1\" class=\"dataframe\">\n",
 201 |        "  <thead>\n",
 202 |        "    <tr>\n",
 203 |        "      <th></th>\n",
 204 |        "      <th></th>\n",
 205 |        "      <th colspan=\"2\" halign=\"left\">tip</th>\n",
 206 |        "      <th>day</th>\n",
 207 |        "      <th>total_bill</th>\n",
 208 |        "    </tr>\n",
 209 |        "    <tr>\n",
 210 |        "      <th></th>\n",
 211 |        "      <th></th>\n",
 212 |        "      <th>mean</th>\n",
 213 |        "      <th>min</th>\n",
 214 |        "      <th>first</th>\n",
 215 |        "      <th>size</th>\n",
 216 |        "    </tr>\n",
 217 |        "    <tr>\n",
 218 |        "      <th>sex</th>\n",
 219 |        "      <th>smoker</th>\n",
 220 |        "      <th></th>\n",
 221 |        "      <th></th>\n",
 222 |        "      <th></th>\n",
 223 |        "      <th></th>\n",
 224 |        "    </tr>\n",
 225 |        "  </thead>\n",
 226 |        "  <tbody>\n",
 227 |        "    <tr>\n",
 228 |        "      <th rowspan=\"2\" valign=\"top\">Male</th>\n",
 229 |        "      <th>Yes</th>\n",
 230 |        "      <td>3.051167</td>\n",
 231 |        "      <td>1.00</td>\n",
 232 |        "      <td>Sat</td>\n",
 233 |        "      <td>60</td>\n",
 234 |        "    </tr>\n",
 235 |        "    <tr>\n",
 236 |        "      <th>No</th>\n",
 237 |        "      <td>3.113402</td>\n",
 238 |        "      <td>1.25</td>\n",
 239 |        "      <td>Sun</td>\n",
 240 |        "      <td>97</td>\n",
 241 |        "    </tr>\n",
 242 |        "    <tr>\n",
 243 |        "      <th rowspan=\"2\" valign=\"top\">Female</th>\n",
 244 |        "      <th>Yes</th>\n",
 245 |        "      <td>2.931515</td>\n",
 246 |        "      <td>1.00</td>\n",
 247 |        "      <td>Sat</td>\n",
 248 |        "      <td>33</td>\n",
 249 |        "    </tr>\n",
 250 |        "    <tr>\n",
 251 |        "      <th>No</th>\n",
 252 |        "      <td>2.773519</td>\n",
 253 |        "      <td>1.00</td>\n",
 254 |        "      <td>Sun</td>\n",
 255 |        "      <td>54</td>\n",
 256 |        "    </tr>\n",
 257 |        "  </tbody>\n",
 258 |        "</table>\n",
 259 |        "</div>"
 260 |       ],
 261 |       "text/plain": [
 262 |        "                    tip         day total_bill\n",
 263 |        "                   mean   min first       size\n",
 264 |        "sex    smoker                                 \n",
 265 |        "Male   Yes     3.051167  1.00   Sat         60\n",
 266 |        "       No      3.113402  1.25   Sun         97\n",
 267 |        "Female Yes     2.931515  1.00   Sat         33\n",
 268 |        "       No      2.773519  1.00   Sun         54"
 269 |       ]
 270 |      },
 271 |      "execution_count": 6,
 272 |      "metadata": {},
 273 |      "output_type": "execute_result"
 274 |     }
 275 |    ],
 276 |    "source": [
 277 |     "tips_agg = tips_gb.agg({\n",
 278 |     "    'tip': ['mean', 'min'],\n",
 279 |     "    'day': 'first',\n",
 280 |     "    'total_bill': 'size'\n",
 281 |     "})\n",
 282 |     "\n",
 283 |     "tips_agg"
 284 |    ]
 285 |   },
 286 |   {
 287 |    "cell_type": "markdown",
 288 |    "metadata": {},
 289 |    "source": [
 290 |     "Notice that we get a multi-index for both the index and the columns. We can always turn the index levels back into ordinary columns with a `reset_index` (see [indexing and selecting](https://github.com/knathanieltucker/pandas-tutorial/blob/master/notebooks/Indexing%20and%20Selecting.ipynb) for more details):"
 291 |    ]
 292 |   },
 293 |   {
 294 |    "cell_type": "code",
 295 |    "execution_count": 7,
 296 |    "metadata": {},
 297 |    "outputs": [
 298 |     {
 299 |      "data": {
 300 |       "text/html": [
 301 |        "<div>\n",
 302 |        "<style scoped>\n",
 303 |        "    .dataframe tbody tr th:only-of-type {\n",
 304 |        "        vertical-align: middle;\n",
 305 |        "    }\n",
 306 |        "\n",
 307 |        "    .dataframe tbody tr th {\n",
 308 |        "        vertical-align: top;\n",
 309 |        "    }\n",
 310 |        "\n",
 311 |        "    .dataframe thead tr th {\n",
 312 |        "        text-align: left;\n",
 313 |        "    }\n",
 314 |        "</style>\n",
 315 |        "<table border=\"1\" class=\"dataframe\">\n",
 316 |        "  <thead>\n",
 317 |        "    <tr>\n",
 318 |        "      <th></th>\n",
 319 |        "      <th>sex</th>\n",
 320 |        "      <th>smoker</th>\n",
 321 |        "      <th colspan=\"2\" halign=\"left\">tip</th>\n",
 322 |        "      <th>day</th>\n",
 323 |        "      <th>total_bill</th>\n",
 324 |        "    </tr>\n",
 325 |        "    <tr>\n",
 326 |        "      <th></th>\n",
 327 |        "      <th></th>\n",
 328 |        "      <th></th>\n",
 329 |        "      <th>mean</th>\n",
 330 |        "      <th>min</th>\n",
 331 |        "      <th>first</th>\n",
 332 |        "      <th>size</th>\n",
 333 |        "    </tr>\n",
 334 |        "  </thead>\n",
 335 |        "  <tbody>\n",
 336 |        "    <tr>\n",
 337 |        "      <th>0</th>\n",
 338 |        "      <td>Male</td>\n",
 339 |        "      <td>Yes</td>\n",
 340 |        "      <td>3.051167</td>\n",
 341 |        "      <td>1.00</td>\n",
 342 |        "      <td>Sat</td>\n",
 343 |        "      <td>60</td>\n",
 344 |        "    </tr>\n",
 345 |        "    <tr>\n",
 346 |        "      <th>1</th>\n",
 347 |        "      <td>Male</td>\n",
 348 |        "      <td>No</td>\n",
 349 |        "      <td>3.113402</td>\n",
 350 |        "      <td>1.25</td>\n",
 351 |        "      <td>Sun</td>\n",
 352 |        "      <td>97</td>\n",
 353 |        "    </tr>\n",
 354 |        "    <tr>\n",
 355 |        "      <th>2</th>\n",
 356 |        "      <td>Female</td>\n",
 357 |        "      <td>Yes</td>\n",
 358 |        "      <td>2.931515</td>\n",
 359 |        "      <td>1.00</td>\n",
 360 |        "      <td>Sat</td>\n",
 361 |        "      <td>33</td>\n",
 362 |        "    </tr>\n",
 363 |        "    <tr>\n",
 364 |        "      <th>3</th>\n",
 365 |        "      <td>Female</td>\n",
 366 |        "      <td>No</td>\n",
 367 |        "      <td>2.773519</td>\n",
 368 |        "      <td>1.00</td>\n",
 369 |        "      <td>Sun</td>\n",
 370 |        "      <td>54</td>\n",
 371 |        "    </tr>\n",
 372 |        "  </tbody>\n",
 373 |        "</table>\n",
 374 |        "</div>"
 375 |       ],
 376 |       "text/plain": [
 377 |        "      sex smoker       tip         day total_bill\n",
 378 |        "                      mean   min first       size\n",
 379 |        "0    Male    Yes  3.051167  1.00   Sat         60\n",
 380 |        "1    Male     No  3.113402  1.25   Sun         97\n",
 381 |        "2  Female    Yes  2.931515  1.00   Sat         33\n",
 382 |        "3  Female     No  2.773519  1.00   Sun         54"
 383 |       ]
 384 |      },
 385 |      "execution_count": 7,
 386 |      "metadata": {},
 387 |      "output_type": "execute_result"
 388 |     }
 389 |    ],
 390 |    "source": [
 391 |     "tips_agg.reset_index()"
 392 |    ]
 393 |   },
 394 |   {
 395 |    "cell_type": "markdown",
 396 |    "metadata": {},
 397 |    "source": [
 398 |     "And we can either use stacking or our column trick (joining the levels into a single name) to get rid of the multi-level columns:"
 399 |    ]
 400 |   },
 401 |   {
 402 |    "cell_type": "code",
 403 |    "execution_count": 8,
 404 |    "metadata": {},
 405 |    "outputs": [
 406 |     {
 407 |      "data": {
 408 |       "text/plain": [
 409 |        "MultiIndex(levels=[['tip', 'day', 'total_bill'], ['first', 'mean', 'min', 'size']],\n",
 410 |        "           codes=[[0, 0, 1, 2], [1, 2, 0, 3]])"
 411 |       ]
 412 |      },
 413 |      "execution_count": 8,
 414 |      "metadata": {},
 415 |      "output_type": "execute_result"
 416 |     }
 417 |    ],
 418 |    "source": [
 419 |     "# before\n",
 420 |     "tips_agg.columns"
 421 |    ]
 422 |   },
 423 |   {
 424 |    "cell_type": "code",
 425 |    "execution_count": 9,
 426 |    "metadata": {},
 427 |    "outputs": [
 428 |     {
 429 |      "data": {
 430 |       "text/html": [
 431 |        "<div>\n",
 432 |        "<style scoped>\n",
 433 |        "    .dataframe tbody tr th:only-of-type {\n",
 434 |        "        vertical-align: middle;\n",
 435 |        "    }\n",
 436 |        "\n",
 437 |        "    .dataframe tbody tr th {\n",
 438 |        "        vertical-align: top;\n",
 439 |        "    }\n",
 440 |        "\n",
 441 |        "    .dataframe thead th {\n",
 442 |        "        text-align: right;\n",
 443 |        "    }\n",
 444 |        "</style>\n",
 445 |        "<table border=\"1\" class=\"dataframe\">\n",
 446 |        "  <thead>\n",
 447 |        "    <tr style=\"text-align: right;\">\n",
 448 |        "      <th></th>\n",
 449 |        "      <th></th>\n",
 450 |        "      <th></th>\n",
 451 |        "      <th>tip</th>\n",
 452 |        "      <th>day</th>\n",
 453 |        "      <th>total_bill</th>\n",
 454 |        "    </tr>\n",
 455 |        "    <tr>\n",
 456 |        "      <th>sex</th>\n",
 457 |        "      <th>smoker</th>\n",
 458 |        "      <th></th>\n",
 459 |        "      <th></th>\n",
 460 |        "      <th></th>\n",
 461 |        "      <th></th>\n",
 462 |        "    </tr>\n",
 463 |        "  </thead>\n",
 464 |        "  <tbody>\n",
 465 |        "    <tr>\n",
 466 |        "      <th rowspan=\"8\" valign=\"top\">Male</th>\n",
 467 |        "      <th rowspan=\"4\" valign=\"top\">Yes</th>\n",
 468 |        "      <th>first</th>\n",
 469 |        "      <td>NaN</td>\n",
 470 |        "      <td>Sat</td>\n",
 471 |        "      <td>NaN</td>\n",
 472 |        "    </tr>\n",
 473 |        "    <tr>\n",
 474 |        "      <th>mean</th>\n",
 475 |        "      <td>3.051167</td>\n",
 476 |        "      <td>NaN</td>\n",
 477 |        "      <td>NaN</td>\n",
 478 |        "    </tr>\n",
 479 |        "    <tr>\n",
 480 |        "      <th>min</th>\n",
 481 |        "      <td>1.000000</td>\n",
 482 |        "      <td>NaN</td>\n",
 483 |        "      <td>NaN</td>\n",
 484 |        "    </tr>\n",
 485 |        "    <tr>\n",
 486 |        "      <th>size</th>\n",
 487 |        "      <td>NaN</td>\n",
 488 |        "      <td>NaN</td>\n",
 489 |        "      <td>60.0</td>\n",
 490 |        "    </tr>\n",
 491 |        "    <tr>\n",
 492 |        "      <th rowspan=\"4\" valign=\"top\">No</th>\n",
 493 |        "      <th>first</th>\n",
 494 |        "      <td>NaN</td>\n",
 495 |        "      <td>Sun</td>\n",
 496 |        "      <td>NaN</td>\n",
 497 |        "    </tr>\n",
 498 |        "    <tr>\n",
 499 |        "      <th>mean</th>\n",
 500 |        "      <td>3.113402</td>\n",
 501 |        "      <td>NaN</td>\n",
 502 |        "      <td>NaN</td>\n",
 503 |        "    </tr>\n",
 504 |        "    <tr>\n",
 505 |        "      <th>min</th>\n",
 506 |        "      <td>1.250000</td>\n",
 507 |        "      <td>NaN</td>\n",
 508 |        "      <td>NaN</td>\n",
 509 |        "    </tr>\n",
 510 |        "    <tr>\n",
 511 |        "      <th>size</th>\n",
 512 |        "      <td>NaN</td>\n",
 513 |        "      <td>NaN</td>\n",
 514 |        "      <td>97.0</td>\n",
 515 |        "    </tr>\n",
 516 |        "    <tr>\n",
 517 |        "      <th rowspan=\"8\" valign=\"top\">Female</th>\n",
 518 |        "      <th rowspan=\"4\" valign=\"top\">Yes</th>\n",
 519 |        "      <th>first</th>\n",
 520 |        "      <td>NaN</td>\n",
 521 |        "      <td>Sat</td>\n",
 522 |        "      <td>NaN</td>\n",
 523 |        "    </tr>\n",
 524 |        "    <tr>\n",
 525 |        "      <th>mean</th>\n",
 526 |        "      <td>2.931515</td>\n",
 527 |        "      <td>NaN</td>\n",
 528 |        "      <td>NaN</td>\n",
 529 |        "    </tr>\n",
 530 |        "    <tr>\n",
 531 |        "      <th>min</th>\n",
 532 |        "      <td>1.000000</td>\n",
 533 |        "      <td>NaN</td>\n",
 534 |        "      <td>NaN</td>\n",
 535 |        "    </tr>\n",
 536 |        "    <tr>\n",
 537 |        "      <th>size</th>\n",
 538 |        "      <td>NaN</td>\n",
 539 |        "      <td>NaN</td>\n",
 540 |        "      <td>33.0</td>\n",
 541 |        "    </tr>\n",
 542 |        "    <tr>\n",
 543 |        "      <th rowspan=\"4\" valign=\"top\">No</th>\n",
 544 |        "      <th>first</th>\n",
 545 |        "      <td>NaN</td>\n",
 546 |        "      <td>Sun</td>\n",
 547 |        "      <td>NaN</td>\n",
 548 |        "    </tr>\n",
 549 |        "    <tr>\n",
 550 |        "      <th>mean</th>\n",
 551 |        "      <td>2.773519</td>\n",
 552 |        "      <td>NaN</td>\n",
 553 |        "      <td>NaN</td>\n",
 554 |        "    </tr>\n",
 555 |        "    <tr>\n",
 556 |        "      <th>min</th>\n",
 557 |        "      <td>1.000000</td>\n",
 558 |        "      <td>NaN</td>\n",
 559 |        "      <td>NaN</td>\n",
 560 |        "    </tr>\n",
 561 |        "    <tr>\n",
 562 |        "      <th>size</th>\n",
 563 |        "      <td>NaN</td>\n",
 564 |        "      <td>NaN</td>\n",
 565 |        "      <td>54.0</td>\n",
 566 |        "    </tr>\n",
 567 |        "  </tbody>\n",
 568 |        "</table>\n",
 569 |        "</div>"
 570 |       ],
 571 |       "text/plain": [
 572 |        "                          tip  day  total_bill\n",
 573 |        "sex    smoker                                 \n",
 574 |        "Male   Yes    first       NaN  Sat         NaN\n",
 575 |        "              mean   3.051167  NaN         NaN\n",
 576 |        "              min    1.000000  NaN         NaN\n",
 577 |        "              size        NaN  NaN        60.0\n",
 578 |        "       No     first       NaN  Sun         NaN\n",
 579 |        "              mean   3.113402  NaN         NaN\n",
 580 |        "              min    1.250000  NaN         NaN\n",
 581 |        "              size        NaN  NaN        97.0\n",
 582 |        "Female Yes    first       NaN  Sat         NaN\n",
 583 |        "              mean   2.931515  NaN         NaN\n",
 584 |        "              min    1.000000  NaN         NaN\n",
 585 |        "              size        NaN  NaN        33.0\n",
 586 |        "       No     first       NaN  Sun         NaN\n",
 587 |        "              mean   2.773519  NaN         NaN\n",
 588 |        "              min    1.000000  NaN         NaN\n",
 589 |        "              size        NaN  NaN        54.0"
 590 |       ]
 591 |      },
 592 |      "execution_count": 9,
 593 |      "metadata": {},
 594 |      "output_type": "execute_result"
 595 |     }
 596 |    ],
 597 |    "source": [
 598 |     "tips_agg.stack()"
 599 |    ]
 600 |   },
 601 |   {
 602 |    "cell_type": "code",
 603 |    "execution_count": 10,
 604 |    "metadata": {},
 605 |    "outputs": [
 606 |     {
 607 |      "data": {
 608 |       "text/plain": [
 609 |        "Index(['tip__mean', 'tip__min', 'day__first', 'total_bill__size'], dtype='object')"
 610 |       ]
 611 |      },
 612 |      "execution_count": 10,
 613 |      "metadata": {},
 614 |      "output_type": "execute_result"
 615 |     }
 616 |    ],
 617 |    "source": [
 618 |     "tips_agg.columns = ['__'.join(col).strip() for col in tips_agg.columns.values]\n",
 619 |     "tips_agg.columns"
 620 |    ]
 621 |   },
 622 |   {
 623 |    "cell_type": "code",
 624 |    "execution_count": 11,
 625 |    "metadata": {},
 626 |    "outputs": [
 627 |     {
 628 |      "data": {
 629 |       "text/html": [
 630 |        "<div>\n",
 631 |        "<style scoped>\n",
 632 |        "    .dataframe tbody tr th:only-of-type {\n",
 633 |        "        vertical-align: middle;\n",
 634 |        "    }\n",
 635 |        "\n",
 636 |        "    .dataframe tbody tr th {\n",
 637 |        "        vertical-align: top;\n",
 638 |        "    }\n",
 639 |        "\n",
 640 |        "    .dataframe thead th {\n",
 641 |        "        text-align: right;\n",
 642 |        "    }\n",
 643 |        "</style>\n",
 644 |        "<table border=\"1\" class=\"dataframe\">\n",
 645 |        "  <thead>\n",
 646 |        "    <tr style=\"text-align: right;\">\n",
 647 |        "      <th></th>\n",
 648 |        "      <th></th>\n",
 649 |        "      <th>tip__mean</th>\n",
 650 |        "      <th>tip__min</th>\n",
 651 |        "      <th>day__first</th>\n",
 652 |        "      <th>total_bill__size</th>\n",
 653 |        "    </tr>\n",
 654 |        "    <tr>\n",
 655 |        "      <th>sex</th>\n",
 656 |        "      <th>smoker</th>\n",
 657 |        "      <th></th>\n",
 658 |        "      <th></th>\n",
 659 |        "      <th></th>\n",
 660 |        "      <th></th>\n",
 661 |        "    </tr>\n",
 662 |        "  </thead>\n",
 663 |        "  <tbody>\n",
 664 |        "    <tr>\n",
 665 |        "      <th rowspan=\"2\" valign=\"top\">Male</th>\n",
 666 |        "      <th>Yes</th>\n",
 667 |        "      <td>3.051167</td>\n",
 668 |        "      <td>1.00</td>\n",
 669 |        "      <td>Sat</td>\n",
 670 |        "      <td>60</td>\n",
 671 |        "    </tr>\n",
 672 |        "    <tr>\n",
 673 |        "      <th>No</th>\n",
 674 |        "      <td>3.113402</td>\n",
 675 |        "      <td>1.25</td>\n",
 676 |        "      <td>Sun</td>\n",
 677 |        "      <td>97</td>\n",
 678 |        "    </tr>\n",
 679 |        "    <tr>\n",
 680 |        "      <th rowspan=\"2\" valign=\"top\">Female</th>\n",
 681 |        "      <th>Yes</th>\n",
 682 |        "      <td>2.931515</td>\n",
 683 |        "      <td>1.00</td>\n",
 684 |        "      <td>Sat</td>\n",
 685 |        "      <td>33</td>\n",
 686 |        "    </tr>\n",
 687 |        "    <tr>\n",
 688 |        "      <th>No</th>\n",
 689 |        "      <td>2.773519</td>\n",
 690 |        "      <td>1.00</td>\n",
 691 |        "      <td>Sun</td>\n",
 692 |        "      <td>54</td>\n",
 693 |        "    </tr>\n",
 694 |        "  </tbody>\n",
 695 |        "</table>\n",
 696 |        "</div>"
 697 |       ],
 698 |       "text/plain": [
 699 |        "               tip__mean  tip__min day__first  total_bill__size\n",
 700 |        "sex    smoker                                                  \n",
 701 |        "Male   Yes      3.051167      1.00        Sat                60\n",
 702 |        "       No       3.113402      1.25        Sun                97\n",
 703 |        "Female Yes      2.931515      1.00        Sat                33\n",
 704 |        "       No       2.773519      1.00        Sun                54"
 705 |       ]
 706 |      },
 707 |      "execution_count": 11,
 708 |      "metadata": {},
 709 |      "output_type": "execute_result"
 710 |     }
 711 |    ],
 712 |    "source": [
 713 |     "tips_agg"
 714 |    ]
 715 |   },
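As an alternative to flattening the columns afterwards, here is a minimal sketch using named aggregation, where each output column is given as `new_name=(column, function)`. This assumes pandas 0.25 or later (the outputs in this notebook appear to come from an older version), and the `toy` DataFrame and output column names are hypothetical:

```python
import pandas as pd

# Hypothetical toy data standing in for the tips dataset
toy = pd.DataFrame({
    'sex': ['Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
    'smoker': ['Yes', 'No', 'Yes', 'No', 'Yes', 'No'],
    'tip': [3.0, 2.5, 1.0, 4.0, 2.0, 3.5],
})

# Named aggregation: the keyword is the output column name,
# the tuple is (input column, aggregation function)
named = toy.groupby(['sex', 'smoker']).agg(
    tip_mean=('tip', 'mean'),
    tip_min=('tip', 'min'),
    n_tips=('tip', 'count'),  # counts non-null tips in each group
)

# The columns are already flat; reset_index flattens the row index too
flat = named.reset_index()

print(flat)
```

The result has plain, single-level column names from the start, so there is no multi-index on the columns to clean up.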
 716 |   {
 717 |    "cell_type": "markdown",
 718 |    "metadata": {},
 719 |    "source": [
 720 |     "That is about it for aggregation. You can find some common aggregation functions listed [here](http://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#aggregation)."
 721 |    ]
 722 |   },
 723 |   {
 724 |    "cell_type": "markdown",
 725 |    "metadata": {},
 726 |    "source": [
 727 |     "### Filter\n",
 728 |     "\n",
 729 |     "The next common group operation is a filter. This one is pretty simple: we pass a function that returns True or False for each group, and every row belonging to a group that doesn't meet our criteria is dropped.\n",
 730 |     "\n",
 731 |     "For example, let's only look at the least busy times the place is open. One way we might do that is to keep only the day/time slots whose total number of diners is below the median:"
 732 |    ]
 733 |   },
 734 |   {
 735 |    "cell_type": "code",
 736 |    "execution_count": 53,
 737 |    "metadata": {},
 738 |    "outputs": [],
 739 |    "source": [
 740 |     "# we use the exact same groupby syntax\n",
 741 |     "tips_gb = tips.groupby(['day', 'time'])"
 742 |    ]
 743 |   },
 744 |   {
 745 |    "cell_type": "code",
 746 |    "execution_count": 54,
 747 |    "metadata": {},
 748 |    "outputs": [],
 749 |    "source": [
 750 |     "median_size = tips_gb.agg({'size': 'sum'}).median().iloc[0]"
 751 |    ]
 752 |   },
 753 |   {
 754 |    "cell_type": "code",
 755 |    "execution_count": 56,
 756 |    "metadata": {},
 757 |    "outputs": [
 758 |     {
 759 |      "data": {
 760 |       "text/html": [
 761 |        "<div>\n",
 762 |        "<style scoped>\n",
 763 |        "    .dataframe tbody tr th:only-of-type {\n",
 764 |        "        vertical-align: middle;\n",
 765 |        "    }\n",
 766 |        "\n",
 767 |        "    .dataframe tbody tr th {\n",
 768 |        "        vertical-align: top;\n",
 769 |        "    }\n",
 770 |        "\n",
 771 |        "    .dataframe thead th {\n",
 772 |        "        text-align: right;\n",
 773 |        "    }\n",
 774 |        "</style>\n",
 775 |        "<table border=\"1\" class=\"dataframe\">\n",
 776 |        "  <thead>\n",
 777 |        "    <tr style=\"text-align: right;\">\n",
 778 |        "      <th></th>\n",
 779 |        "      <th>total_bill</th>\n",
 780 |        "      <th>tip</th>\n",
 781 |        "      <th>sex</th>\n",
 782 |        "      <th>smoker</th>\n",
 783 |        "      <th>day</th>\n",
 784 |        "      <th>time</th>\n",
 785 |        "      <th>size</th>\n",
 786 |        "    </tr>\n",
 787 |        "  </thead>\n",
 788 |        "  <tbody>\n",
 789 |        "    <tr>\n",
 790 |        "      <th>90</th>\n",
 791 |        "      <td>28.97</td>\n",
 792 |        "      <td>3.00</td>\n",
 793 |        "      <td>Male</td>\n",
 794 |        "      <td>Yes</td>\n",
 795 |        "      <td>Fri</td>\n",
 796 |        "      <td>Dinner</td>\n",
 797 |        "      <td>2</td>\n",
 798 |        "    </tr>\n",
 799 |        "    <tr>\n",
 800 |        "      <th>91</th>\n",
 801 |        "      <td>22.49</td>\n",
 802 |        "      <td>3.50</td>\n",
 803 |        "      <td>Male</td>\n",
 804 |        "      <td>No</td>\n",
 805 |        "      <td>Fri</td>\n",
 806 |        "      <td>Dinner</td>\n",
 807 |        "      <td>2</td>\n",
 808 |        "    </tr>\n",
 809 |        "    <tr>\n",
 810 |        "      <th>92</th>\n",
 811 |        "      <td>5.75</td>\n",
 812 |        "      <td>1.00</td>\n",
 813 |        "      <td>Female</td>\n",
 814 |        "      <td>Yes</td>\n",
 815 |        "      <td>Fri</td>\n",
 816 |        "      <td>Dinner</td>\n",
 817 |        "      <td>2</td>\n",
 818 |        "    </tr>\n",
 819 |        "    <tr>\n",
 820 |        "      <th>93</th>\n",
 821 |        "      <td>16.32</td>\n",
 822 |        "      <td>4.30</td>\n",
 823 |        "      <td>Female</td>\n",
 824 |        "      <td>Yes</td>\n",
 825 |        "      <td>Fri</td>\n",
 826 |        "      <td>Dinner</td>\n",
 827 |        "      <td>2</td>\n",
 828 |        "    </tr>\n",
 829 |        "    <tr>\n",
 830 |        "      <th>94</th>\n",
 831 |        "      <td>22.75</td>\n",
 832 |        "      <td>3.25</td>\n",
 833 |        "      <td>Female</td>\n",
 834 |        "      <td>No</td>\n",
 835 |        "      <td>Fri</td>\n",
 836 |        "      <td>Dinner</td>\n",
 837 |        "      <td>2</td>\n",
 838 |        "    </tr>\n",
 839 |        "  </tbody>\n",
 840 |        "</table>\n",
 841 |        "</div>"
 842 |       ],
 843 |       "text/plain": [
 844 |        "    total_bill   tip     sex smoker  day    time  size\n",
 845 |        "90       28.97  3.00    Male    Yes  Fri  Dinner     2\n",
 846 |        "91       22.49  3.50    Male     No  Fri  Dinner     2\n",
 847 |        "92        5.75  1.00  Female    Yes  Fri  Dinner     2\n",
 848 |        "93       16.32  4.30  Female    Yes  Fri  Dinner     2\n",
 849 |        "94       22.75  3.25  Female     No  Fri  Dinner     2"
 850 |       ]
 851 |      },
 852 |      "execution_count": 56,
 853 |      "metadata": {},
 854 |      "output_type": "execute_result"
 855 |     }
 856 |    ],
 857 |    "source": [
 858 |     "# notice that we carved out quite a few rows\n",
 859 |     "tips_gb.filter(lambda group: group['size'].sum() < median_size).head()"
 860 |    ]
 861 |   },
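To see exactly which rows a filter keeps, here is a minimal sketch on a tiny hand-made DataFrame. The `toy` data and the threshold of 5 are hypothetical, chosen only so the result is easy to check by eye:

```python
import pandas as pd

# Hypothetical toy data: one row per table, 'size' is the party size
toy = pd.DataFrame({
    'day': ['Fri', 'Fri', 'Sat', 'Sat', 'Sat', 'Sun'],
    'size': [2, 2, 4, 3, 5, 2],
})

# The function is called once per group and must return True or False.
# Groups returning False are dropped entirely; the surviving rows keep
# their original index and all of their original columns.
quiet = toy.groupby('day').filter(lambda g: g['size'].sum() < 5)

print(quiet)
```

Fri (4 diners) and Sun (2 diners) pass the test, while every Sat row (12 diners in total) is removed.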
 862 |   {
 863 |    "cell_type": "markdown",
 864 |    "metadata": {},
 865 |    "source": [
 866 |     "That's honestly about it. I don't use this functionality too much, but it's pretty simple and I don't think it complicates things too much, so may as well throw it in."
 867 |    ]
 868 |   },
 869 |   {
 870 |    "cell_type": "markdown",
 871 |    "metadata": {},
 872 |    "source": [
 873 |     "### Transform\n",
 874 |     "\n",
 875 |     "The final group operation is transform. This uses group information to apply transformations to individual data points, so the result has one row per original row. For example, look below: let's divide each bill and tip by the average bill and tip on that day. That way we can see how much each bill differs from the average for its day:"
 876 |    ]
 877 |   },
 878 |   {
 879 |    "cell_type": "code",
 880 |    "execution_count": 57,
 881 |    "metadata": {},
 882 |    "outputs": [],
 883 |    "source": [
 884 |     "tips_gb = tips.groupby(['day'])"
 885 |    ]
 886 |   },
 887 |   {
 888 |    "cell_type": "code",
 889 |    "execution_count": 58,
 890 |    "metadata": {},
 891 |    "outputs": [
 892 |     {
 893 |      "data": {
 894 |       "text/html": [
 895 |        "<div>\n",
 896 |        "<style scoped>\n",
 897 |        "    .dataframe tbody tr th:only-of-type {\n",
 898 |        "        vertical-align: middle;\n",
 899 |        "    }\n",
 900 |        "\n",
 901 |        "    .dataframe tbody tr th {\n",
 902 |        "        vertical-align: top;\n",
 903 |        "    }\n",
 904 |        "\n",
 905 |        "    .dataframe thead th {\n",
 906 |        "        text-align: right;\n",
 907 |        "    }\n",
 908 |        "</style>\n",
 909 |        "<table border=\"1\" class=\"dataframe\">\n",
 910 |        "  <thead>\n",
 911 |        "    <tr style=\"text-align: right;\">\n",
 912 |        "      <th></th>\n",
 913 |        "      <th>total_bill</th>\n",
 914 |        "      <th>tip</th>\n",
 915 |        "    </tr>\n",
 916 |        "  </thead>\n",
 917 |        "  <tbody>\n",
 918 |        "    <tr>\n",
 919 |        "      <th>0</th>\n",
 920 |        "      <td>0.793554</td>\n",
 921 |        "      <td>0.310279</td>\n",
 922 |        "    </tr>\n",
 923 |        "    <tr>\n",
 924 |        "      <th>1</th>\n",
 925 |        "      <td>0.482952</td>\n",
 926 |        "      <td>0.509964</td>\n",
 927 |        "    </tr>\n",
 928 |        "    <tr>\n",
 929 |        "      <th>2</th>\n",
 930 |        "      <td>0.981317</td>\n",
 931 |        "      <td>1.075225</td>\n",
 932 |        "    </tr>\n",
 933 |        "    <tr>\n",
 934 |        "      <th>3</th>\n",
 935 |        "      <td>1.106025</td>\n",
 936 |        "      <td>1.016856</td>\n",
 937 |        "    </tr>\n",
 938 |        "    <tr>\n",
 939 |        "      <th>4</th>\n",
 940 |        "      <td>1.148529</td>\n",
 941 |        "      <td>1.109018</td>\n",
 942 |        "    </tr>\n",
 943 |        "  </tbody>\n",
 944 |        "</table>\n",
 945 |        "</div>"
 946 |       ],
 947 |       "text/plain": [
 948 |        "   total_bill       tip\n",
 949 |        "0    0.793554  0.310279\n",
 950 |        "1    0.482952  0.509964\n",
 951 |        "2    0.981317  1.075225\n",
 952 |        "3    1.106025  1.016856\n",
 953 |        "4    1.148529  1.109018"
 954 |       ]
 955 |      },
 956 |      "execution_count": 58,
 957 |      "metadata": {},
 958 |      "output_type": "execute_result"
 959 |     }
 960 |    ],
 961 |    "source": [
 962 |     "tips_gb[['total_bill', 'tip']].transform(lambda x: x / x.mean()).head()"
 963 |    ]
 964 |   },
 965 |   {
 966 |    "cell_type": "markdown",
 967 |    "metadata": {},
 968 |    "source": [
 969 |     "I think I have only ever used this function for normalization, but it is pretty straightforward and intuitive, so I'm fine with the added flexibility."
 970 |    ]
 971 |   },
 972 |   {
 973 |    "cell_type": "markdown",
 974 |    "metadata": {},
 975 |    "source": [
 976 |     "## Conclusion\n",
 977 |     "\n",
 978 |     "This is about it for understanding pandas group operations. As always, check out some of the [exercises on this topic](https://github.com/guipsamora/pandas_exercises#grouping); you should be able to do them with ease.\n",
 979 |     "\n",
 980 |     "As a final note, understanding the groupby and agg functions is critical to using pandas effectively. The transform and filter functions are nice, but you could probably get by without them."
 981 |    ]
 982 |   },
 983 |   {
 984 |    "cell_type": "code",
 985 |    "execution_count": null,
 986 |    "metadata": {},
 987 |    "outputs": [],
 988 |    "source": []
 989 |   }
 990 |  ],
 991 |  "metadata": {
 992 |   "kernelspec": {
 993 |    "display_name": "Python 3",
 994 |    "language": "python",
 995 |    "name": "python3"
 996 |   },
 997 |   "language_info": {
 998 |    "codemirror_mode": {
 999 |     "name": "ipython",
1000 |     "version": 3
1001 |    },
1002 |    "file_extension": ".py",
1003 |    "mimetype": "text/x-python",
1004 |    "name": "python",
1005 |    "nbconvert_exporter": "python",
1006 |    "pygments_lexer": "ipython3",
1007 |    "version": "3.7.3"
1008 |   }
1009 |  },
1010 |  "nbformat": 4,
1011 |  "nbformat_minor": 2
1012 | }
1013 | 


--------------------------------------------------------------------------------
/notebooks/Indexing and Selecting.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "code",
   5 |    "execution_count": 1,
   6 |    "metadata": {},
   7 |    "outputs": [],
   8 |    "source": [
   9 |     "import seaborn as sns\n",
  10 |     "import pandas as pd\n",
  11 |     "import numpy as np"
  12 |    ]
  13 |   },
  14 |   {
  15 |    "cell_type": "markdown",
  16 |    "metadata": {},
  17 |    "source": [
  18 |     "# Pandas Indexing and Selecting\n",
  19 |     "\n",
  20 |     "Let's talk about slicing and dicing pandas data. We are going to cover four topics today:\n",
  21 |     "\n",
  22 |     "* Review the basics\n",
  23 |     "* Multi-index\n",
  24 |     "* Getting Single Values\n",
  25 |     "* Pointing out some stuff you don't need to worry about\n",
  26 |     "\n",
  27 |     "As always, you can check out the full documentation: [basic indexing](http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html) and [advanced indexing](http://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html). But be warned that they are very long and tell you way more than you need to know :)"
  28 |    ]
  29 |   },
  30 |   {
  31 |    "cell_type": "markdown",
  32 |    "metadata": {},
  33 |    "source": [
  34 |     "## Review the Basics\n",
  35 |     "\n",
  36 |     "First, let's recap traditional indexing and selection (we went over most of this in the [pandas fundamentals](https://github.com/knathanieltucker/pandas-tutorial/blob/master/notebooks/Pandas%20Intro%20to%20Data%20Structures.ipynb) notebook). Here is the data we are going to be working with (good old tips data):"
  37 |    ]
  38 |   },
  39 |   {
  40 |    "cell_type": "code",
  41 |    "execution_count": 2,
  42 |    "metadata": {},
  43 |    "outputs": [
  44 |     {
  45 |      "data": {
  46 |       "text/html": [
  47 |        "<div>\n",
  48 |        "<style scoped>\n",
  49 |        "    .dataframe tbody tr th:only-of-type {\n",
  50 |        "        vertical-align: middle;\n",
  51 |        "    }\n",
  52 |        "\n",
  53 |        "    .dataframe tbody tr th {\n",
  54 |        "        vertical-align: top;\n",
  55 |        "    }\n",
  56 |        "\n",
  57 |        "    .dataframe thead th {\n",
  58 |        "        text-align: right;\n",
  59 |        "    }\n",
  60 |        "</style>\n",
  61 |        "<table border=\"1\" class=\"dataframe\">\n",
  62 |        "  <thead>\n",
  63 |        "    <tr style=\"text-align: right;\">\n",
  64 |        "      <th></th>\n",
  65 |        "      <th>total_bill</th>\n",
  66 |        "      <th>tip</th>\n",
  67 |        "      <th>sex</th>\n",
  68 |        "      <th>smoker</th>\n",
  69 |        "      <th>day</th>\n",
  70 |        "      <th>time</th>\n",
  71 |        "      <th>size</th>\n",
  72 |        "    </tr>\n",
  73 |        "  </thead>\n",
  74 |        "  <tbody>\n",
  75 |        "    <tr>\n",
  76 |        "      <th>0</th>\n",
  77 |        "      <td>16.99</td>\n",
  78 |        "      <td>1.01</td>\n",
  79 |        "      <td>Female</td>\n",
  80 |        "      <td>No</td>\n",
  81 |        "      <td>Sun</td>\n",
  82 |        "      <td>Dinner</td>\n",
  83 |        "      <td>2</td>\n",
  84 |        "    </tr>\n",
  85 |        "    <tr>\n",
  86 |        "      <th>1</th>\n",
  87 |        "      <td>10.34</td>\n",
  88 |        "      <td>1.66</td>\n",
  89 |        "      <td>Male</td>\n",
  90 |        "      <td>No</td>\n",
  91 |        "      <td>Sun</td>\n",
  92 |        "      <td>Dinner</td>\n",
  93 |        "      <td>3</td>\n",
  94 |        "    </tr>\n",
  95 |        "    <tr>\n",
  96 |        "      <th>2</th>\n",
  97 |        "      <td>21.01</td>\n",
  98 |        "      <td>3.50</td>\n",
  99 |        "      <td>Male</td>\n",
 100 |        "      <td>No</td>\n",
 101 |        "      <td>Sun</td>\n",
 102 |        "      <td>Dinner</td>\n",
 103 |        "      <td>3</td>\n",
 104 |        "    </tr>\n",
 105 |        "  </tbody>\n",
 106 |        "</table>\n",
 107 |        "</div>"
 108 |       ],
 109 |       "text/plain": [
 110 |        "   total_bill   tip     sex smoker  day    time  size\n",
 111 |        "0       16.99  1.01  Female     No  Sun  Dinner     2\n",
 112 |        "1       10.34  1.66    Male     No  Sun  Dinner     3\n",
 113 |        "2       21.01  3.50    Male     No  Sun  Dinner     3"
 114 |       ]
 115 |      },
 116 |      "execution_count": 2,
 117 |      "metadata": {},
 118 |      "output_type": "execute_result"
 119 |     }
 120 |    ],
 121 |    "source": [
 122 |     "tips = sns.load_dataset('tips')\n",
 123 |     "tips.head(3)"
 124 |    ]
 125 |   },
 126 |   {
 127 |    "cell_type": "markdown",
 128 |    "metadata": {},
 129 |    "source": [
 130 |     "There are basically 5 ways to get data from dataframes:"
 131 |    ]
 132 |   },
 133 |   {
 134 |    "cell_type": "code",
 135 |    "execution_count": 3,
 136 |    "metadata": {},
 137 |    "outputs": [
 138 |     {
 139 |      "data": {
 140 |       "text/html": [
 141 |        "<div>\n",
 142 |        "<style scoped>\n",
 143 |        "    .dataframe tbody tr th:only-of-type {\n",
 144 |        "        vertical-align: middle;\n",
 145 |        "    }\n",
 146 |        "\n",
 147 |        "    .dataframe tbody tr th {\n",
 148 |        "        vertical-align: top;\n",
 149 |        "    }\n",
 150 |        "\n",
 151 |        "    .dataframe thead th {\n",
 152 |        "        text-align: right;\n",
 153 |        "    }\n",
 154 |        "</style>\n",
 155 |        "<table border=\"1\" class=\"dataframe\">\n",
 156 |        "  <thead>\n",
 157 |        "    <tr style=\"text-align: right;\">\n",
 158 |        "      <th></th>\n",
 159 |        "      <th>total_bill</th>\n",
 160 |        "      <th>tip</th>\n",
 161 |        "    </tr>\n",
 162 |        "  </thead>\n",
 163 |        "  <tbody>\n",
 164 |        "    <tr>\n",
 165 |        "      <th>0</th>\n",
 166 |        "      <td>16.99</td>\n",
 167 |        "      <td>1.01</td>\n",
 168 |        "    </tr>\n",
 169 |        "    <tr>\n",
 170 |        "      <th>1</th>\n",
 171 |        "      <td>10.34</td>\n",
 172 |        "      <td>1.66</td>\n",
 173 |        "    </tr>\n",
 174 |        "    <tr>\n",
 175 |        "      <th>2</th>\n",
 176 |        "      <td>21.01</td>\n",
 177 |        "      <td>3.50</td>\n",
 178 |        "    </tr>\n",
 179 |        "    <tr>\n",
 180 |        "      <th>3</th>\n",
 181 |        "      <td>23.68</td>\n",
 182 |        "      <td>3.31</td>\n",
 183 |        "    </tr>\n",
 184 |        "    <tr>\n",
 185 |        "      <th>4</th>\n",
 186 |        "      <td>24.59</td>\n",
 187 |        "      <td>3.61</td>\n",
 188 |        "    </tr>\n",
 189 |        "  </tbody>\n",
 190 |        "</table>\n",
 191 |        "</div>"
 192 |       ],
 193 |       "text/plain": [
 194 |        "   total_bill   tip\n",
 195 |        "0       16.99  1.01\n",
 196 |        "1       10.34  1.66\n",
 197 |        "2       21.01  3.50\n",
 198 |        "3       23.68  3.31\n",
 199 |        "4       24.59  3.61"
 200 |       ]
 201 |      },
 202 |      "execution_count": 3,
 203 |      "metadata": {},
 204 |      "output_type": "execute_result"
 205 |     }
 206 |    ],
 207 |    "source": [
 208 |     "# 1) get columns\n",
 209 |     "tips[['total_bill', 'tip']].head()"
 210 |    ]
 211 |   },
 212 |   {
 213 |    "cell_type": "code",
 214 |    "execution_count": 4,
 215 |    "metadata": {},
 216 |    "outputs": [
 217 |     {
 218 |      "data": {
 219 |       "text/html": [
 220 |        "<div>\n",
 221 |        "<style scoped>\n",
 222 |        "    .dataframe tbody tr th:only-of-type {\n",
 223 |        "        vertical-align: middle;\n",
 224 |        "    }\n",
 225 |        "\n",
 226 |        "    .dataframe tbody tr th {\n",
 227 |        "        vertical-align: top;\n",
 228 |        "    }\n",
 229 |        "\n",
 230 |        "    .dataframe thead th {\n",
 231 |        "        text-align: right;\n",
 232 |        "    }\n",
 233 |        "</style>\n",
 234 |        "<table border=\"1\" class=\"dataframe\">\n",
 235 |        "  <thead>\n",
 236 |        "    <tr style=\"text-align: right;\">\n",
 237 |        "      <th></th>\n",
 238 |        "      <th>total_bill</th>\n",
 239 |        "      <th>tip</th>\n",
 240 |        "      <th>sex</th>\n",
 241 |        "      <th>smoker</th>\n",
 242 |        "      <th>day</th>\n",
 243 |        "      <th>time</th>\n",
 244 |        "      <th>size</th>\n",
 245 |        "    </tr>\n",
 246 |        "  </thead>\n",
 247 |        "  <tbody>\n",
 248 |        "    <tr>\n",
 249 |        "      <th>3</th>\n",
 250 |        "      <td>23.68</td>\n",
 251 |        "      <td>3.31</td>\n",
 252 |        "      <td>Male</td>\n",
 253 |        "      <td>No</td>\n",
 254 |        "      <td>Sun</td>\n",
 255 |        "      <td>Dinner</td>\n",
 256 |        "      <td>2</td>\n",
 257 |        "    </tr>\n",
 258 |        "    <tr>\n",
 259 |        "      <th>4</th>\n",
 260 |        "      <td>24.59</td>\n",
 261 |        "      <td>3.61</td>\n",
 262 |        "      <td>Female</td>\n",
 263 |        "      <td>No</td>\n",
 264 |        "      <td>Sun</td>\n",
 265 |        "      <td>Dinner</td>\n",
 266 |        "      <td>4</td>\n",
 267 |        "    </tr>\n",
 268 |        "  </tbody>\n",
 269 |        "</table>\n",
 270 |        "</div>"
 271 |       ],
 272 |       "text/plain": [
 273 |        "   total_bill   tip     sex smoker  day    time  size\n",
 274 |        "3       23.68  3.31    Male     No  Sun  Dinner     2\n",
 275 |        "4       24.59  3.61  Female     No  Sun  Dinner     4"
 276 |       ]
 277 |      },
 278 |      "execution_count": 4,
 279 |      "metadata": {},
 280 |      "output_type": "execute_result"
 281 |     }
 282 |    ],
 283 |    "source": [
 284 |     "# 2) get some rows\n",
 285 |     "tips[3:5]"
 286 |    ]
 287 |   },
 288 |   {
 289 |    "cell_type": "code",
 290 |    "execution_count": 5,
 291 |    "metadata": {},
 292 |    "outputs": [
 293 |     {
 294 |      "data": {
 295 |       "text/html": [
 296 |        "<div>\n",
 297 |        "<style scoped>\n",
 298 |        "    .dataframe tbody tr th:only-of-type {\n",
 299 |        "        vertical-align: middle;\n",
 300 |        "    }\n",
 301 |        "\n",
 302 |        "    .dataframe tbody tr th {\n",
 303 |        "        vertical-align: top;\n",
 304 |        "    }\n",
 305 |        "\n",
 306 |        "    .dataframe thead th {\n",
 307 |        "        text-align: right;\n",
 308 |        "    }\n",
 309 |        "</style>\n",
 310 |        "<table border=\"1\" class=\"dataframe\">\n",
 311 |        "  <thead>\n",
 312 |        "    <tr style=\"text-align: right;\">\n",
 313 |        "      <th></th>\n",
 314 |        "      <th>sex</th>\n",
 315 |        "      <th>smoker</th>\n",
 316 |        "    </tr>\n",
 317 |        "  </thead>\n",
 318 |        "  <tbody>\n",
 319 |        "    <tr>\n",
 320 |        "      <th>2</th>\n",
 321 |        "      <td>Male</td>\n",
 322 |        "      <td>No</td>\n",
 323 |        "    </tr>\n",
 324 |        "    <tr>\n",
 325 |        "      <th>3</th>\n",
 326 |        "      <td>Male</td>\n",
 327 |        "      <td>No</td>\n",
 328 |        "    </tr>\n",
 329 |        "    <tr>\n",
 330 |        "      <th>4</th>\n",
 331 |        "      <td>Female</td>\n",
 332 |        "      <td>No</td>\n",
 333 |        "    </tr>\n",
 334 |        "  </tbody>\n",
 335 |        "</table>\n",
 336 |        "</div>"
 337 |       ],
 338 |       "text/plain": [
 339 |        "      sex smoker\n",
 340 |        "2    Male     No\n",
 341 |        "3    Male     No\n",
 342 |        "4  Female     No"
 343 |       ]
 344 |      },
 345 |      "execution_count": 5,
 346 |      "metadata": {},
 347 |      "output_type": "execute_result"
 348 |     }
 349 |    ],
 350 |    "source": [
 351 |     "# 3) select rows and columns based on their name\n",
 352 |     "tips.loc[2:4, 'sex': 'smoker']"
 353 |    ]
 354 |   },
 355 |   {
 356 |    "cell_type": "code",
 357 |    "execution_count": 6,
 358 |    "metadata": {},
 359 |    "outputs": [
 360 |     {
 361 |      "data": {
 362 |       "text/html": [
 363 |        "<div>\n",
 364 |        "<style scoped>\n",
 365 |        "    .dataframe tbody tr th:only-of-type {\n",
 366 |        "        vertical-align: middle;\n",
 367 |        "    }\n",
 368 |        "\n",
 369 |        "    .dataframe tbody tr th {\n",
 370 |        "        vertical-align: top;\n",
 371 |        "    }\n",
 372 |        "\n",
 373 |        "    .dataframe thead th {\n",
 374 |        "        text-align: right;\n",
 375 |        "    }\n",
 376 |        "</style>\n",
 377 |        "<table border=\"1\" class=\"dataframe\">\n",
 378 |        "  <thead>\n",
 379 |        "    <tr style=\"text-align: right;\">\n",
 380 |        "      <th></th>\n",
 381 |        "      <th>total_bill</th>\n",
 382 |        "      <th>tip</th>\n",
 383 |        "    </tr>\n",
 384 |        "  </thead>\n",
 385 |        "  <tbody>\n",
 386 |        "    <tr>\n",
 387 |        "      <th>1</th>\n",
 388 |        "      <td>10.34</td>\n",
 389 |        "      <td>1.66</td>\n",
 390 |        "    </tr>\n",
 391 |        "    <tr>\n",
 392 |        "      <th>2</th>\n",
 393 |        "      <td>21.01</td>\n",
 394 |        "      <td>3.50</td>\n",
 395 |        "    </tr>\n",
 396 |        "  </tbody>\n",
 397 |        "</table>\n",
 398 |        "</div>"
 399 |       ],
 400 |       "text/plain": [
 401 |        "   total_bill   tip\n",
 402 |        "1       10.34  1.66\n",
 403 |        "2       21.01  3.50"
 404 |       ]
 405 |      },
 406 |      "execution_count": 6,
 407 |      "metadata": {},
 408 |      "output_type": "execute_result"
 409 |     }
 410 |    ],
 411 |    "source": [
 412 |     "# 4) select rows and columns by their ordering\n",
 413 |     "tips.iloc[1:3, 0:2]"
 414 |    ]
 415 |   },
 416 |   {
 417 |    "cell_type": "code",
 418 |    "execution_count": 9,
 419 |    "metadata": {},
 420 |    "outputs": [
 421 |     {
 422 |      "data": {
 423 |       "text/html": [
 424 |        "<div>\n",
 425 |        "<style scoped>\n",
 426 |        "    .dataframe tbody tr th:only-of-type {\n",
 427 |        "        vertical-align: middle;\n",
 428 |        "    }\n",
 429 |        "\n",
 430 |        "    .dataframe tbody tr th {\n",
 431 |        "        vertical-align: top;\n",
 432 |        "    }\n",
 433 |        "\n",
 434 |        "    .dataframe thead th {\n",
 435 |        "        text-align: right;\n",
 436 |        "    }\n",
 437 |        "</style>\n",
 438 |        "<table border=\"1\" class=\"dataframe\">\n",
 439 |        "  <thead>\n",
 440 |        "    <tr style=\"text-align: right;\">\n",
 441 |        "      <th></th>\n",
 442 |        "      <th>total_bill</th>\n",
 443 |        "      <th>tip</th>\n",
 444 |        "      <th>sex</th>\n",
 445 |        "      <th>smoker</th>\n",
 446 |        "      <th>day</th>\n",
 447 |        "      <th>time</th>\n",
 448 |        "      <th>size</th>\n",
 449 |        "    </tr>\n",
 450 |        "  </thead>\n",
 451 |        "  <tbody>\n",
 452 |        "    <tr>\n",
 453 |        "      <th>0</th>\n",
 454 |        "      <td>16.99</td>\n",
 455 |        "      <td>1.01</td>\n",
 456 |        "      <td>Female</td>\n",
 457 |        "      <td>No</td>\n",
 458 |        "      <td>Sun</td>\n",
 459 |        "      <td>Dinner</td>\n",
 460 |        "      <td>2</td>\n",
 461 |        "    </tr>\n",
 462 |        "    <tr>\n",
 463 |        "      <th>1</th>\n",
 464 |        "      <td>10.34</td>\n",
 465 |        "      <td>1.66</td>\n",
 466 |        "      <td>Male</td>\n",
 467 |        "      <td>No</td>\n",
 468 |        "      <td>Sun</td>\n",
 469 |        "      <td>Dinner</td>\n",
 470 |        "      <td>3</td>\n",
 471 |        "    </tr>\n",
 472 |        "    <tr>\n",
 473 |        "      <th>2</th>\n",
 474 |        "      <td>21.01</td>\n",
 475 |        "      <td>3.50</td>\n",
 476 |        "      <td>Male</td>\n",
 477 |        "      <td>No</td>\n",
 478 |        "      <td>Sun</td>\n",
 479 |        "      <td>Dinner</td>\n",
 480 |        "      <td>3</td>\n",
 481 |        "    </tr>\n",
 482 |        "    <tr>\n",
 483 |        "      <th>3</th>\n",
 484 |        "      <td>23.68</td>\n",
 485 |        "      <td>3.31</td>\n",
 486 |        "      <td>Male</td>\n",
 487 |        "      <td>No</td>\n",
 488 |        "      <td>Sun</td>\n",
 489 |        "      <td>Dinner</td>\n",
 490 |        "      <td>2</td>\n",
 491 |        "    </tr>\n",
 492 |        "    <tr>\n",
 493 |        "      <th>4</th>\n",
 494 |        "      <td>24.59</td>\n",
 495 |        "      <td>3.61</td>\n",
 496 |        "      <td>Female</td>\n",
 497 |        "      <td>No</td>\n",
 498 |        "      <td>Sun</td>\n",
 499 |        "      <td>Dinner</td>\n",
 500 |        "      <td>4</td>\n",
 501 |        "    </tr>\n",
 502 |        "  </tbody>\n",
 503 |        "</table>\n",
 504 |        "</div>"
 505 |       ],
 506 |       "text/plain": [
 507 |        "   total_bill   tip     sex smoker  day    time  size\n",
 508 |        "0       16.99  1.01  Female     No  Sun  Dinner     2\n",
 509 |        "1       10.34  1.66    Male     No  Sun  Dinner     3\n",
 510 |        "2       21.01  3.50    Male     No  Sun  Dinner     3\n",
 511 |        "3       23.68  3.31    Male     No  Sun  Dinner     2\n",
 512 |        "4       24.59  3.61  Female     No  Sun  Dinner     4"
 513 |       ]
 514 |      },
 515 |      "execution_count": 9,
 516 |      "metadata": {},
 517 |      "output_type": "execute_result"
 518 |     }
 519 |    ],
 520 |    "source": [
 521 |     "# 5) select using a bool series\n",
 522 |     "tips[tips['tip'] > 1].head()"
 523 |    ]
 524 |   },
 525 |   {
 526 |    "cell_type": "markdown",
 527 |    "metadata": {},
 528 |    "source": [
 529 |     "This is just the tip of the iceberg (well, actually it's 90% of the iceberg).\n",
 530 |     "\n",
 531 |     "Still, there are a couple of other important concepts that you will most likely run into when diving into other pandas functionality."
 532 |    ]
 533 |   },
 534 |   {
 535 |    "cell_type": "markdown",
 536 |    "metadata": {},
 537 |    "source": [
 538 |     "## Multi-index\n",
 539 |     "\n",
 540 |     "This is a subject that you might not think you'd need, but it turns out to come up rather frequently.\n",
 541 |     "\n",
 542 |     "The initial idea behind the multi-index was to provide a framework for working with higher-dimensional data (and thus a replacement for panels).\n",
 543 |     "\n",
 544 |     "But because some common operations return one, it became quite commonplace. In almost all cases a multi-index comes from a [groupby](https://github.com/knathanieltucker/pandas-tutorial/blob/master/notebooks/Group%20Operations.ipynb) (you will almost never construct one or read one in yourself).\n",
 545 |     "\n",
 546 |     "Let's do an example below:"
 547 |    ]
 548 |   },
 549 |   {
 550 |    "cell_type": "code",
 551 |    "execution_count": 10,
 552 |    "metadata": {},
 553 |    "outputs": [
 554 |     {
 555 |      "data": {
 556 |       "text/html": [
 557 |        "<div>\n",
 558 |        "<style scoped>\n",
 559 |        "    .dataframe tbody tr th:only-of-type {\n",
 560 |        "        vertical-align: middle;\n",
 561 |        "    }\n",
 562 |        "\n",
 563 |        "    .dataframe tbody tr th {\n",
 564 |        "        vertical-align: top;\n",
 565 |        "    }\n",
 566 |        "\n",
 567 |        "    .dataframe thead th {\n",
 568 |        "        text-align: right;\n",
 569 |        "    }\n",
 570 |        "</style>\n",
 571 |        "<table border=\"1\" class=\"dataframe\">\n",
 572 |        "  <thead>\n",
 573 |        "    <tr style=\"text-align: right;\">\n",
 574 |        "      <th></th>\n",
 575 |        "      <th>total_bill</th>\n",
 576 |        "      <th>tip</th>\n",
 577 |        "      <th>sex</th>\n",
 578 |        "      <th>smoker</th>\n",
 579 |        "      <th>day</th>\n",
 580 |        "      <th>time</th>\n",
 581 |        "      <th>size</th>\n",
 582 |        "    </tr>\n",
 583 |        "  </thead>\n",
 584 |        "  <tbody>\n",
 585 |        "    <tr>\n",
 586 |        "      <th>0</th>\n",
 587 |        "      <td>16.99</td>\n",
 588 |        "      <td>1.01</td>\n",
 589 |        "      <td>Female</td>\n",
 590 |        "      <td>No</td>\n",
 591 |        "      <td>Sun</td>\n",
 592 |        "      <td>Dinner</td>\n",
 593 |        "      <td>2</td>\n",
 594 |        "    </tr>\n",
 595 |        "    <tr>\n",
 596 |        "      <th>1</th>\n",
 597 |        "      <td>10.34</td>\n",
 598 |        "      <td>1.66</td>\n",
 599 |        "      <td>Male</td>\n",
 600 |        "      <td>No</td>\n",
 601 |        "      <td>Sun</td>\n",
 602 |        "      <td>Dinner</td>\n",
 603 |        "      <td>3</td>\n",
 604 |        "    </tr>\n",
 605 |        "    <tr>\n",
 606 |        "      <th>2</th>\n",
 607 |        "      <td>21.01</td>\n",
 608 |        "      <td>3.50</td>\n",
 609 |        "      <td>Male</td>\n",
 610 |        "      <td>No</td>\n",
 611 |        "      <td>Sun</td>\n",
 612 |        "      <td>Dinner</td>\n",
 613 |        "      <td>3</td>\n",
 614 |        "    </tr>\n",
 615 |        "    <tr>\n",
 616 |        "      <th>3</th>\n",
 617 |        "      <td>23.68</td>\n",
 618 |        "      <td>3.31</td>\n",
 619 |        "      <td>Male</td>\n",
 620 |        "      <td>No</td>\n",
 621 |        "      <td>Sun</td>\n",
 622 |        "      <td>Dinner</td>\n",
 623 |        "      <td>2</td>\n",
 624 |        "    </tr>\n",
 625 |        "    <tr>\n",
 626 |        "      <th>4</th>\n",
 627 |        "      <td>24.59</td>\n",
 628 |        "      <td>3.61</td>\n",
 629 |        "      <td>Female</td>\n",
 630 |        "      <td>No</td>\n",
 631 |        "      <td>Sun</td>\n",
 632 |        "      <td>Dinner</td>\n",
 633 |        "      <td>4</td>\n",
 634 |        "    </tr>\n",
 635 |        "  </tbody>\n",
 636 |        "</table>\n",
 637 |        "</div>"
 638 |       ],
 639 |       "text/plain": [
 640 |        "   total_bill   tip     sex smoker  day    time  size\n",
 641 |        "0       16.99  1.01  Female     No  Sun  Dinner     2\n",
 642 |        "1       10.34  1.66    Male     No  Sun  Dinner     3\n",
 643 |        "2       21.01  3.50    Male     No  Sun  Dinner     3\n",
 644 |        "3       23.68  3.31    Male     No  Sun  Dinner     2\n",
 645 |        "4       24.59  3.61  Female     No  Sun  Dinner     4"
 646 |       ]
 647 |      },
 648 |      "execution_count": 10,
 649 |      "metadata": {},
 650 |      "output_type": "execute_result"
 651 |     }
 652 |    ],
 653 |    "source": [
 654 |     "tips.head()"
 655 |    ]
 656 |   },
 657 |   {
 658 |    "cell_type": "code",
 659 |    "execution_count": 11,
 660 |    "metadata": {},
 661 |    "outputs": [
 662 |     {
 663 |      "data": {
 664 |       "text/html": [
 665 |        "<div>\n",
 666 |        "<style scoped>\n",
 667 |        "    .dataframe tbody tr th:only-of-type {\n",
 668 |        "        vertical-align: middle;\n",
 669 |        "    }\n",
 670 |        "\n",
 671 |        "    .dataframe tbody tr th {\n",
 672 |        "        vertical-align: top;\n",
 673 |        "    }\n",
 674 |        "\n",
 675 |        "    .dataframe thead th {\n",
 676 |        "        text-align: right;\n",
 677 |        "    }\n",
 678 |        "</style>\n",
 679 |        "<table border=\"1\" class=\"dataframe\">\n",
 680 |        "  <thead>\n",
 681 |        "    <tr style=\"text-align: right;\">\n",
 682 |        "      <th></th>\n",
 683 |        "      <th></th>\n",
 684 |        "      <th>tip</th>\n",
 685 |        "    </tr>\n",
 686 |        "    <tr>\n",
 687 |        "      <th>sex</th>\n",
 688 |        "      <th>smoker</th>\n",
 689 |        "      <th></th>\n",
 690 |        "    </tr>\n",
 691 |        "  </thead>\n",
 692 |        "  <tbody>\n",
 693 |        "    <tr>\n",
 694 |        "      <th rowspan=\"2\" valign=\"top\">Male</th>\n",
 695 |        "      <th>Yes</th>\n",
 696 |        "      <td>3.051167</td>\n",
 697 |        "    </tr>\n",
 698 |        "    <tr>\n",
 699 |        "      <th>No</th>\n",
 700 |        "      <td>3.113402</td>\n",
 701 |        "    </tr>\n",
 702 |        "    <tr>\n",
 703 |        "      <th rowspan=\"2\" valign=\"top\">Female</th>\n",
 704 |        "      <th>Yes</th>\n",
 705 |        "      <td>2.931515</td>\n",
 706 |        "    </tr>\n",
 707 |        "    <tr>\n",
 708 |        "      <th>No</th>\n",
 709 |        "      <td>2.773519</td>\n",
 710 |        "    </tr>\n",
 711 |        "  </tbody>\n",
 712 |        "</table>\n",
 713 |        "</div>"
 714 |       ],
 715 |       "text/plain": [
 716 |        "                    tip\n",
 717 |        "sex    smoker          \n",
 718 |        "Male   Yes     3.051167\n",
 719 |        "       No      3.113402\n",
 720 |        "Female Yes     2.931515\n",
 721 |        "       No      2.773519"
 722 |       ]
 723 |      },
 724 |      "execution_count": 11,
 725 |      "metadata": {},
 726 |      "output_type": "execute_result"
 727 |     }
 728 |    ],
 729 |    "source": [
 730 |     "mi_tips = tips.groupby(['sex', 'smoker']).agg({'tip': 'mean'})\n",
 731 |     "mi_tips"
 732 |    ]
 733 |   },
 734 |   {
 735 |    "cell_type": "code",
 736 |    "execution_count": 12,
 737 |    "metadata": {},
 738 |    "outputs": [
 739 |     {
 740 |      "data": {
 741 |       "text/plain": [
 742 |        "MultiIndex(levels=[['Male', 'Female'], ['Yes', 'No']],\n",
 743 |        "           codes=[[0, 0, 1, 1], [0, 1, 0, 1]],\n",
 744 |        "           names=['sex', 'smoker'])"
 745 |       ]
 746 |      },
 747 |      "execution_count": 12,
 748 |      "metadata": {},
 749 |      "output_type": "execute_result"
 750 |     }
 751 |    ],
 752 |    "source": [
 753 |     "mi_tips.index"
 754 |    ]
 755 |   },
 756 |   {
 757 |    "cell_type": "markdown",
 758 |    "metadata": {},
 759 |    "source": [
 760 |     "Ultimately, there are a ton of operations that you can do on this type of data, and the usual selection operations have multi-index equivalents, like this:"
 761 |    ]
 762 |   },
 763 |   {
 764 |    "cell_type": "code",
 765 |    "execution_count": 13,
 766 |    "metadata": {},
 767 |    "outputs": [
 768 |     {
 769 |      "data": {
 770 |       "text/plain": [
 771 |        "tip    3.113402\n",
 772 |        "Name: (Male, No), dtype: float64"
 773 |       ]
 774 |      },
 775 |      "execution_count": 13,
 776 |      "metadata": {},
 777 |      "output_type": "execute_result"
 778 |     }
 779 |    ],
 780 |    "source": [
 781 |     "mi_tips.loc[('Male', 'No')]"
 782 |    ]
 783 |   },
 784 |   {
 785 |    "cell_type": "markdown",
 786 |    "metadata": {},
 787 |    "source": [
 788 |     "But going that way you'd have to learn a lot of details, and there are always exceptions.\n",
 789 |     "\n",
 790 |     "So the way that I have always dealt with this is simply by resetting the index."
 791 |    ]
 792 |   },
 793 |   {
 794 |    "cell_type": "code",
 795 |    "execution_count": 14,
 796 |    "metadata": {},
 797 |    "outputs": [
 798 |     {
 799 |      "data": {
 800 |       "text/html": [
 801 |        "<div>\n",
 802 |        "<style scoped>\n",
 803 |        "    .dataframe tbody tr th:only-of-type {\n",
 804 |        "        vertical-align: middle;\n",
 805 |        "    }\n",
 806 |        "\n",
 807 |        "    .dataframe tbody tr th {\n",
 808 |        "        vertical-align: top;\n",
 809 |        "    }\n",
 810 |        "\n",
 811 |        "    .dataframe thead th {\n",
 812 |        "        text-align: right;\n",
 813 |        "    }\n",
 814 |        "</style>\n",
 815 |        "<table border=\"1\" class=\"dataframe\">\n",
 816 |        "  <thead>\n",
 817 |        "    <tr style=\"text-align: right;\">\n",
 818 |        "      <th></th>\n",
 819 |        "      <th>sex</th>\n",
 820 |        "      <th>smoker</th>\n",
 821 |        "      <th>tip</th>\n",
 822 |        "    </tr>\n",
 823 |        "  </thead>\n",
 824 |        "  <tbody>\n",
 825 |        "    <tr>\n",
 826 |        "      <th>0</th>\n",
 827 |        "      <td>Male</td>\n",
 828 |        "      <td>Yes</td>\n",
 829 |        "      <td>3.051167</td>\n",
 830 |        "    </tr>\n",
 831 |        "    <tr>\n",
 832 |        "      <th>1</th>\n",
 833 |        "      <td>Male</td>\n",
 834 |        "      <td>No</td>\n",
 835 |        "      <td>3.113402</td>\n",
 836 |        "    </tr>\n",
 837 |        "    <tr>\n",
 838 |        "      <th>2</th>\n",
 839 |        "      <td>Female</td>\n",
 840 |        "      <td>Yes</td>\n",
 841 |        "      <td>2.931515</td>\n",
 842 |        "    </tr>\n",
 843 |        "    <tr>\n",
 844 |        "      <th>3</th>\n",
 845 |        "      <td>Female</td>\n",
 846 |        "      <td>No</td>\n",
 847 |        "      <td>2.773519</td>\n",
 848 |        "    </tr>\n",
 849 |        "  </tbody>\n",
 850 |        "</table>\n",
 851 |        "</div>"
 852 |       ],
 853 |       "text/plain": [
 854 |        "      sex smoker       tip\n",
 855 |        "0    Male    Yes  3.051167\n",
 856 |        "1    Male     No  3.113402\n",
 857 |        "2  Female    Yes  2.931515\n",
 858 |        "3  Female     No  2.773519"
 859 |       ]
 860 |      },
 861 |      "execution_count": 14,
 862 |      "metadata": {},
 863 |      "output_type": "execute_result"
 864 |     }
 865 |    ],
 866 |    "source": [
 867 |     "ri_tips = mi_tips.reset_index()\n",
 868 |     "ri_tips"
 869 |    ]
 870 |   },
 871 |   {
 872 |    "cell_type": "markdown",
 873 |    "metadata": {},
 874 |    "source": [
 875 |     "Notice how the index levels are now ordinary columns, with a value in every row. That makes it easy to select only the male non-smokers with a boolean mask:"
 876 |    ]
 877 |   },
 878 |   {
 879 |    "cell_type": "code",
 880 |    "execution_count": 18,
 881 |    "metadata": {},
 882 |    "outputs": [
 883 |     {
 884 |      "data": {
 885 |       "text/html": [
 886 |        "<div>\n",
 887 |        "<style scoped>\n",
 888 |        "    .dataframe tbody tr th:only-of-type {\n",
 889 |        "        vertical-align: middle;\n",
 890 |        "    }\n",
 891 |        "\n",
 892 |        "    .dataframe tbody tr th {\n",
 893 |        "        vertical-align: top;\n",
 894 |        "    }\n",
 895 |        "\n",
 896 |        "    .dataframe thead th {\n",
 897 |        "        text-align: right;\n",
 898 |        "    }\n",
 899 |        "</style>\n",
 900 |        "<table border=\"1\" class=\"dataframe\">\n",
 901 |        "  <thead>\n",
 902 |        "    <tr style=\"text-align: right;\">\n",
 903 |        "      <th></th>\n",
 904 |        "      <th>sex</th>\n",
 905 |        "      <th>smoker</th>\n",
 906 |        "      <th>tip</th>\n",
 907 |        "    </tr>\n",
 908 |        "  </thead>\n",
 909 |        "  <tbody>\n",
 910 |        "    <tr>\n",
 911 |        "      <th>1</th>\n",
 912 |        "      <td>Male</td>\n",
 913 |        "      <td>No</td>\n",
 914 |        "      <td>3.113402</td>\n",
 915 |        "    </tr>\n",
 916 |        "  </tbody>\n",
 917 |        "</table>\n",
 918 |        "</div>"
 919 |       ],
 920 |       "text/plain": [
 921 |        "    sex smoker       tip\n",
 922 |        "1  Male     No  3.113402"
 923 |       ]
 924 |      },
 925 |      "execution_count": 18,
 926 |      "metadata": {},
 927 |      "output_type": "execute_result"
 928 |     }
 929 |    ],
 930 |    "source": [
 931 |     "ri_tips[(ri_tips['smoker'] == 'No') & (ri_tips['sex'] == 'Male')]"
 932 |    ]
 933 |   },
 934 |   {
 935 |    "cell_type": "markdown",
 936 |    "metadata": {},
 937 |    "source": [
 938 |     "Another way to deal with this is to pull only certain index levels out into columns:"
 939 |    ]
 940 |   },
 941 |   {
 942 |    "cell_type": "code",
 943 |    "execution_count": 19,
 944 |    "metadata": {},
 945 |    "outputs": [
 946 |     {
 947 |      "data": {
 948 |       "text/html": [
 949 |        "<div>\n",
 950 |        "<style scoped>\n",
 951 |        "    .dataframe tbody tr th:only-of-type {\n",
 952 |        "        vertical-align: middle;\n",
 953 |        "    }\n",
 954 |        "\n",
 955 |        "    .dataframe tbody tr th {\n",
 956 |        "        vertical-align: top;\n",
 957 |        "    }\n",
 958 |        "\n",
 959 |        "    .dataframe thead th {\n",
 960 |        "        text-align: right;\n",
 961 |        "    }\n",
 962 |        "</style>\n",
 963 |        "<table border=\"1\" class=\"dataframe\">\n",
 964 |        "  <thead>\n",
 965 |        "    <tr style=\"text-align: right;\">\n",
 966 |        "      <th></th>\n",
 967 |        "      <th>sex</th>\n",
 968 |        "      <th>tip</th>\n",
 969 |        "    </tr>\n",
 970 |        "    <tr>\n",
 971 |        "      <th>smoker</th>\n",
 972 |        "      <th></th>\n",
 973 |        "      <th></th>\n",
 974 |        "    </tr>\n",
 975 |        "  </thead>\n",
 976 |        "  <tbody>\n",
 977 |        "    <tr>\n",
 978 |        "      <th>Yes</th>\n",
 979 |        "      <td>Male</td>\n",
 980 |        "      <td>3.051167</td>\n",
 981 |        "    </tr>\n",
 982 |        "    <tr>\n",
 983 |        "      <th>Yes</th>\n",
 984 |        "      <td>Female</td>\n",
 985 |        "      <td>2.931515</td>\n",
 986 |        "    </tr>\n",
 987 |        "  </tbody>\n",
 988 |        "</table>\n",
 989 |        "</div>"
 990 |       ],
 991 |       "text/plain": [
 992 |        "           sex       tip\n",
 993 |        "smoker                  \n",
 994 |        "Yes       Male  3.051167\n",
 995 |        "Yes     Female  2.931515"
 996 |       ]
 997 |      },
 998 |      "execution_count": 19,
 999 |      "metadata": {},
1000 |      "output_type": "execute_result"
1001 |     }
1002 |    ],
1003 |    "source": [
1004 |     "ri0_tips = mi_tips.reset_index(level=0)\n",
1005 |     "ri0_tips.loc['Yes']"
1006 |    ]
1007 |   },
1008 |   {
1009 |    "cell_type": "markdown",
1010 |    "metadata": {},
1011 |    "source": [
1012 |     "And finally you can [pull columns back into the index](https://github.com/knathanieltucker/pandas-tutorial/blob/master/notebooks/Indexing%20and%20Selecting.ipynb) with `set_index` (basically only useful for certain types of merges)."
1013 |    ]
1014 |   },
1015 |   {
1016 |    "cell_type": "code",
1017 |    "execution_count": 20,
1018 |    "metadata": {},
1019 |    "outputs": [
1020 |     {
1021 |      "data": {
1022 |       "text/html": [
1023 |        "<div>\n",
1024 |        "<style scoped>\n",
1025 |        "    .dataframe tbody tr th:only-of-type {\n",
1026 |        "        vertical-align: middle;\n",
1027 |        "    }\n",
1028 |        "\n",
1029 |        "    .dataframe tbody tr th {\n",
1030 |        "        vertical-align: top;\n",
1031 |        "    }\n",
1032 |        "\n",
1033 |        "    .dataframe thead th {\n",
1034 |        "        text-align: right;\n",
1035 |        "    }\n",
1036 |        "</style>\n",
1037 |        "<table border=\"1\" class=\"dataframe\">\n",
1038 |        "  <thead>\n",
1039 |        "    <tr style=\"text-align: right;\">\n",
1040 |        "      <th></th>\n",
1041 |        "      <th></th>\n",
1042 |        "      <th>tip</th>\n",
1043 |        "    </tr>\n",
1044 |        "    <tr>\n",
1045 |        "      <th>sex</th>\n",
1046 |        "      <th>smoker</th>\n",
1047 |        "      <th></th>\n",
1048 |        "    </tr>\n",
1049 |        "  </thead>\n",
1050 |        "  <tbody>\n",
1051 |        "    <tr>\n",
1052 |        "      <th rowspan=\"2\" valign=\"top\">Male</th>\n",
1053 |        "      <th>Yes</th>\n",
1054 |        "      <td>3.051167</td>\n",
1055 |        "    </tr>\n",
1056 |        "    <tr>\n",
1057 |        "      <th>No</th>\n",
1058 |        "      <td>3.113402</td>\n",
1059 |        "    </tr>\n",
1060 |        "    <tr>\n",
1061 |        "      <th rowspan=\"2\" valign=\"top\">Female</th>\n",
1062 |        "      <th>Yes</th>\n",
1063 |        "      <td>2.931515</td>\n",
1064 |        "    </tr>\n",
1065 |        "    <tr>\n",
1066 |        "      <th>No</th>\n",
1067 |        "      <td>2.773519</td>\n",
1068 |        "    </tr>\n",
1069 |        "  </tbody>\n",
1070 |        "</table>\n",
1071 |        "</div>"
1072 |       ],
1073 |       "text/plain": [
1074 |        "                    tip\n",
1075 |        "sex    smoker          \n",
1076 |        "Male   Yes     3.051167\n",
1077 |        "       No      3.113402\n",
1078 |        "Female Yes     2.931515\n",
1079 |        "       No      2.773519"
1080 |       ]
1081 |      },
1082 |      "execution_count": 20,
1083 |      "metadata": {},
1084 |      "output_type": "execute_result"
1085 |     }
1086 |    ],
1087 |    "source": [
1088 |     "ri_tips.set_index(['sex', 'smoker'])"
1089 |    ]
1090 |   },
1091 |   {
1092 |    "cell_type": "code",
1093 |    "execution_count": 21,
1094 |    "metadata": {},
1095 |    "outputs": [
1096 |     {
1097 |      "data": {
1098 |       "text/html": [
1099 |        "<div>\n",
1100 |        "<style scoped>\n",
1101 |        "    .dataframe tbody tr th:only-of-type {\n",
1102 |        "        vertical-align: middle;\n",
1103 |        "    }\n",
1104 |        "\n",
1105 |        "    .dataframe tbody tr th {\n",
1106 |        "        vertical-align: top;\n",
1107 |        "    }\n",
1108 |        "\n",
1109 |        "    .dataframe thead th {\n",
1110 |        "        text-align: right;\n",
1111 |        "    }\n",
1112 |        "</style>\n",
1113 |        "<table border=\"1\" class=\"dataframe\">\n",
1114 |        "  <thead>\n",
1115 |        "    <tr style=\"text-align: right;\">\n",
1116 |        "      <th></th>\n",
1117 |        "      <th></th>\n",
1118 |        "      <th>tip</th>\n",
1119 |        "    </tr>\n",
1120 |        "    <tr>\n",
1121 |        "      <th>smoker</th>\n",
1122 |        "      <th>sex</th>\n",
1123 |        "      <th></th>\n",
1124 |        "    </tr>\n",
1125 |        "  </thead>\n",
1126 |        "  <tbody>\n",
1127 |        "    <tr>\n",
1128 |        "      <th>Yes</th>\n",
1129 |        "      <th>Male</th>\n",
1130 |        "      <td>3.051167</td>\n",
1131 |        "    </tr>\n",
1132 |        "    <tr>\n",
1133 |        "      <th>No</th>\n",
1134 |        "      <th>Male</th>\n",
1135 |        "      <td>3.113402</td>\n",
1136 |        "    </tr>\n",
1137 |        "    <tr>\n",
1138 |        "      <th>Yes</th>\n",
1139 |        "      <th>Female</th>\n",
1140 |        "      <td>2.931515</td>\n",
1141 |        "    </tr>\n",
1142 |        "    <tr>\n",
1143 |        "      <th>No</th>\n",
1144 |        "      <th>Female</th>\n",
1145 |        "      <td>2.773519</td>\n",
1146 |        "    </tr>\n",
1147 |        "  </tbody>\n",
1148 |        "</table>\n",
1149 |        "</div>"
1150 |       ],
1151 |       "text/plain": [
1152 |        "                    tip\n",
1153 |        "smoker sex             \n",
1154 |        "Yes    Male    3.051167\n",
1155 |        "No     Male    3.113402\n",
1156 |        "Yes    Female  2.931515\n",
1157 |        "No     Female  2.773519"
1158 |       ]
1159 |      },
1160 |      "execution_count": 21,
1161 |      "metadata": {},
1162 |      "output_type": "execute_result"
1163 |     }
1164 |    ],
1165 |    "source": [
1166 |     "ri0_tips.set_index('sex', append=True)"
1167 |    ]
1168 |   },
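  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To recap the round trip between index levels and columns, here is a minimal sketch. It assumes a small hand-made frame (`mi`, with made-up tip values) standing in for `mi_tips`; only `reset_index` and `set_index` come from the cells above.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy stand-in for mi_tips (values are made up): tips indexed by (sex, smoker).\n",
    "mi = pd.DataFrame(\n",
    "    {'tip': [3.05, 3.11, 2.93, 2.77]},\n",
    "    index=pd.MultiIndex.from_tuples(\n",
    "        [('Male', 'Yes'), ('Male', 'No'), ('Female', 'Yes'), ('Female', 'No')],\n",
    "        names=['sex', 'smoker'],\n",
    "    ),\n",
    ")\n",
    "\n",
    "# Move every index level into ordinary columns.\n",
    "flat = mi.reset_index()\n",
    "\n",
    "# Move only the sex level out; smoker stays behind as the index.\n",
    "by_smoker = mi.reset_index(level='sex')\n",
    "\n",
    "# Put both columns back into the index.\n",
    "rebuilt = flat.set_index(['sex', 'smoker'])\n",
    "\n",
    "# append=True keeps the existing index and adds sex as an inner level.\n",
    "appended = by_smoker.set_index('sex', append=True)\n",
    "\n",
    "print(list(flat.columns))\n",
    "print(list(appended.index.names))\n",
    "```\n",
    "\n",
    "Passing a level name instead of a position (`level='sex'` rather than `level=0`) tends to be easier to read back later, but both forms select the same level here."
   ]
  },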
1169 |   {
1170 |    "cell_type": "markdown",
1171 |    "metadata": {},
1172 |    "source": [
1173 |     "# Getting Single Values\n",
1174 |     "\n",
1175 |     "The next little indexing trick is getting and setting single values, and it is mostly about speed. It is pretty simple:"
1176 |    ]
1177 |   },
1178 |   {
1179 |    "cell_type": "code",
1180 |    "execution_count": 37,
1181 |    "metadata": {},
1182 |    "outputs": [
1183 |     {
1184 |      "data": {
1185 |       "text/html": [
1186 |        "<div>\n",
1187 |        "<style scoped>\n",
1188 |        "    .dataframe tbody tr th:only-of-type {\n",
1189 |        "        vertical-align: middle;\n",
1190 |        "    }\n",
1191 |        "\n",
1192 |        "    .dataframe tbody tr th {\n",
1193 |        "        vertical-align: top;\n",
1194 |        "    }\n",
1195 |        "\n",
1196 |        "    .dataframe thead th {\n",
1197 |        "        text-align: right;\n",
1198 |        "    }\n",
1199 |        "</style>\n",
1200 |        "<table border=\"1\" class=\"dataframe\">\n",
1201 |        "  <thead>\n",
1202 |        "    <tr style=\"text-align: right;\">\n",
1203 |        "      <th></th>\n",
1204 |        "      <th>total_bill</th>\n",
1205 |        "      <th>tip</th>\n",
1206 |        "      <th>sex</th>\n",
1207 |        "      <th>smoker</th>\n",
1208 |        "      <th>day</th>\n",
1209 |        "      <th>time</th>\n",
1210 |        "      <th>size</th>\n",
1211 |        "    </tr>\n",
1212 |        "  </thead>\n",
1213 |        "  <tbody>\n",
1214 |        "    <tr>\n",
1215 |        "      <th>0</th>\n",
1216 |        "      <td>6.00</td>\n",
1217 |        "      <td>1.01</td>\n",
1218 |        "      <td>Female</td>\n",
1219 |        "      <td>No</td>\n",
1220 |        "      <td>Sun</td>\n",
1221 |        "      <td>Dinner</td>\n",
1222 |        "      <td>2</td>\n",
1223 |        "    </tr>\n",
1224 |        "    <tr>\n",
1225 |        "      <th>1</th>\n",
1226 |        "      <td>10.34</td>\n",
1227 |        "      <td>1.66</td>\n",
1228 |        "      <td>Male</td>\n",
1229 |        "      <td>No</td>\n",
1230 |        "      <td>Sun</td>\n",
1231 |        "      <td>Dinner</td>\n",
1232 |        "      <td>3</td>\n",
1233 |        "    </tr>\n",
1234 |        "    <tr>\n",
1235 |        "      <th>2</th>\n",
1236 |        "      <td>21.01</td>\n",
1237 |        "      <td>3.50</td>\n",
1238 |        "      <td>Male</td>\n",
1239 |        "      <td>No</td>\n",
1240 |        "      <td>Sun</td>\n",
1241 |        "      <td>Dinner</td>\n",
1242 |        "      <td>3</td>\n",
1243 |        "    </tr>\n",
1244 |        "  </tbody>\n",
1245 |        "</table>\n",
1246 |        "</div>"
1247 |       ],
1248 |       "text/plain": [
1249 |        "   total_bill   tip     sex smoker  day    time  size\n",
1250 |        "0        6.00  1.01  Female     No  Sun  Dinner     2\n",
1251 |        "1       10.34  1.66    Male     No  Sun  Dinner     3\n",
1252 |        "2       21.01  3.50    Male     No  Sun  Dinner     3"
1253 |       ]
1254 |      },
1255 |      "execution_count": 37,
1256 |      "metadata": {},
1257 |      "output_type": "execute_result"
1258 |     }
1259 |    ],
1260 |    "source": [
1261 |     "tips.head(3)"
1262 |    ]
1263 |   },
1264 |   {
1265 |    "cell_type": "markdown",
1266 |    "metadata": {},
1267 |    "source": [
1268 |     "When getting or setting a single value, use the `at` accessor (label-based) or its positional counterpart `iat`:"
1269 |    ]
1270 |   },
1271 |   {
1272 |    "cell_type": "code",
1273 |    "execution_count": 23,
1274 |    "metadata": {},
1275 |    "outputs": [
1276 |     {
1277 |      "data": {
1278 |       "text/html": [
1279 |        "<div>\n",
1280 |        "<style scoped>\n",
1281 |        "    .dataframe tbody tr th:only-of-type {\n",
1282 |        "        vertical-align: middle;\n",
1283 |        "    }\n",
1284 |        "\n",
1285 |        "    .dataframe tbody tr th {\n",
1286 |        "        vertical-align: top;\n",
1287 |        "    }\n",
1288 |        "\n",
1289 |        "    .dataframe thead th {\n",
1290 |        "        text-align: right;\n",
1291 |        "    }\n",
1292 |        "</style>\n",
1293 |        "<table border=\"1\" class=\"dataframe\">\n",
1294 |        "  <thead>\n",
1295 |        "    <tr style=\"text-align: right;\">\n",
1296 |        "      <th></th>\n",
1297 |        "      <th>total_bill</th>\n",
1298 |        "      <th>tip</th>\n",
1299 |        "      <th>sex</th>\n",
1300 |        "      <th>smoker</th>\n",
1301 |        "      <th>day</th>\n",
1302 |        "      <th>time</th>\n",
1303 |        "      <th>size</th>\n",
1304 |        "    </tr>\n",
1305 |        "  </thead>\n",
1306 |        "  <tbody>\n",
1307 |        "    <tr>\n",
1308 |        "      <th>0</th>\n",
1309 |        "      <td>9000.00</td>\n",
1310 |        "      <td>1.01</td>\n",
1311 |        "      <td>Female</td>\n",
1312 |        "      <td>No</td>\n",
1313 |        "      <td>Sun</td>\n",
1314 |        "      <td>Dinner</td>\n",
1315 |        "      <td>2</td>\n",
1316 |        "    </tr>\n",
1317 |        "    <tr>\n",
1318 |        "      <th>1</th>\n",
1319 |        "      <td>10.34</td>\n",
1320 |        "      <td>1.66</td>\n",
1321 |        "      <td>Male</td>\n",
1322 |        "      <td>No</td>\n",
1323 |        "      <td>Sun</td>\n",
1324 |        "      <td>Dinner</td>\n",
1325 |        "      <td>3</td>\n",
1326 |        "    </tr>\n",
1327 |        "    <tr>\n",
1328 |        "      <th>2</th>\n",
1329 |        "      <td>21.01</td>\n",
1330 |        "      <td>3.50</td>\n",
1331 |        "      <td>Male</td>\n",
1332 |        "      <td>No</td>\n",
1333 |        "      <td>Sun</td>\n",
1334 |        "      <td>Dinner</td>\n",
1335 |        "      <td>3</td>\n",
1336 |        "    </tr>\n",
1337 |        "  </tbody>\n",
1338 |        "</table>\n",
1339 |        "</div>"
1340 |       ],
1341 |       "text/plain": [
1342 |        "   total_bill   tip     sex smoker  day    time  size\n",
1343 |        "0     9000.00  1.01  Female     No  Sun  Dinner     2\n",
1344 |        "1       10.34  1.66    Male     No  Sun  Dinner     3\n",
1345 |        "2       21.01  3.50    Male     No  Sun  Dinner     3"
1346 |       ]
1347 |      },
1348 |      "execution_count": 23,
1349 |      "metadata": {},
1350 |      "output_type": "execute_result"
1351 |     }
1352 |    ],
1353 |    "source": [
1354 |     "tips.at[0, 'total_bill'] = 9000\n",
1355 |     "tips.head(3)"
1356 |    ]
1357 |   },
1358 |   {
1359 |    "cell_type": "code",
1360 |    "execution_count": 24,
1361 |    "metadata": {},
1362 |    "outputs": [
1363 |     {
1364 |      "data": {
1365 |       "text/plain": [
1366 |        "9000.0"
1367 |       ]
1368 |      },
1369 |      "execution_count": 24,
1370 |      "metadata": {},
1371 |      "output_type": "execute_result"
1372 |     }
1373 |    ],
1374 |    "source": [
1375 |     "tips.iat[0, 0]"
1376 |    ]
1377 |   },
1378 |   {
1379 |    "cell_type": "markdown",
1380 |    "metadata": {},
1381 |    "source": [
1382 |     "If you are modifying single values of a dataframe, you should prefer `at` and `iat`. They are faster, and because they only accept a single row/column pair they are a good way to know that you are not messing up (oftentimes modifying the data can result in odd errors).\n",
1383 |     "\n",
1384 |     "So just to prove it's faster let's time it!"
1385 |    ]
1386 |   },
1387 |   {
1388 |    "cell_type": "code",
1389 |    "execution_count": 25,
1390 |    "metadata": {},
1391 |    "outputs": [
1392 |     {
1393 |      "name": "stdout",
1394 |      "output_type": "stream",
1395 |      "text": [
1396 |       "5.85 µs ± 96.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
1397 |      ]
1398 |     }
1399 |    ],
1400 |    "source": [
1401 |     "%%timeit\n",
1402 |     "tips.at[0, 'total_bill'] = 6"
1403 |    ]
1404 |   },
1405 |   {
1406 |    "cell_type": "code",
1407 |    "execution_count": 26,
1408 |    "metadata": {},
1409 |    "outputs": [
1410 |     {
1411 |      "name": "stdout",
1412 |      "output_type": "stream",
1413 |      "text": [
1414 |       "304 µs ± 8.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
1415 |      ]
1416 |     }
1417 |    ],
1418 |    "source": [
1419 |     "%%timeit\n",
1420 |     "tips.loc[0, 'total_bill'] = 6"
1421 |    ]
1422 |   },
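  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Outside of a notebook the same comparison can be sketched with the standard `timeit` module. This is only a rough illustration: the three-row `bills` frame and the two helper functions are hypothetical, and the absolute timings depend on your machine and pandas version.\n",
    "\n",
    "```python\n",
    "import timeit\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical three-row frame standing in for tips.\n",
    "bills = pd.DataFrame({'total_bill': [16.99, 10.34, 21.01], 'tip': [1.01, 1.66, 3.50]})\n",
    "\n",
    "# at takes a row label, then a column label; iat takes integer positions.\n",
    "bills.at[0, 'total_bill'] = 9000.0\n",
    "first_by_label = bills.at[0, 'total_bill']\n",
    "first_by_position = bills.iat[0, 0]\n",
    "\n",
    "# loc reaches the same cell through the general-purpose indexing path.\n",
    "first_by_loc = bills.loc[0, 'total_bill']\n",
    "\n",
    "\n",
    "def set_with_at():\n",
    "    bills.at[0, 'total_bill'] = 6.0\n",
    "\n",
    "\n",
    "def set_with_loc():\n",
    "    bills.loc[0, 'total_bill'] = 6.0\n",
    "\n",
    "\n",
    "# Total seconds for 1000 scalar writes each.\n",
    "print('at :', timeit.timeit(set_with_at, number=1000))\n",
    "print('loc:', timeit.timeit(set_with_loc, number=1000))\n",
    "```\n",
    "\n",
    "Note the argument order in all three accessors: row first, then column."
   ]
  },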
1423 |   {
1424 |    "cell_type": "markdown",
1425 |    "metadata": {},
1426 |    "source": [
1427 |     "# Where, Masks and Queries\n",
1428 |     "\n",
1429 |     "`where`, `mask` and `query` are built into pandas, but I have personally never used them, mostly because they are pretty redundant with plain boolean indexing and the need for them doesn't come up too often.\n",
1430 |     "\n",
1431 |     "They are a bit faster, yes. But the mental space is probably not worth it. So if you want to learn them, go for it (docs are [here](http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#the-query-method)). If not, it probably won't matter.\n",
1432 |     "\n",
1433 |     "Let me show you how you'd duplicate the `where` functionality with a plain boolean mask below. "
1434 |    ]
1435 |   },
1436 |   {
1437 |    "cell_type": "code",
1438 |    "execution_count": 27,
1439 |    "metadata": {},
1440 |    "outputs": [
1441 |     {
1442 |      "data": {
1443 |       "text/html": [
1444 |        "<div>\n",
1445 |        "<style scoped>\n",
1446 |        "    .dataframe tbody tr th:only-of-type {\n",
1447 |        "        vertical-align: middle;\n",
1448 |        "    }\n",
1449 |        "\n",
1450 |        "    .dataframe tbody tr th {\n",
1451 |        "        vertical-align: top;\n",
1452 |        "    }\n",
1453 |        "\n",
1454 |        "    .dataframe thead th {\n",
1455 |        "        text-align: right;\n",
1456 |        "    }\n",
1457 |        "</style>\n",
1458 |        "<table border=\"1\" class=\"dataframe\">\n",
1459 |        "  <thead>\n",
1460 |        "    <tr style=\"text-align: right;\">\n",
1461 |        "      <th></th>\n",
1462 |        "      <th>0</th>\n",
1463 |        "      <th>1</th>\n",
1464 |        "      <th>2</th>\n",
1465 |        "      <th>3</th>\n",
1466 |        "      <th>4</th>\n",
1467 |        "    </tr>\n",
1468 |        "  </thead>\n",
1469 |        "  <tbody>\n",
1470 |        "    <tr>\n",
1471 |        "      <th>0</th>\n",
1472 |        "      <td>-1.438781</td>\n",
1473 |        "      <td>0.584173</td>\n",
1474 |        "      <td>-0.694112</td>\n",
1475 |        "      <td>0.135304</td>\n",
1476 |        "      <td>0.409292</td>\n",
1477 |        "    </tr>\n",
1478 |        "    <tr>\n",
1479 |        "      <th>1</th>\n",
1480 |        "      <td>-2.203219</td>\n",
1481 |        "      <td>1.232487</td>\n",
1482 |        "      <td>1.284779</td>\n",
1483 |        "      <td>-2.460982</td>\n",
1484 |        "      <td>-0.855321</td>\n",
1485 |        "    </tr>\n",
1486 |        "    <tr>\n",
1487 |        "      <th>2</th>\n",
1488 |        "      <td>-0.827212</td>\n",
1489 |        "      <td>-0.293645</td>\n",
1490 |        "      <td>-0.679745</td>\n",
1491 |        "      <td>0.209145</td>\n",
1492 |        "      <td>-0.402497</td>\n",
1493 |        "    </tr>\n",
1494 |        "    <tr>\n",
1495 |        "      <th>3</th>\n",
1496 |        "      <td>0.471747</td>\n",
1497 |        "      <td>1.141361</td>\n",
1498 |        "      <td>0.429878</td>\n",
1499 |        "      <td>2.290840</td>\n",
1500 |        "      <td>-0.655701</td>\n",
1501 |        "    </tr>\n",
1502 |        "    <tr>\n",
1503 |        "      <th>4</th>\n",
1504 |        "      <td>-1.944334</td>\n",
1505 |        "      <td>0.186785</td>\n",
1506 |        "      <td>1.031003</td>\n",
1507 |        "      <td>-0.633808</td>\n",
1508 |        "      <td>0.413554</td>\n",
1509 |        "    </tr>\n",
1510 |        "  </tbody>\n",
1511 |        "</table>\n",
1512 |        "</div>"
1513 |       ],
1514 |       "text/plain": [
1515 |        "          0         1         2         3         4\n",
1516 |        "0 -1.438781  0.584173 -0.694112  0.135304  0.409292\n",
1517 |        "1 -2.203219  1.232487  1.284779 -2.460982 -0.855321\n",
1518 |        "2 -0.827212 -0.293645 -0.679745  0.209145 -0.402497\n",
1519 |        "3  0.471747  1.141361  0.429878  2.290840 -0.655701\n",
1520 |        "4 -1.944334  0.186785  1.031003 -0.633808  0.413554"
1521 |       ]
1522 |      },
1523 |      "execution_count": 27,
1524 |      "metadata": {},
1525 |      "output_type": "execute_result"
1526 |     }
1527 |    ],
1528 |    "source": [
1529 |     "df = pd.DataFrame(np.random.randn(25).reshape((5, 5)))\n",
1530 |     "df.head()"
1531 |    ]
1532 |   },
1533 |   {
1534 |    "cell_type": "code",
1535 |    "execution_count": 28,
1536 |    "metadata": {},
1537 |    "outputs": [
1538 |     {
1539 |      "data": {
1540 |       "text/html": [
1541 |        "<div>\n",
1542 |        "<style scoped>\n",
1543 |        "    .dataframe tbody tr th:only-of-type {\n",
1544 |        "        vertical-align: middle;\n",
1545 |        "    }\n",
1546 |        "\n",
1547 |        "    .dataframe tbody tr th {\n",
1548 |        "        vertical-align: top;\n",
1549 |        "    }\n",
1550 |        "\n",
1551 |        "    .dataframe thead th {\n",
1552 |        "        text-align: right;\n",
1553 |        "    }\n",
1554 |        "</style>\n",
1555 |        "<table border=\"1\" class=\"dataframe\">\n",
1556 |        "  <thead>\n",
1557 |        "    <tr style=\"text-align: right;\">\n",
1558 |        "      <th></th>\n",
1559 |        "      <th>0</th>\n",
1560 |        "      <th>1</th>\n",
1561 |        "      <th>2</th>\n",
1562 |        "      <th>3</th>\n",
1563 |        "      <th>4</th>\n",
1564 |        "    </tr>\n",
1565 |        "  </thead>\n",
1566 |        "  <tbody>\n",
1567 |        "    <tr>\n",
1568 |        "      <th>0</th>\n",
1569 |        "      <td>NaN</td>\n",
1570 |        "      <td>0.584173</td>\n",
1571 |        "      <td>NaN</td>\n",
1572 |        "      <td>0.135304</td>\n",
1573 |        "      <td>0.409292</td>\n",
1574 |        "    </tr>\n",
1575 |        "    <tr>\n",
1576 |        "      <th>1</th>\n",
1577 |        "      <td>NaN</td>\n",
1578 |        "      <td>1.232487</td>\n",
1579 |        "      <td>1.284779</td>\n",
1580 |        "      <td>NaN</td>\n",
1581 |        "      <td>NaN</td>\n",
1582 |        "    </tr>\n",
1583 |        "    <tr>\n",
1584 |        "      <th>2</th>\n",
1585 |        "      <td>NaN</td>\n",
1586 |        "      <td>NaN</td>\n",
1587 |        "      <td>NaN</td>\n",
1588 |        "      <td>0.209145</td>\n",
1589 |        "      <td>NaN</td>\n",
1590 |        "    </tr>\n",
1591 |        "    <tr>\n",
1592 |        "      <th>3</th>\n",
1593 |        "      <td>0.471747</td>\n",
1594 |        "      <td>1.141361</td>\n",
1595 |        "      <td>0.429878</td>\n",
1596 |        "      <td>2.290840</td>\n",
1597 |        "      <td>NaN</td>\n",
1598 |        "    </tr>\n",
1599 |        "    <tr>\n",
1600 |        "      <th>4</th>\n",
1601 |        "      <td>NaN</td>\n",
1602 |        "      <td>0.186785</td>\n",
1603 |        "      <td>1.031003</td>\n",
1604 |        "      <td>NaN</td>\n",
1605 |        "      <td>0.413554</td>\n",
1606 |        "    </tr>\n",
1607 |        "  </tbody>\n",
1608 |        "</table>\n",
1609 |        "</div>"
1610 |       ],
1611 |       "text/plain": [
1612 |        "          0         1         2         3         4\n",
1613 |        "0       NaN  0.584173       NaN  0.135304  0.409292\n",
1614 |        "1       NaN  1.232487  1.284779       NaN       NaN\n",
1615 |        "2       NaN       NaN       NaN  0.209145       NaN\n",
1616 |        "3  0.471747  1.141361  0.429878  2.290840       NaN\n",
1617 |        "4       NaN  0.186785  1.031003       NaN  0.413554"
1618 |       ]
1619 |      },
1620 |      "execution_count": 28,
1621 |      "metadata": {},
1622 |      "output_type": "execute_result"
1623 |     }
1624 |    ],
1625 |    "source": [
1626 |     "df.where(df > 0)"
1627 |    ]
1628 |   },
1629 |   {
1630 |    "cell_type": "code",
1631 |    "execution_count": 29,
1632 |    "metadata": {},
1633 |    "outputs": [
1634 |     {
1635 |      "data": {
1636 |       "text/html": [
1637 |        "<div>\n",
1638 |        "<style scoped>\n",
1639 |        "    .dataframe tbody tr th:only-of-type {\n",
1640 |        "        vertical-align: middle;\n",
1641 |        "    }\n",
1642 |        "\n",
1643 |        "    .dataframe tbody tr th {\n",
1644 |        "        vertical-align: top;\n",
1645 |        "    }\n",
1646 |        "\n",
1647 |        "    .dataframe thead th {\n",
1648 |        "        text-align: right;\n",
1649 |        "    }\n",
1650 |        "</style>\n",
1651 |        "<table border=\"1\" class=\"dataframe\">\n",
1652 |        "  <thead>\n",
1653 |        "    <tr style=\"text-align: right;\">\n",
1654 |        "      <th></th>\n",
1655 |        "      <th>0</th>\n",
1656 |        "      <th>1</th>\n",
1657 |        "      <th>2</th>\n",
1658 |        "      <th>3</th>\n",
1659 |        "      <th>4</th>\n",
1660 |        "    </tr>\n",
1661 |        "  </thead>\n",
1662 |        "  <tbody>\n",
1663 |        "    <tr>\n",
1664 |        "      <th>0</th>\n",
1665 |        "      <td>NaN</td>\n",
1666 |        "      <td>0.584173</td>\n",
1667 |        "      <td>NaN</td>\n",
1668 |        "      <td>0.135304</td>\n",
1669 |        "      <td>0.409292</td>\n",
1670 |        "    </tr>\n",
1671 |        "    <tr>\n",
1672 |        "      <th>1</th>\n",
1673 |        "      <td>NaN</td>\n",
1674 |        "      <td>1.232487</td>\n",
1675 |        "      <td>1.284779</td>\n",
1676 |        "      <td>NaN</td>\n",
1677 |        "      <td>NaN</td>\n",
1678 |        "    </tr>\n",
1679 |        "    <tr>\n",
1680 |        "      <th>2</th>\n",
1681 |        "      <td>NaN</td>\n",
1682 |        "      <td>NaN</td>\n",
1683 |        "      <td>NaN</td>\n",
1684 |        "      <td>0.209145</td>\n",
1685 |        "      <td>NaN</td>\n",
1686 |        "    </tr>\n",
1687 |        "    <tr>\n",
1688 |        "      <th>3</th>\n",
1689 |        "      <td>0.471747</td>\n",
1690 |        "      <td>1.141361</td>\n",
1691 |        "      <td>0.429878</td>\n",
1692 |        "      <td>2.290840</td>\n",
1693 |        "      <td>NaN</td>\n",
1694 |        "    </tr>\n",
1695 |        "    <tr>\n",
1696 |        "      <th>4</th>\n",
1697 |        "      <td>NaN</td>\n",
1698 |        "      <td>0.186785</td>\n",
1699 |        "      <td>1.031003</td>\n",
1700 |        "      <td>NaN</td>\n",
1701 |        "      <td>0.413554</td>\n",
1702 |        "    </tr>\n",
1703 |        "  </tbody>\n",
1704 |        "</table>\n",
1705 |        "</div>"
1706 |       ],
1707 |       "text/plain": [
1708 |        "          0         1         2         3         4\n",
1709 |        "0       NaN  0.584173       NaN  0.135304  0.409292\n",
1710 |        "1       NaN  1.232487  1.284779       NaN       NaN\n",
1711 |        "2       NaN       NaN       NaN  0.209145       NaN\n",
1712 |        "3  0.471747  1.141361  0.429878  2.290840       NaN\n",
1713 |        "4       NaN  0.186785  1.031003       NaN  0.413554"
1714 |       ]
1715 |      },
1716 |      "execution_count": 29,
1717 |      "metadata": {},
1718 |      "output_type": "execute_result"
1719 |     }
1720 |    ],
1721 |    "source": [
1722 |     "df[df < 0] = np.nan\n",
1723 |     "df"
1724 |    ]
1725 |   },
1726 |   {
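  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For completeness, here is a small sketch that puts `where`, `mask`, boolean assignment and `query` side by side. It assumes a tiny fixed frame with made-up columns `a` and `b` instead of the random one above, so the results are easy to check by eye.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Small fixed frame (hypothetical values) instead of random data.\n",
    "df = pd.DataFrame({'a': [-1.0, 2.0, -3.0], 'b': [4.0, -5.0, 6.0]})\n",
    "\n",
    "# where keeps values where the condition is True and puts NaN elsewhere.\n",
    "kept = df.where(df > 0)\n",
    "\n",
    "# mask is the inverse: it replaces values where the condition is True.\n",
    "masked = df.mask(df <= 0)\n",
    "\n",
    "# Plain boolean assignment on a copy gives the same result.\n",
    "manual = df.copy()\n",
    "manual[manual <= 0] = np.nan\n",
    "\n",
    "# query filters rows with an expression string instead of a boolean mask.\n",
    "rows = df.query('a > 0 and b < 0')\n",
    "\n",
    "print(kept.equals(masked) and kept.equals(manual))\n",
    "print(rows)\n",
    "```\n",
    "\n",
    "Unlike the in-place assignment, `where` and `mask` return a new frame and leave `df` untouched, which is the main practical difference between the approaches."
   ]
  },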
1727 |    "cell_type": "markdown",
1728 |    "metadata": {},
1729 |    "source": [
1730 |     "## Conclusion\n",
1731 |     "\n",
1732 |     "So that's it. This is really all I know about indexing, and probably all you'll need to know too. If you've got any questions or comments, please add them! \n",
1733 |     "\n",
1734 |     "P.S. There are not really any great tutorials on this in particular, but if you know of one I should link, let me know."
1735 |    ]
1736 |   },
1737 |   {
1738 |    "cell_type": "code",
1739 |    "execution_count": null,
1740 |    "metadata": {},
1741 |    "outputs": [],
1742 |    "source": []
1743 |   }
1744 |  ],
1745 |  "metadata": {
1746 |   "kernelspec": {
1747 |    "display_name": "Python 3",
1748 |    "language": "python",
1749 |    "name": "python3"
1750 |   },
1751 |   "language_info": {
1752 |    "codemirror_mode": {
1753 |     "name": "ipython",
1754 |     "version": 3
1755 |    },
1756 |    "file_extension": ".py",
1757 |    "mimetype": "text/x-python",
1758 |    "name": "python",
1759 |    "nbconvert_exporter": "python",
1760 |    "pygments_lexer": "ipython3",
1761 |    "version": "3.7.3"
1762 |   }
1763 |  },
1764 |  "nbformat": 4,
1765 |  "nbformat_minor": 2
1766 | }
1767 | 


--------------------------------------------------------------------------------
/notebooks/Misc Functions.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "code",
  5 |    "execution_count": 2,
  6 |    "metadata": {},
  7 |    "outputs": [],
  8 |    "source": [
  9 |     "import seaborn as sns\n",
 10 |     "import pandas as pd\n",
 11 |     "import numpy as np"
 12 |    ]
 13 |   },
 14 |   {
 15 |    "cell_type": "markdown",
 16 |    "metadata": {},
 17 |    "source": [
 18 |     "# Pandas Misc Useful Functions\n",
 19 |     "\n",
 20 |     "Pandas is massive. I mean really massive! There are hundreds of functions, so we are not going to go over all of them here, but I'll show you a couple of the most useful ones.\n",
 21 |     "\n",
 22 |     "\n",
 23 |     "This time each function has a bit of documentation, so let's just jump right in. "
 24 |    ]
 25 |   },
 26 |   {
 27 |    "cell_type": "code",
 28 |    "execution_count": 3,
 29 |    "metadata": {},
 30 |    "outputs": [
 31 |     {
 32 |      "data": {
 33 |       "text/html": [
 34 |        "<div>\n",
 35 |        "<style scoped>\n",
 36 |        "    .dataframe tbody tr th:only-of-type {\n",
 37 |        "        vertical-align: middle;\n",
 38 |        "    }\n",
 39 |        "\n",
 40 |        "    .dataframe tbody tr th {\n",
 41 |        "        vertical-align: top;\n",
 42 |        "    }\n",
 43 |        "\n",
 44 |        "    .dataframe thead th {\n",
 45 |        "        text-align: right;\n",
 46 |        "    }\n",
 47 |        "</style>\n",
 48 |        "<table border=\"1\" class=\"dataframe\">\n",
 49 |        "  <thead>\n",
 50 |        "    <tr style=\"text-align: right;\">\n",
 51 |        "      <th></th>\n",
 52 |        "      <th>total_bill</th>\n",
 53 |        "      <th>tip</th>\n",
 54 |        "      <th>sex</th>\n",
 55 |        "      <th>smoker</th>\n",
 56 |        "      <th>day</th>\n",
 57 |        "      <th>time</th>\n",
 58 |        "      <th>size</th>\n",
 59 |        "    </tr>\n",
 60 |        "  </thead>\n",
 61 |        "  <tbody>\n",
 62 |        "    <tr>\n",
 63 |        "      <th>0</th>\n",
 64 |        "      <td>16.99</td>\n",
 65 |        "      <td>1.01</td>\n",
 66 |        "      <td>Female</td>\n",
 67 |        "      <td>No</td>\n",
 68 |        "      <td>Sun</td>\n",
 69 |        "      <td>Dinner</td>\n",
 70 |        "      <td>2</td>\n",
 71 |        "    </tr>\n",
 72 |        "    <tr>\n",
 73 |        "      <th>1</th>\n",
 74 |        "      <td>10.34</td>\n",
 75 |        "      <td>1.66</td>\n",
 76 |        "      <td>Male</td>\n",
 77 |        "      <td>No</td>\n",
 78 |        "      <td>Sun</td>\n",
 79 |        "      <td>Dinner</td>\n",
 80 |        "      <td>3</td>\n",
 81 |        "    </tr>\n",
 82 |        "    <tr>\n",
 83 |        "      <th>2</th>\n",
 84 |        "      <td>21.01</td>\n",
 85 |        "      <td>3.50</td>\n",
 86 |        "      <td>Male</td>\n",
 87 |        "      <td>No</td>\n",
 88 |        "      <td>Sun</td>\n",
 89 |        "      <td>Dinner</td>\n",
 90 |        "      <td>3</td>\n",
 91 |        "    </tr>\n",
 92 |        "  </tbody>\n",
 93 |        "</table>\n",
 94 |        "</div>"
 95 |       ],
 96 |       "text/plain": [
 97 |        "   total_bill   tip     sex smoker  day    time  size\n",
 98 |        "0       16.99  1.01  Female     No  Sun  Dinner     2\n",
 99 |        "1       10.34  1.66    Male     No  Sun  Dinner     3\n",
100 |        "2       21.01  3.50    Male     No  Sun  Dinner     3"
101 |       ]
102 |      },
103 |      "execution_count": 3,
104 |      "metadata": {},
105 |      "output_type": "execute_result"
106 |     }
107 |    ],
108 |    "source": [
109 |     "tips = sns.load_dataset('tips')\n",
110 |     "tips.head(3)"
111 |    ]
112 |   },
113 |   {
114 |    "cell_type": "markdown",
115 |    "metadata": {},
116 |    "source": [
117 |     "## Sample\n",
118 |     "\n",
119 |     "Pretty useful. It lets you draw random samples from a dataframe in flexible ways: a fixed number of rows, a fraction of the rows, with or without replacement, and with optional weights.\n",
120 |     "\n",
121 |     "http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#selecting-random-samples"
122 |    ]
123 |   },
124 |   {
125 |    "cell_type": "code",
126 |    "execution_count": 4,
127 |    "metadata": {},
128 |    "outputs": [
129 |     {
130 |      "data": {
131 |       "text/html": [
132 |        "<div>\n",
133 |        "<style scoped>\n",
134 |        "    .dataframe tbody tr th:only-of-type {\n",
135 |        "        vertical-align: middle;\n",
136 |        "    }\n",
137 |        "\n",
138 |        "    .dataframe tbody tr th {\n",
139 |        "        vertical-align: top;\n",
140 |        "    }\n",
141 |        "\n",
142 |        "    .dataframe thead th {\n",
143 |        "        text-align: right;\n",
144 |        "    }\n",
145 |        "</style>\n",
146 |        "<table border=\"1\" class=\"dataframe\">\n",
147 |        "  <thead>\n",
148 |        "    <tr style=\"text-align: right;\">\n",
149 |        "      <th></th>\n",
150 |        "      <th>total_bill</th>\n",
151 |        "      <th>tip</th>\n",
152 |        "      <th>sex</th>\n",
153 |        "      <th>smoker</th>\n",
154 |        "      <th>day</th>\n",
155 |        "      <th>time</th>\n",
156 |        "      <th>size</th>\n",
157 |        "    </tr>\n",
158 |        "  </thead>\n",
159 |        "  <tbody>\n",
160 |        "    <tr>\n",
161 |        "      <th>137</th>\n",
162 |        "      <td>14.15</td>\n",
163 |        "      <td>2.0</td>\n",
164 |        "      <td>Female</td>\n",
165 |        "      <td>No</td>\n",
166 |        "      <td>Thur</td>\n",
167 |        "      <td>Lunch</td>\n",
168 |        "      <td>2</td>\n",
169 |        "    </tr>\n",
170 |        "    <tr>\n",
171 |        "      <th>159</th>\n",
172 |        "      <td>16.49</td>\n",
173 |        "      <td>2.0</td>\n",
174 |        "      <td>Male</td>\n",
175 |        "      <td>No</td>\n",
176 |        "      <td>Sun</td>\n",
177 |        "      <td>Dinner</td>\n",
178 |        "      <td>4</td>\n",
179 |        "    </tr>\n",
180 |        "    <tr>\n",
181 |        "      <th>203</th>\n",
182 |        "      <td>16.40</td>\n",
183 |        "      <td>2.5</td>\n",
184 |        "      <td>Female</td>\n",
185 |        "      <td>Yes</td>\n",
186 |        "      <td>Thur</td>\n",
187 |        "      <td>Lunch</td>\n",
188 |        "      <td>2</td>\n",
189 |        "    </tr>\n",
190 |        "    <tr>\n",
191 |        "      <th>216</th>\n",
192 |        "      <td>28.15</td>\n",
193 |        "      <td>3.0</td>\n",
194 |        "      <td>Male</td>\n",
195 |        "      <td>Yes</td>\n",
196 |        "      <td>Sat</td>\n",
197 |        "      <td>Dinner</td>\n",
198 |        "      <td>5</td>\n",
199 |        "    </tr>\n",
200 |        "    <tr>\n",
201 |        "      <th>32</th>\n",
202 |        "      <td>15.06</td>\n",
203 |        "      <td>3.0</td>\n",
204 |        "      <td>Female</td>\n",
205 |        "      <td>No</td>\n",
206 |        "      <td>Sat</td>\n",
207 |        "      <td>Dinner</td>\n",
208 |        "      <td>2</td>\n",
209 |        "    </tr>\n",
210 |        "  </tbody>\n",
211 |        "</table>\n",
212 |        "</div>"
213 |       ],
214 |       "text/plain": [
215 |        "     total_bill  tip     sex smoker   day    time  size\n",
216 |        "137       14.15  2.0  Female     No  Thur   Lunch     2\n",
217 |        "159       16.49  2.0    Male     No   Sun  Dinner     4\n",
218 |        "203       16.40  2.5  Female    Yes  Thur   Lunch     2\n",
219 |        "216       28.15  3.0    Male    Yes   Sat  Dinner     5\n",
220 |        "32        15.06  3.0  Female     No   Sat  Dinner     2"
221 |       ]
222 |      },
223 |      "execution_count": 4,
224 |      "metadata": {},
225 |      "output_type": "execute_result"
226 |     }
227 |    ],
228 |    "source": [
229 |     "tips.sample(5)"
230 |    ]
231 |   },
232 |   {
233 |    "cell_type": "code",
234 |    "execution_count": 5,
235 |    "metadata": {},
236 |    "outputs": [],
237 |    "source": [
238 |     "tips.sample?"
239 |    ]
240 |   },
241 |   {
242 |    "cell_type": "markdown",
243 |    "metadata": {},
244 |    "source": [
245 |     "## isin\n",
246 |     "\n",
247 |     "The next useful function is `isin`. It is applied to an entire column and returns a boolean mask (True where the value appears in the list you pass), which makes it very handy for selecting specific rows.\n",
248 |     "\n",
249 |     "http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-with-isin"
250 |    ]
251 |   },
252 |   {
253 |    "cell_type": "code",
254 |    "execution_count": 8,
255 |    "metadata": {},
256 |    "outputs": [
257 |     {
258 |      "data": {
259 |       "text/plain": [
260 |        "107     True\n",
261 |        "217     True\n",
262 |        "193    False\n",
263 |        "226    False\n",
264 |        "214     True\n",
265 |        "Name: day, dtype: bool"
266 |       ]
267 |      },
268 |      "execution_count": 8,
269 |      "metadata": {},
270 |      "output_type": "execute_result"
271 |     }
272 |    ],
273 |    "source": [
274 |     "is_weekend = tips.day.isin(['Sat', 'Sun']).sample(5)\n",
275 |     "is_weekend"
276 |    ]
277 |   },
278 |   {
279 |    "cell_type": "code",
280 |    "execution_count": 10,
281 |    "metadata": {},
282 |    "outputs": [
283 |     {
284 |      "data": {
285 |       "text/html": [
286 |        "<div>\n",
287 |        "<style scoped>\n",
288 |        "    .dataframe tbody tr th:only-of-type {\n",
289 |        "        vertical-align: middle;\n",
290 |        "    }\n",
291 |        "\n",
292 |        "    .dataframe tbody tr th {\n",
293 |        "        vertical-align: top;\n",
294 |        "    }\n",
295 |        "\n",
296 |        "    .dataframe thead th {\n",
297 |        "        text-align: right;\n",
298 |        "    }\n",
299 |        "</style>\n",
300 |        "<table border=\"1\" class=\"dataframe\">\n",
301 |        "  <thead>\n",
302 |        "    <tr style=\"text-align: right;\">\n",
303 |        "      <th></th>\n",
304 |        "      <th>total_bill</th>\n",
305 |        "      <th>tip</th>\n",
306 |        "      <th>sex</th>\n",
307 |        "      <th>smoker</th>\n",
308 |        "      <th>day</th>\n",
309 |        "      <th>time</th>\n",
310 |        "      <th>size</th>\n",
311 |        "    </tr>\n",
312 |        "  </thead>\n",
313 |        "  <tbody>\n",
314 |        "    <tr>\n",
315 |        "      <th>44</th>\n",
316 |        "      <td>30.40</td>\n",
317 |        "      <td>5.60</td>\n",
318 |        "      <td>Male</td>\n",
319 |        "      <td>No</td>\n",
320 |        "      <td>Sun</td>\n",
321 |        "      <td>Dinner</td>\n",
322 |        "      <td>4</td>\n",
323 |        "    </tr>\n",
324 |        "    <tr>\n",
325 |        "      <th>61</th>\n",
326 |        "      <td>13.81</td>\n",
327 |        "      <td>2.00</td>\n",
328 |        "      <td>Male</td>\n",
329 |        "      <td>Yes</td>\n",
330 |        "      <td>Sat</td>\n",
331 |        "      <td>Dinner</td>\n",
332 |        "      <td>2</td>\n",
333 |        "    </tr>\n",
334 |        "    <tr>\n",
335 |        "      <th>240</th>\n",
336 |        "      <td>27.18</td>\n",
337 |        "      <td>2.00</td>\n",
338 |        "      <td>Female</td>\n",
339 |        "      <td>Yes</td>\n",
340 |        "      <td>Sat</td>\n",
341 |        "      <td>Dinner</td>\n",
342 |        "      <td>2</td>\n",
343 |        "    </tr>\n",
344 |        "    <tr>\n",
345 |        "      <th>168</th>\n",
346 |        "      <td>10.59</td>\n",
347 |        "      <td>1.61</td>\n",
348 |        "      <td>Female</td>\n",
349 |        "      <td>Yes</td>\n",
350 |        "      <td>Sat</td>\n",
351 |        "      <td>Dinner</td>\n",
352 |        "      <td>2</td>\n",
353 |        "    </tr>\n",
354 |        "    <tr>\n",
355 |        "      <th>103</th>\n",
356 |        "      <td>22.42</td>\n",
357 |        "      <td>3.48</td>\n",
358 |        "      <td>Female</td>\n",
359 |        "      <td>Yes</td>\n",
360 |        "      <td>Sat</td>\n",
361 |        "      <td>Dinner</td>\n",
362 |        "      <td>2</td>\n",
363 |        "    </tr>\n",
364 |        "  </tbody>\n",
365 |        "</table>\n",
366 |        "</div>"
367 |       ],
368 |       "text/plain": [
369 |        "     total_bill   tip     sex smoker  day    time  size\n",
370 |        "44        30.40  5.60    Male     No  Sun  Dinner     4\n",
371 |        "61        13.81  2.00    Male    Yes  Sat  Dinner     2\n",
372 |        "240       27.18  2.00  Female    Yes  Sat  Dinner     2\n",
373 |        "168       10.59  1.61  Female    Yes  Sat  Dinner     2\n",
374 |        "103       22.42  3.48  Female    Yes  Sat  Dinner     2"
375 |       ]
376 |      },
377 |      "execution_count": 10,
378 |      "metadata": {},
379 |      "output_type": "execute_result"
380 |     }
381 |    ],
382 |    "source": [
383 |     "tips[tips.day.isin(['Sat', 'Sun'])].sample(5)"
384 |    ]
385 |   },
386 |   {
387 |    "cell_type": "markdown",
388 |    "metadata": {},
389 |    "source": [
390 |     "## drop_duplicates\n",
391 |     "\n",
392 |     "This one drops repeated rows. It is useful in a lot of situations, and it works on more than one column: rows count as duplicates only when all of the selected columns match.\n",
393 |     "\n",
394 |     "http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#duplicate-data"
395 |    ]
396 |   },
397 |   {
398 |    "cell_type": "code",
399 |    "execution_count": 11,
400 |    "metadata": {},
401 |    "outputs": [
402 |     {
403 |      "data": {
404 |       "text/html": [
405 |        "<div>\n",
406 |        "<style scoped>\n",
407 |        "    .dataframe tbody tr th:only-of-type {\n",
408 |        "        vertical-align: middle;\n",
409 |        "    }\n",
410 |        "\n",
411 |        "    .dataframe tbody tr th {\n",
412 |        "        vertical-align: top;\n",
413 |        "    }\n",
414 |        "\n",
415 |        "    .dataframe thead th {\n",
416 |        "        text-align: right;\n",
417 |        "    }\n",
418 |        "</style>\n",
419 |        "<table border=\"1\" class=\"dataframe\">\n",
420 |        "  <thead>\n",
421 |        "    <tr style=\"text-align: right;\">\n",
422 |        "      <th></th>\n",
423 |        "      <th>time</th>\n",
424 |        "      <th>day</th>\n",
425 |        "    </tr>\n",
426 |        "  </thead>\n",
427 |        "  <tbody>\n",
428 |        "    <tr>\n",
429 |        "      <th>0</th>\n",
430 |        "      <td>Dinner</td>\n",
431 |        "      <td>Sun</td>\n",
432 |        "    </tr>\n",
433 |        "    <tr>\n",
434 |        "      <th>19</th>\n",
435 |        "      <td>Dinner</td>\n",
436 |        "      <td>Sat</td>\n",
437 |        "    </tr>\n",
438 |        "    <tr>\n",
439 |        "      <th>77</th>\n",
440 |        "      <td>Lunch</td>\n",
441 |        "      <td>Thur</td>\n",
442 |        "    </tr>\n",
443 |        "    <tr>\n",
444 |        "      <th>90</th>\n",
445 |        "      <td>Dinner</td>\n",
446 |        "      <td>Fri</td>\n",
447 |        "    </tr>\n",
448 |        "    <tr>\n",
449 |        "      <th>220</th>\n",
450 |        "      <td>Lunch</td>\n",
451 |        "      <td>Fri</td>\n",
452 |        "    </tr>\n",
453 |        "    <tr>\n",
454 |        "      <th>243</th>\n",
455 |        "      <td>Dinner</td>\n",
456 |        "      <td>Thur</td>\n",
457 |        "    </tr>\n",
458 |        "  </tbody>\n",
459 |        "</table>\n",
460 |        "</div>"
461 |       ],
462 |       "text/plain": [
463 |        "       time   day\n",
464 |        "0    Dinner   Sun\n",
465 |        "19   Dinner   Sat\n",
466 |        "77    Lunch  Thur\n",
467 |        "90   Dinner   Fri\n",
468 |        "220   Lunch   Fri\n",
469 |        "243  Dinner  Thur"
470 |       ]
471 |      },
472 |      "execution_count": 11,
473 |      "metadata": {},
474 |      "output_type": "execute_result"
475 |     }
476 |    ],
477 |    "source": [
478 |     "tips[['time', 'day']].drop_duplicates(keep='first')"
479 |    ]
480 |   },
481 |   {
482 |    "cell_type": "markdown",
483 |    "metadata": {},
484 |    "source": [
485 |     "## cut\n",
486 |     "\n",
487 |     "This will cut your numeric data into equal-width buckets and then assign each value a label depending on its bucket. Pretty useful, and if you want buckets that each hold roughly the same number of rows, you can use qcut, which splits on quantiles instead.\n",
488 |     "\n",
489 |     "http://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html#tiling"
490 |    ]
491 |   },
492 |   {
493 |    "cell_type": "code",
494 |    "execution_count": 12,
495 |    "metadata": {},
496 |    "outputs": [
497 |     {
498 |      "data": {
499 |       "text/plain": [
500 |        "0    low\n",
501 |        "1    low\n",
502 |        "2    mid\n",
503 |        "3    mid\n",
504 |        "4    mid\n",
505 |        "Name: total_bill, dtype: category\n",
506 |        "Categories (3, object): [low < mid < high]"
507 |       ]
508 |      },
509 |      "execution_count": 12,
510 |      "metadata": {},
511 |      "output_type": "execute_result"
512 |     }
513 |    ],
514 |    "source": [
515 |     "pd.cut(tips['total_bill'], 3, labels=['low', 'mid', 'high']).head()"
516 |    ]
517 |   },
518 |   {
519 |    "cell_type": "markdown",
520 |    "metadata": {},
521 |    "source": [
522 |     "## str\n",
523 |     "\n",
524 |     "The str functions are really useful and there are a ton of them. If you ever need to apply a string operation to a column, look here first.\n",
525 |     "\n",
526 |     "http://pandas.pydata.org/pandas-docs/stable/user_guide/text.html"
527 |    ]
528 |   },
529 |   {
530 |    "cell_type": "code",
531 |    "execution_count": 13,
532 |    "metadata": {},
533 |    "outputs": [
534 |     {
535 |      "data": {
536 |       "text/plain": [
537 |        "0    female\n",
538 |        "1      male\n",
539 |        "2      male\n",
540 |        "3      male\n",
541 |        "4    female\n",
542 |        "Name: sex, dtype: object"
543 |       ]
544 |      },
545 |      "execution_count": 13,
546 |      "metadata": {},
547 |      "output_type": "execute_result"
548 |     }
549 |    ],
550 |    "source": [
551 |     "tips.sex.str.lower().head()"
552 |    ]
553 |   },
554 |   {
555 |    "cell_type": "markdown",
556 |    "metadata": {},
557 |    "source": [
558 |     "## NaNs\n",
559 |     "\n",
560 |     "There are three functions for missing values that are pretty useful:\n",
561 |     "\n",
562 |     "* isna\n",
563 |     "* fillna\n",
564 |     "* dropna\n",
565 |     "\n",
566 |     "They are all pretty self-explanatory, but it is nice to know that they exist."
567 |    ]
568 |   },
569 |   {
570 |    "cell_type": "code",
571 |    "execution_count": 16,
572 |    "metadata": {},
573 |    "outputs": [
574 |     {
575 |      "data": {
576 |       "text/plain": [
577 |        "total_bill    0\n",
578 |        "tip           0\n",
579 |        "sex           0\n",
580 |        "smoker        0\n",
581 |        "day           0\n",
582 |        "time          0\n",
583 |        "size          0\n",
584 |        "dtype: int64"
585 |       ]
586 |      },
587 |      "execution_count": 16,
588 |      "metadata": {},
589 |      "output_type": "execute_result"
590 |     }
591 |    ],
592 |    "source": [
593 |     "tips.isna().sum()"
594 |    ]
595 |   },
596 |   {
597 |    "cell_type": "code",
598 |    "execution_count": 17,
599 |    "metadata": {},
600 |    "outputs": [],
601 |    "source": [
602 |     "tips['tip'] = tips['tip'].fillna(0)  # assign back; inplace=True on a selected column may not update tips in newer pandas"
603 |    ]
604 |   },
605 |   {
606 |    "cell_type": "code",
607 |    "execution_count": 18,
608 |    "metadata": {},
609 |    "outputs": [],
610 |    "source": [
611 |     "tips.dropna(axis=1, how='any', inplace=True)"
612 |    ]
613 |   },
614 |   {
615 |    "cell_type": "markdown",
616 |    "metadata": {},
617 |    "source": [
618 |     "## corr\n",
619 |     "\n",
620 |     "Calculates the pairwise correlation between columns. Pretty straightforward.\n",
621 |     "\n",
622 |     "http://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#correlation"
623 |    ]
624 |   },
625 |   {
626 |    "cell_type": "code",
627 |    "execution_count": 19,
628 |    "metadata": {},
629 |    "outputs": [
630 |     {
631 |      "data": {
632 |       "text/html": [
633 |        "<div>\n",
634 |        "<style scoped>\n",
635 |        "    .dataframe tbody tr th:only-of-type {\n",
636 |        "        vertical-align: middle;\n",
637 |        "    }\n",
638 |        "\n",
639 |        "    .dataframe tbody tr th {\n",
640 |        "        vertical-align: top;\n",
641 |        "    }\n",
642 |        "\n",
643 |        "    .dataframe thead th {\n",
644 |        "        text-align: right;\n",
645 |        "    }\n",
646 |        "</style>\n",
647 |        "<table border=\"1\" class=\"dataframe\">\n",
648 |        "  <thead>\n",
649 |        "    <tr style=\"text-align: right;\">\n",
650 |        "      <th></th>\n",
651 |        "      <th>tip</th>\n",
652 |        "      <th>total_bill</th>\n",
653 |        "    </tr>\n",
654 |        "  </thead>\n",
655 |        "  <tbody>\n",
656 |        "    <tr>\n",
657 |        "      <th>tip</th>\n",
658 |        "      <td>1.000000</td>\n",
659 |        "      <td>0.675734</td>\n",
660 |        "    </tr>\n",
661 |        "    <tr>\n",
662 |        "      <th>total_bill</th>\n",
663 |        "      <td>0.675734</td>\n",
664 |        "      <td>1.000000</td>\n",
665 |        "    </tr>\n",
666 |        "  </tbody>\n",
667 |        "</table>\n",
668 |        "</div>"
669 |       ],
670 |       "text/plain": [
671 |        "                 tip  total_bill\n",
672 |        "tip         1.000000    0.675734\n",
673 |        "total_bill  0.675734    1.000000"
674 |       ]
675 |      },
676 |      "execution_count": 19,
677 |      "metadata": {},
678 |      "output_type": "execute_result"
679 |     }
680 |    ],
681 |    "source": [
682 |     "tips[['tip', 'total_bill']].corr('pearson')"
683 |    ]
684 |   },
685 |   {
686 |    "cell_type": "markdown",
687 |    "metadata": {},
688 |    "source": [
689 |     "## rank\n",
690 |     "\n",
691 |     "This will calculate the rank of each entry within its column. By default the smallest value gets rank 1 and ties share the average of their ranks.\n",
692 |     "\n",
693 |     "http://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#data-ranking"
694 |    ]
695 |   },
696 |   {
697 |    "cell_type": "code",
698 |    "execution_count": 20,
699 |    "metadata": {},
700 |    "outputs": [
701 |     {
702 |      "data": {
703 |       "text/plain": [
704 |        "0      5.0\n",
705 |        "1     33.0\n",
706 |        "2    177.0\n",
707 |        "3    165.0\n",
708 |        "4    185.0\n",
709 |        "Name: tip, dtype: float64"
710 |       ]
711 |      },
712 |      "execution_count": 20,
713 |      "metadata": {},
714 |      "output_type": "execute_result"
715 |     }
716 |    ],
717 |    "source": [
718 |     "tips.tip.rank().head()"
719 |    ]
720 |   },
721 |   {
722 |    "cell_type": "markdown",
723 |    "metadata": {},
724 |    "source": [
725 |     "## rename\n",
726 |     "\n",
727 |     "Rename, while not strictly needed, is a nice convenience function. You can rename columns or index labels.\n",
728 |     "\n",
729 |     "http://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#renaming-mapping-labels"
730 |    ]
731 |   },
732 |   {
733 |    "cell_type": "code",
734 |    "execution_count": 21,
735 |    "metadata": {},
736 |    "outputs": [],
737 |    "source": [
738 |     "tips.rename(columns={'total_bill': 'bill'}, inplace=True)"
739 |    ]
740 |   },
741 |   {
742 |    "cell_type": "markdown",
743 |    "metadata": {},
744 |    "source": [
745 |     "## itertuples\n",
746 |     "\n",
747 |     "There are a couple of iterators for dataframes. I would strongly caution you not to use them unless you are really sure that you know what you are doing. They are slow compared to vectorized functions, but when working with a small dataframe they can be really useful.\n",
748 |     "\n",
749 |     "http://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#itertuples"
750 |    ]
751 |   },
752 |   {
753 |    "cell_type": "code",
754 |    "execution_count": 22,
755 |    "metadata": {},
756 |    "outputs": [
757 |     {
758 |      "name": "stdout",
759 |      "output_type": "stream",
760 |      "text": [
761 |       "Pandas(Index=0, bill=16.99, tip=1.01, sex='Female', smoker='No', day='Sun', time='Dinner', size=2)\n"
762 |      ]
763 |     }
764 |    ],
765 |    "source": [
766 |     "for tup in tips.itertuples():\n",
767 |     "    print(tup)\n",
768 |     "    break"
769 |    ]
770 |   },
771 |   {
772 |    "cell_type": "markdown",
773 |    "metadata": {},
774 |    "source": [
775 |     "## Conclusion\n",
776 |     "\n",
777 |     "I hope this has been a bit interesting. These are the functions that I use most (other than the functions I demonstrated in the other notebooks).\n",
778 |     "\n",
779 |     "There are a couple of other topics that I have not gone over, but if I get enough interest I'd be happy to make notebooks on:\n",
780 |     "\n",
781 |     "\n",
782 |     "* timeseries \n",
783 |     "* io\n",
784 |     "* performance\n",
785 |     "\n",
786 |     "Please let me know if these interest you! And at this point you should be ready for all the exercises listed [here](https://github.com/guipsamora/pandas_exercises#merge)"
787 |    ]
788 |   },
789 |   {
790 |    "cell_type": "code",
791 |    "execution_count": 23,
792 |    "metadata": {},
793 |    "outputs": [
794 |     {
795 |      "name": "stdout",
796 |      "output_type": "stream",
797 |      "text": [
798 |       "Combining DataFrames.ipynb             README.md\r\n",
799 |       "Group Operations.ipynb                 Row-Column Transformations.ipynb\r\n",
800 |       "Indexing and Selecting.ipynb           \u001b[34menv\u001b[m\u001b[m/\r\n",
801 |       "Misc Functions.ipynb                   requirements.txt\r\n",
802 |       "Pandas Intro to Data Structures.ipynb\r\n"
803 |      ]
804 |     }
805 |    ],
806 |    "source": [
807 |     "ls"
808 |    ]
809 |   },
810 |   {
811 |    "cell_type": "code",
812 |    "execution_count": null,
813 |    "metadata": {},
814 |    "outputs": [],
815 |    "source": []
816 |   }
817 |  ],
818 |  "metadata": {
819 |   "kernelspec": {
820 |    "display_name": "Python 3",
821 |    "language": "python",
822 |    "name": "python3"
823 |   },
824 |   "language_info": {
825 |    "codemirror_mode": {
826 |     "name": "ipython",
827 |     "version": 3
828 |    },
829 |    "file_extension": ".py",
830 |    "mimetype": "text/x-python",
831 |    "name": "python",
832 |    "nbconvert_exporter": "python",
833 |    "pygments_lexer": "ipython3",
834 |    "version": "3.7.3"
835 |   }
836 |  },
837 |  "nbformat": 4,
838 |  "nbformat_minor": 2
839 | }
840 | 
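The `cut` section of the Misc Functions notebook above bins `total_bill` into three labelled buckets. As a minimal sketch of how `pd.cut` and `pd.qcut` differ, the example below uses a small made-up series (the values are an assumption for illustration, not the tips data): `cut` with an integer makes equal-width intervals, while `qcut` splits on quantiles so the buckets hold about the same number of rows.

```python
import pandas as pd

# Hypothetical bill amounts, skewed toward small values (not the tips dataset).
bills = pd.Series([5, 6, 7, 8, 9, 10, 50, 100, 200])
labels = ['low', 'mid', 'high']

# pd.cut with an integer splits the value range (5..200) into 3 equal-width intervals.
by_width = pd.cut(bills, 3, labels=labels)

# pd.qcut splits on quantiles, so each bucket gets roughly the same number of rows.
by_quantile = pd.qcut(bills, 3, labels=labels)

width_counts = by_width.value_counts().to_dict()
quantile_counts = by_quantile.value_counts().to_dict()

print(width_counts)
print(quantile_counts)
```

With skewed data like this, the equal-width buckets end up very uneven, which is usually the sign that `qcut` is the better fit.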

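The `isin` and `drop_duplicates` sections of the Misc Functions notebook above both come down to selecting rows. The following is a rough sketch on a tiny hand-made frame (the column values are assumptions chosen to resemble the tips data): the boolean mask from `isin` can be negated with `~`, and `drop_duplicates` accepts a `subset` so that whole rows are kept while only some columns decide what counts as a duplicate.

```python
import pandas as pd

# Hypothetical miniature of the tips table, for illustration only.
df = pd.DataFrame({
    'day': ['Thur', 'Sat', 'Sun', 'Fri', 'Sat'],
    'time': ['Lunch', 'Dinner', 'Dinner', 'Dinner', 'Dinner'],
    'tip': [2.0, 3.0, 1.5, 4.0, 2.5],
})

# isin returns a boolean Series that can be used directly as a row filter.
is_weekend = df['day'].isin(['Sat', 'Sun'])
weekend = df[is_weekend]

# ~ flips the mask, giving every row that is NOT in the list.
weekday = df[~is_weekend]

# subset= limits the duplicate check to these columns but keeps full rows;
# keep='first' retains the first occurrence of each (day, time) pair.
unique_pairs = df.drop_duplicates(subset=['day', 'time'], keep='first')

print(weekend)
print(weekday)
print(unique_pairs)
```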

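The tips dataset used in the Misc Functions notebook above has no missing values, so the NaN functions there have nothing to act on. Here is a small sketch with a made-up frame that does contain NaNs (the data is an assumption for illustration), showing `isna`, `fillna`, and `dropna` without `inplace`, so each step returns a new object.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing values.
df = pd.DataFrame({
    'bill': [10.0, 20.0, np.nan],
    'tip': [1.0, np.nan, 3.0],
    'day': ['Sat', 'Sun', 'Sat'],
})

# Count missing values per column.
missing_per_column = df.isna().sum().to_dict()

# Fill NaNs in a single column and assign the result back to that column.
filled = df.assign(tip=df['tip'].fillna(0))

# axis=0 drops rows containing any NaN; axis=1 drops columns containing any NaN.
complete_rows = filled.dropna(axis=0, how='any')
complete_cols = filled.dropna(axis=1, how='any')

print(missing_per_column)
print(complete_rows)
print(complete_cols)
```

Note that `axis=1` removes whole columns, which is what the notebook's `dropna` call does; `axis=0` is the more common choice when you want to discard incomplete rows.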
--------------------------------------------------------------------------------
/notebooks/Row-Column Transformations.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "code",
   5 |    "execution_count": 1,
   6 |    "metadata": {},
   7 |    "outputs": [],
   8 |    "source": [
   9 |     "import seaborn as sns\n",
  10 |     "import pandas as pd\n",
  11 |     "import numpy as np"
  12 |    ]
  13 |   },
  14 |   {
  15 |    "cell_type": "markdown",
  16 |    "metadata": {},
  17 |    "source": [
  18 |     "# Pandas Row-Column Transformations\n",
  19 |     "\n",
  20 |     "There comes a time in the life of any data scientist when he or she needs to transform the set of columns in a dataset into rows and vice versa.\n",
  21 |     "\n",
  22 |     "This is not a common operation, but it does happen every now and then. Pandas has two sets of methods to do this:\n",
  23 |     "\n",
  24 |     "* stack and unstack\n",
  25 |     "* pivot and melt\n",
  26 |     "\n",
  27 |     "Again these sets of methods basically do the same thing.\n",
  28 |     "\n",
  29 |     "\n",
  30 |     "I have found that stack and unstack are much more stable but a bit less powerful. So those are the ones I use. \n",
  31 |     "\n",
  32 |     "Right at the end we will go over pandas dummy variables, which are one last way to make this kind of transformation.\n",
  33 |     "\n",
  34 |     "Check out the full documentation for both [stack and unstack](http://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html) and [dummy variables](http://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html#computing-indicator-dummy-variables), but be warned it is a bit long :)"
  35 |    ]
  36 |   },
  37 |   {
  38 |    "cell_type": "markdown",
  39 |    "metadata": {},
  40 |    "source": [
  41 |     "Okay, let's get started."
  42 |    ]
  43 |   },
  44 |   {
  45 |    "cell_type": "code",
  46 |    "execution_count": 2,
  47 |    "metadata": {},
  48 |    "outputs": [
  49 |     {
  50 |      "data": {
  51 |       "text/html": [
  52 |        "<div>\n",
  53 |        "<style scoped>\n",
  54 |        "    .dataframe tbody tr th:only-of-type {\n",
  55 |        "        vertical-align: middle;\n",
  56 |        "    }\n",
  57 |        "\n",
  58 |        "    .dataframe tbody tr th {\n",
  59 |        "        vertical-align: top;\n",
  60 |        "    }\n",
  61 |        "\n",
  62 |        "    .dataframe thead th {\n",
  63 |        "        text-align: right;\n",
  64 |        "    }\n",
  65 |        "</style>\n",
  66 |        "<table border=\"1\" class=\"dataframe\">\n",
  67 |        "  <thead>\n",
  68 |        "    <tr style=\"text-align: right;\">\n",
  69 |        "      <th></th>\n",
  70 |        "      <th>total_bill</th>\n",
  71 |        "      <th>tip</th>\n",
  72 |        "      <th>sex</th>\n",
  73 |        "      <th>smoker</th>\n",
  74 |        "      <th>day</th>\n",
  75 |        "      <th>time</th>\n",
  76 |        "      <th>size</th>\n",
  77 |        "    </tr>\n",
  78 |        "  </thead>\n",
  79 |        "  <tbody>\n",
  80 |        "    <tr>\n",
  81 |        "      <th>0</th>\n",
  82 |        "      <td>16.99</td>\n",
  83 |        "      <td>1.01</td>\n",
  84 |        "      <td>Female</td>\n",
  85 |        "      <td>No</td>\n",
  86 |        "      <td>Sun</td>\n",
  87 |        "      <td>Dinner</td>\n",
  88 |        "      <td>2</td>\n",
  89 |        "    </tr>\n",
  90 |        "    <tr>\n",
  91 |        "      <th>1</th>\n",
  92 |        "      <td>10.34</td>\n",
  93 |        "      <td>1.66</td>\n",
  94 |        "      <td>Male</td>\n",
  95 |        "      <td>No</td>\n",
  96 |        "      <td>Sun</td>\n",
  97 |        "      <td>Dinner</td>\n",
  98 |        "      <td>3</td>\n",
  99 |        "    </tr>\n",
 100 |        "    <tr>\n",
 101 |        "      <th>2</th>\n",
 102 |        "      <td>21.01</td>\n",
 103 |        "      <td>3.50</td>\n",
 104 |        "      <td>Male</td>\n",
 105 |        "      <td>No</td>\n",
 106 |        "      <td>Sun</td>\n",
 107 |        "      <td>Dinner</td>\n",
 108 |        "      <td>3</td>\n",
 109 |        "    </tr>\n",
 110 |        "  </tbody>\n",
 111 |        "</table>\n",
 112 |        "</div>"
 113 |       ],
 114 |       "text/plain": [
 115 |        "   total_bill   tip     sex smoker  day    time  size\n",
 116 |        "0       16.99  1.01  Female     No  Sun  Dinner     2\n",
 117 |        "1       10.34  1.66    Male     No  Sun  Dinner     3\n",
 118 |        "2       21.01  3.50    Male     No  Sun  Dinner     3"
 119 |       ]
 120 |      },
 121 |      "execution_count": 2,
 122 |      "metadata": {},
 123 |      "output_type": "execute_result"
 124 |     }
 125 |    ],
 126 |    "source": [
 127 |     "tips = sns.load_dataset('tips')\n",
 128 |     "tips.head(3)"
 129 |    ]
 130 |   },
 131 |   {
 132 |    "cell_type": "markdown",
 133 |    "metadata": {},
 134 |    "source": [
 135 |     "A question we might want to ask is: what is the male to female ratio on different days of the week?\n",
 136 |     "\n",
 137 |     "To do this we might start with a groupby:"
 138 |    ]
 139 |   },
 140 |   {
 141 |    "cell_type": "code",
 142 |    "execution_count": 4,
 143 |    "metadata": {},
 144 |    "outputs": [
 145 |     {
 146 |      "data": {
 147 |       "text/html": [
 148 |        "<div>\n",
 149 |        "<style scoped>\n",
 150 |        "    .dataframe tbody tr th:only-of-type {\n",
 151 |        "        vertical-align: middle;\n",
 152 |        "    }\n",
 153 |        "\n",
 154 |        "    .dataframe tbody tr th {\n",
 155 |        "        vertical-align: top;\n",
 156 |        "    }\n",
 157 |        "\n",
 158 |        "    .dataframe thead th {\n",
 159 |        "        text-align: right;\n",
 160 |        "    }\n",
 161 |        "</style>\n",
 162 |        "<table border=\"1\" class=\"dataframe\">\n",
 163 |        "  <thead>\n",
 164 |        "    <tr style=\"text-align: right;\">\n",
 165 |        "      <th></th>\n",
 166 |        "      <th></th>\n",
 167 |        "      <th>size</th>\n",
 168 |        "    </tr>\n",
 169 |        "    <tr>\n",
 170 |        "      <th>day</th>\n",
 171 |        "      <th>sex</th>\n",
 172 |        "      <th></th>\n",
 173 |        "    </tr>\n",
 174 |        "  </thead>\n",
 175 |        "  <tbody>\n",
 176 |        "    <tr>\n",
 177 |        "      <th rowspan=\"2\" valign=\"top\">Thur</th>\n",
 178 |        "      <th>Male</th>\n",
 179 |        "      <td>73</td>\n",
 180 |        "    </tr>\n",
 181 |        "    <tr>\n",
 182 |        "      <th>Female</th>\n",
 183 |        "      <td>79</td>\n",
 184 |        "    </tr>\n",
 185 |        "    <tr>\n",
 186 |        "      <th rowspan=\"2\" valign=\"top\">Fri</th>\n",
 187 |        "      <th>Male</th>\n",
 188 |        "      <td>21</td>\n",
 189 |        "    </tr>\n",
 190 |        "    <tr>\n",
 191 |        "      <th>Female</th>\n",
 192 |        "      <td>19</td>\n",
 193 |        "    </tr>\n",
 194 |        "    <tr>\n",
 195 |        "      <th rowspan=\"2\" valign=\"top\">Sat</th>\n",
 196 |        "      <th>Male</th>\n",
 197 |        "      <td>156</td>\n",
 198 |        "    </tr>\n",
 199 |        "    <tr>\n",
 200 |        "      <th>Female</th>\n",
 201 |        "      <td>63</td>\n",
 202 |        "    </tr>\n",
 203 |        "    <tr>\n",
 204 |        "      <th rowspan=\"2\" valign=\"top\">Sun</th>\n",
 205 |        "      <th>Male</th>\n",
 206 |        "      <td>163</td>\n",
 207 |        "    </tr>\n",
 208 |        "    <tr>\n",
 209 |        "      <th>Female</th>\n",
 210 |        "      <td>53</td>\n",
 211 |        "    </tr>\n",
 212 |        "  </tbody>\n",
 213 |        "</table>\n",
 214 |        "</div>"
 215 |       ],
 216 |       "text/plain": [
 217 |        "             size\n",
 218 |        "day  sex         \n",
 219 |        "Thur Male      73\n",
 220 |        "     Female    79\n",
 221 |        "Fri  Male      21\n",
 222 |        "     Female    19\n",
 223 |        "Sat  Male     156\n",
 224 |        "     Female    63\n",
 225 |        "Sun  Male     163\n",
 226 |        "     Female    53"
 227 |       ]
 228 |      },
 229 |      "execution_count": 4,
 230 |      "metadata": {},
 231 |      "output_type": "execute_result"
 232 |     }
 233 |    ],
 234 |    "source": [
 235 |     "tips_gb = tips.groupby(['day', 'sex']).agg({'size': 'sum'})\n",
 236 |     "tips_gb"
 237 |    ]
 238 |   },
 239 |   {
 240 |    "cell_type": "markdown",
 241 |    "metadata": {},
 242 |    "source": [
 243 |     "So we are getting somewhere, but it is a bit hard to compare the number of male and female visitors at a glance, and you might want to do more column-wise operations comparing the two.\n",
 244 |     "\n",
 245 |     "So what you might want to do is take the values in the `sex` index level and turn them into columns. This is where unstacking comes in!\n",
 246 |     "\n",
 247 |     "## Unstack"
 248 |    ]
 249 |   },
 250 |   {
 251 |    "cell_type": "code",
 252 |    "execution_count": 14,
 253 |    "metadata": {},
 254 |    "outputs": [
 255 |     {
 256 |      "data": {
 257 |       "text/html": [
 258 |        "<div>\n",
 259 |        "<style scoped>\n",
 260 |        "    .dataframe tbody tr th:only-of-type {\n",
 261 |        "        vertical-align: middle;\n",
 262 |        "    }\n",
 263 |        "\n",
 264 |        "    .dataframe tbody tr th {\n",
 265 |        "        vertical-align: top;\n",
 266 |        "    }\n",
 267 |        "\n",
 268 |        "    .dataframe thead tr th {\n",
 269 |        "        text-align: left;\n",
 270 |        "    }\n",
 271 |        "\n",
 272 |        "    .dataframe thead tr:last-of-type th {\n",
 273 |        "        text-align: right;\n",
 274 |        "    }\n",
 275 |        "</style>\n",
 276 |        "<table border=\"1\" class=\"dataframe\">\n",
 277 |        "  <thead>\n",
 278 |        "    <tr>\n",
 279 |        "      <th></th>\n",
 280 |        "      <th colspan=\"2\" halign=\"left\">size</th>\n",
 281 |        "    </tr>\n",
 282 |        "    <tr>\n",
 283 |        "      <th>sex</th>\n",
 284 |        "      <th>Male</th>\n",
 285 |        "      <th>Female</th>\n",
 286 |        "    </tr>\n",
 287 |        "    <tr>\n",
 288 |        "      <th>day</th>\n",
 289 |        "      <th></th>\n",
 290 |        "      <th></th>\n",
 291 |        "    </tr>\n",
 292 |        "  </thead>\n",
 293 |        "  <tbody>\n",
 294 |        "    <tr>\n",
 295 |        "      <th>Thur</th>\n",
 296 |        "      <td>73</td>\n",
 297 |        "      <td>79</td>\n",
 298 |        "    </tr>\n",
 299 |        "    <tr>\n",
 300 |        "      <th>Fri</th>\n",
 301 |        "      <td>21</td>\n",
 302 |        "      <td>19</td>\n",
 303 |        "    </tr>\n",
 304 |        "    <tr>\n",
 305 |        "      <th>Sat</th>\n",
 306 |        "      <td>156</td>\n",
 307 |        "      <td>63</td>\n",
 308 |        "    </tr>\n",
 309 |        "    <tr>\n",
 310 |        "      <th>Sun</th>\n",
 311 |        "      <td>163</td>\n",
 312 |        "      <td>53</td>\n",
 313 |        "    </tr>\n",
 314 |        "  </tbody>\n",
 315 |        "</table>\n",
 316 |        "</div>"
 317 |       ],
 318 |       "text/plain": [
 319 |        "     size       \n",
 320 |        "sex  Male Female\n",
 321 |        "day             \n",
 322 |        "Thur   73     79\n",
 323 |        "Fri    21     19\n",
 324 |        "Sat   156     63\n",
 325 |        "Sun   163     53"
 326 |       ]
 327 |      },
 328 |      "execution_count": 14,
 329 |      "metadata": {},
 330 |      "output_type": "execute_result"
 331 |     }
 332 |    ],
 333 |    "source": [
 334 |     "tips_us = tips_gb.unstack()\n",
 335 |     "tips_us"
 336 |    ]
 337 |   },
 338 |   {
 339 |    "cell_type": "markdown",
 340 |    "metadata": {},
 341 |    "source": [
 342 |     "Notice that we basically moved an index level (`sex`) into the columns!"
 343 |    ]
 344 |   },
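     |   {
     |    "cell_type": "markdown",
     |    "metadata": {},
     |    "source": [
     |     "As a side note, `unstack` also accepts a level name instead of a position. The sketch below is not part of the original walkthrough: it uses a small made-up frame (`toy`, with invented numbers) rather than the tips data, just to show that `unstack('sex')` and `unstack(1)` should give the same result here.\n",
     |     "\n",
     |     "```python\n",
     |     "import pandas as pd\n",
     |     "\n",
     |     "# Hypothetical toy data, not the tips dataset\n",
     |     "toy = pd.DataFrame({\n",
     |     "    'day': ['Sat', 'Sat', 'Sun', 'Sun'],\n",
     |     "    'sex': ['Male', 'Female', 'Male', 'Female'],\n",
     |     "    'size': [5, 3, 4, 2],\n",
     |     "})\n",
     |     "toy_gb = toy.groupby(['day', 'sex']).agg({'size': 'sum'})\n",
     |     "\n",
     |     "# Unstack the same level, once by name and once by position\n",
     |     "by_name = toy_gb.unstack('sex')\n",
     |     "by_position = toy_gb.unstack(1)\n",
     |     "```\n",
     |     "\n",
     |     "Using the name tends to be easier to read back later than remembering which position each level sits in."
     |    ]
     |   },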
 345 |   {
 346 |    "cell_type": "code",
 347 |    "execution_count": 15,
 348 |    "metadata": {},
 349 |    "outputs": [
 350 |     {
 351 |      "data": {
 352 |       "text/html": [
 353 |        "<div>\n",
 354 |        "<style scoped>\n",
 355 |        "    .dataframe tbody tr th:only-of-type {\n",
 356 |        "        vertical-align: middle;\n",
 357 |        "    }\n",
 358 |        "\n",
 359 |        "    .dataframe tbody tr th {\n",
 360 |        "        vertical-align: top;\n",
 361 |        "    }\n",
 362 |        "\n",
 363 |        "    .dataframe thead tr th {\n",
 364 |        "        text-align: left;\n",
 365 |        "    }\n",
 366 |        "\n",
 367 |        "    .dataframe thead tr:last-of-type th {\n",
 368 |        "        text-align: right;\n",
 369 |        "    }\n",
 370 |        "</style>\n",
 371 |        "<table border=\"1\" class=\"dataframe\">\n",
 372 |        "  <thead>\n",
 373 |        "    <tr>\n",
 374 |        "      <th></th>\n",
 375 |        "      <th colspan=\"4\" halign=\"left\">size</th>\n",
 376 |        "    </tr>\n",
 377 |        "    <tr>\n",
 378 |        "      <th>day</th>\n",
 379 |        "      <th>Thur</th>\n",
 380 |        "      <th>Fri</th>\n",
 381 |        "      <th>Sat</th>\n",
 382 |        "      <th>Sun</th>\n",
 383 |        "    </tr>\n",
 384 |        "    <tr>\n",
 385 |        "      <th>sex</th>\n",
 386 |        "      <th></th>\n",
 387 |        "      <th></th>\n",
 388 |        "      <th></th>\n",
 389 |        "      <th></th>\n",
 390 |        "    </tr>\n",
 391 |        "  </thead>\n",
 392 |        "  <tbody>\n",
 393 |        "    <tr>\n",
 394 |        "      <th>Male</th>\n",
 395 |        "      <td>73</td>\n",
 396 |        "      <td>21</td>\n",
 397 |        "      <td>156</td>\n",
 398 |        "      <td>163</td>\n",
 399 |        "    </tr>\n",
 400 |        "    <tr>\n",
 401 |        "      <th>Female</th>\n",
 402 |        "      <td>79</td>\n",
 403 |        "      <td>19</td>\n",
 404 |        "      <td>63</td>\n",
 405 |        "      <td>53</td>\n",
 406 |        "    </tr>\n",
 407 |        "  </tbody>\n",
 408 |        "</table>\n",
 409 |        "</div>"
 410 |       ],
 411 |       "text/plain": [
 412 |        "       size              \n",
 413 |        "day    Thur Fri  Sat  Sun\n",
 414 |        "sex                      \n",
 415 |        "Male     73  21  156  163\n",
 416 |        "Female   79  19   63   53"
 417 |       ]
 418 |      },
 419 |      "execution_count": 15,
 420 |      "metadata": {},
 421 |      "output_type": "execute_result"
 422 |     }
 423 |    ],
 424 |    "source": [
 425 |     "# you could do the same with the days of the week by unstacking level 0 (day)\n",
 426 |     "tips_gb.unstack(0)"
 427 |    ]
 428 |   },
 429 |   {
 430 |    "cell_type": "markdown",
 431 |    "metadata": {},
 432 |    "source": [
 433 |     "The problem is that the columns are now this odd new object, a `MultiIndex`:"
 434 |    ]
 435 |   },
 436 |   {
 437 |    "cell_type": "code",
 438 |    "execution_count": 16,
 439 |    "metadata": {},
 440 |    "outputs": [
 441 |     {
 442 |      "data": {
 443 |       "text/plain": [
 444 |        "MultiIndex(levels=[['size'], ['Male', 'Female']],\n",
 445 |        "           codes=[[0, 0], [0, 1]],\n",
 446 |        "           names=[None, 'sex'])"
 447 |       ]
 448 |      },
 449 |      "execution_count": 16,
 450 |      "metadata": {},
 451 |      "output_type": "execute_result"
 452 |     }
 453 |    ],
 454 |    "source": [
 455 |     "tips_us.columns"
 456 |    ]
 457 |   },
 458 |   {
 459 |    "cell_type": "markdown",
 460 |    "metadata": {},
 461 |    "source": [
 462 |     "You can still do things with it, such as selecting a column with a tuple key:"
 463 |    ]
 464 |   },
 465 |   {
 466 |    "cell_type": "code",
 467 |    "execution_count": 17,
 468 |    "metadata": {},
 469 |    "outputs": [
 470 |     {
 471 |      "data": {
 472 |       "text/html": [
 473 |        "<div>\n",
 474 |        "<style scoped>\n",
 475 |        "    .dataframe tbody tr th:only-of-type {\n",
 476 |        "        vertical-align: middle;\n",
 477 |        "    }\n",
 478 |        "\n",
 479 |        "    .dataframe tbody tr th {\n",
 480 |        "        vertical-align: top;\n",
 481 |        "    }\n",
 482 |        "\n",
 483 |        "    .dataframe thead tr th {\n",
 484 |        "        text-align: left;\n",
 485 |        "    }\n",
 486 |        "\n",
 487 |        "    .dataframe thead tr:last-of-type th {\n",
 488 |        "        text-align: right;\n",
 489 |        "    }\n",
 490 |        "</style>\n",
 491 |        "<table border=\"1\" class=\"dataframe\">\n",
 492 |        "  <thead>\n",
 493 |        "    <tr>\n",
 494 |        "      <th></th>\n",
 495 |        "      <th>size</th>\n",
 496 |        "    </tr>\n",
 497 |        "    <tr>\n",
 498 |        "      <th>sex</th>\n",
 499 |        "      <th>Male</th>\n",
 500 |        "    </tr>\n",
 501 |        "    <tr>\n",
 502 |        "      <th>day</th>\n",
 503 |        "      <th></th>\n",
 504 |        "    </tr>\n",
 505 |        "  </thead>\n",
 506 |        "  <tbody>\n",
 507 |        "    <tr>\n",
 508 |        "      <th>Thur</th>\n",
 509 |        "      <td>73</td>\n",
 510 |        "    </tr>\n",
 511 |        "    <tr>\n",
 512 |        "      <th>Fri</th>\n",
 513 |        "      <td>21</td>\n",
 514 |        "    </tr>\n",
 515 |        "    <tr>\n",
 516 |        "      <th>Sat</th>\n",
 517 |        "      <td>156</td>\n",
 518 |        "    </tr>\n",
 519 |        "    <tr>\n",
 520 |        "      <th>Sun</th>\n",
 521 |        "      <td>163</td>\n",
 522 |        "    </tr>\n",
 523 |        "  </tbody>\n",
 524 |        "</table>\n",
 525 |        "</div>"
 526 |       ],
 527 |       "text/plain": [
 528 |        "     size\n",
 529 |        "sex  Male\n",
 530 |        "day      \n",
 531 |        "Thur   73\n",
 532 |        "Fri    21\n",
 533 |        "Sat   156\n",
 534 |        "Sun   163"
 535 |       ]
 536 |      },
 537 |      "execution_count": 17,
 538 |      "metadata": {},
 539 |      "output_type": "execute_result"
 540 |     }
 541 |    ],
 542 |    "source": [
 543 |     "tips_us[[('size', 'Male')]]"
 544 |    ]
 545 |   },
 546 |   {
 547 |    "cell_type": "markdown",
 548 |    "metadata": {},
 549 |    "source": [
 550 |     "I find it a bit annoying to memorize a separate set of syntax, so I always flatten the columns with a line of code like this (P.S. I wish this were in pandas core):"
 551 |    ]
 552 |   },
 553 |   {
 554 |    "cell_type": "code",
 555 |    "execution_count": 18,
 556 |    "metadata": {},
 557 |    "outputs": [],
 558 |    "source": [
 559 |     "tips_us_copy = tips_us.copy()\n",
 560 |     "\n",
 561 |     "tips_us_copy.columns = ['__'.join(col).strip() for col in tips_us.columns.values]"
 562 |    ]
 563 |   },
 564 |   {
 565 |    "cell_type": "code",
 566 |    "execution_count": 19,
 567 |    "metadata": {},
 568 |    "outputs": [
 569 |     {
 570 |      "data": {
 571 |       "text/html": [
 572 |        "<div>\n",
 573 |        "<style scoped>\n",
 574 |        "    .dataframe tbody tr th:only-of-type {\n",
 575 |        "        vertical-align: middle;\n",
 576 |        "    }\n",
 577 |        "\n",
 578 |        "    .dataframe tbody tr th {\n",
 579 |        "        vertical-align: top;\n",
 580 |        "    }\n",
 581 |        "\n",
 582 |        "    .dataframe thead th {\n",
 583 |        "        text-align: right;\n",
 584 |        "    }\n",
 585 |        "</style>\n",
 586 |        "<table border=\"1\" class=\"dataframe\">\n",
 587 |        "  <thead>\n",
 588 |        "    <tr style=\"text-align: right;\">\n",
 589 |        "      <th></th>\n",
 590 |        "      <th>size__Male</th>\n",
 591 |        "      <th>size__Female</th>\n",
 592 |        "    </tr>\n",
 593 |        "    <tr>\n",
 594 |        "      <th>day</th>\n",
 595 |        "      <th></th>\n",
 596 |        "      <th></th>\n",
 597 |        "    </tr>\n",
 598 |        "  </thead>\n",
 599 |        "  <tbody>\n",
 600 |        "    <tr>\n",
 601 |        "      <th>Thur</th>\n",
 602 |        "      <td>73</td>\n",
 603 |        "      <td>79</td>\n",
 604 |        "    </tr>\n",
 605 |        "    <tr>\n",
 606 |        "      <th>Fri</th>\n",
 607 |        "      <td>21</td>\n",
 608 |        "      <td>19</td>\n",
 609 |        "    </tr>\n",
 610 |        "    <tr>\n",
 611 |        "      <th>Sat</th>\n",
 612 |        "      <td>156</td>\n",
 613 |        "      <td>63</td>\n",
 614 |        "    </tr>\n",
 615 |        "    <tr>\n",
 616 |        "      <th>Sun</th>\n",
 617 |        "      <td>163</td>\n",
 618 |        "      <td>53</td>\n",
 619 |        "    </tr>\n",
 620 |        "  </tbody>\n",
 621 |        "</table>\n",
 622 |        "</div>"
 623 |       ],
 624 |       "text/plain": [
 625 |        "      size__Male  size__Female\n",
 626 |        "day                           \n",
 627 |        "Thur          73            79\n",
 628 |        "Fri           21            19\n",
 629 |        "Sat          156            63\n",
 630 |        "Sun          163            53"
 631 |       ]
 632 |      },
 633 |      "execution_count": 19,
 634 |      "metadata": {},
 635 |      "output_type": "execute_result"
 636 |     }
 637 |    ],
 638 |    "source": [
 639 |     "tips_us_copy"
 640 |    ]
 641 |   },
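     |   {
     |    "cell_type": "markdown",
     |    "metadata": {},
     |    "source": [
     |     "With flat column names, the question we started with (the male-to-female ratio per day) becomes an ordinary column operation. The sketch below is an illustration rather than this notebook's original code: `flatten_columns` is a hypothetical helper name, and the small frame simply re-types the totals from the unstacked output above so that the snippet runs on its own.\n",
     |     "\n",
     |     "```python\n",
     |     "import pandas as pd\n",
     |     "\n",
     |     "def flatten_columns(df, sep='__'):\n",
     |     "    # Hypothetical helper: join each MultiIndex column tuple into one string\n",
     |     "    out = df.copy()\n",
     |     "    out.columns = [sep.join(str(part) for part in col) for col in out.columns]\n",
     |     "    return out\n",
     |     "\n",
     |     "# Totals copied by hand from the unstacked output above\n",
     |     "cols = pd.MultiIndex.from_tuples([('size', 'Male'), ('size', 'Female')],\n",
     |     "                                 names=[None, 'sex'])\n",
     |     "wide = pd.DataFrame([[73, 79], [21, 19], [156, 63], [163, 53]],\n",
     |     "                    index=pd.Index(['Thur', 'Fri', 'Sat', 'Sun'], name='day'),\n",
     |     "                    columns=cols)\n",
     |     "\n",
     |     "flat = flatten_columns(wide)\n",
     |     "# Plain column arithmetic now works without tuple keys\n",
     |     "flat['male_to_female'] = flat['size__Male'] / flat['size__Female']\n",
     |     "```\n",
     |     "\n",
     |     "The `str(part)` call is there so the helper does not fail when a column level holds non-string labels."
     |    ]
     |   },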
 642 |   {
 643 |    "cell_type": "markdown",
 644 |    "metadata": {},
 645 |    "source": [
 646 |     "You can of course repeat that operation as many times as you need to get the desired granularity of columns. \n",
 647 |     "\n",
 648 |     "But now let's try out the reverse operation. This is useful if somebody gives you data in pivot form.\n",
 649 |     "\n",
 650 |     "## Stack"
 651 |    ]
 652 |   },
 653 |   {
 654 |    "cell_type": "code",
 655 |    "execution_count": 20,
 656 |    "metadata": {},
 657 |    "outputs": [
 658 |     {
 659 |      "data": {
 660 |       "text/html": [
 661 |        "<div>\n",
 662 |        "<style scoped>\n",
 663 |        "    .dataframe tbody tr th:only-of-type {\n",
 664 |        "        vertical-align: middle;\n",
 665 |        "    }\n",
 666 |        "\n",
 667 |        "    .dataframe tbody tr th {\n",
 668 |        "        vertical-align: top;\n",
 669 |        "    }\n",
 670 |        "\n",
 671 |        "    .dataframe thead th {\n",
 672 |        "        text-align: right;\n",
 673 |        "    }\n",
 674 |        "</style>\n",
 675 |        "<table border=\"1\" class=\"dataframe\">\n",
 676 |        "  <thead>\n",
 677 |        "    <tr style=\"text-align: right;\">\n",
 678 |        "      <th></th>\n",
 679 |        "      <th></th>\n",
 680 |        "      <th>size</th>\n",
 681 |        "    </tr>\n",
 682 |        "    <tr>\n",
 683 |        "      <th>day</th>\n",
 684 |        "      <th>sex</th>\n",
 685 |        "      <th></th>\n",
 686 |        "    </tr>\n",
 687 |        "  </thead>\n",
 688 |        "  <tbody>\n",
 689 |        "    <tr>\n",
 690 |        "      <th rowspan=\"2\" valign=\"top\">Thur</th>\n",
 691 |        "      <th>Male</th>\n",
 692 |        "      <td>73</td>\n",
 693 |        "    </tr>\n",
 694 |        "    <tr>\n",
 695 |        "      <th>Female</th>\n",
 696 |        "      <td>79</td>\n",
 697 |        "    </tr>\n",
 698 |        "    <tr>\n",
 699 |        "      <th rowspan=\"2\" valign=\"top\">Fri</th>\n",
 700 |        "      <th>Male</th>\n",
 701 |        "      <td>21</td>\n",
 702 |        "    </tr>\n",
 703 |        "    <tr>\n",
 704 |        "      <th>Female</th>\n",
 705 |        "      <td>19</td>\n",
 706 |        "    </tr>\n",
 707 |        "    <tr>\n",
 708 |        "      <th rowspan=\"2\" valign=\"top\">Sat</th>\n",
 709 |        "      <th>Male</th>\n",
 710 |        "      <td>156</td>\n",
 711 |        "    </tr>\n",
 712 |        "    <tr>\n",
 713 |        "      <th>Female</th>\n",
 714 |        "      <td>63</td>\n",
 715 |        "    </tr>\n",
 716 |        "    <tr>\n",
 717 |        "      <th rowspan=\"2\" valign=\"top\">Sun</th>\n",
 718 |        "      <th>Male</th>\n",
 719 |        "      <td>163</td>\n",
 720 |        "    </tr>\n",
 721 |        "    <tr>\n",
 722 |        "      <th>Female</th>\n",
 723 |        "      <td>53</td>\n",
 724 |        "    </tr>\n",
 725 |        "  </tbody>\n",
 726 |        "</table>\n",
 727 |        "</div>"
 728 |       ],
 729 |       "text/plain": [
 730 |        "             size\n",
 731 |        "day  sex         \n",
 732 |        "Thur Male      73\n",
 733 |        "     Female    79\n",
 734 |        "Fri  Male      21\n",
 735 |        "     Female    19\n",
 736 |        "Sat  Male     156\n",
 737 |        "     Female    63\n",
 738 |        "Sun  Male     163\n",
 739 |        "     Female    53"
 740 |       ]
 741 |      },
 742 |      "execution_count": 20,
 743 |      "metadata": {},
 744 |      "output_type": "execute_result"
 745 |     }
 746 |    ],
 747 |    "source": [
 748 |     "tips_us.stack()"
 749 |    ]
 750 |   },
 751 |   {
 752 |    "cell_type": "markdown",
 753 |    "metadata": {},
 754 |    "source": [
 755 |     "Again, you can stack either column level:"
 756 |    ]
 757 |   },
 758 |   {
 759 |    "cell_type": "code",
 760 |    "execution_count": 22,
 761 |    "metadata": {},
 762 |    "outputs": [
 763 |     {
 764 |      "data": {
 765 |       "text/html": [
 766 |        "<div>\n",
 767 |        "<style scoped>\n",
 768 |        "    .dataframe tbody tr th:only-of-type {\n",
 769 |        "        vertical-align: middle;\n",
 770 |        "    }\n",
 771 |        "\n",
 772 |        "    .dataframe tbody tr th {\n",
 773 |        "        vertical-align: top;\n",
 774 |        "    }\n",
 775 |        "\n",
 776 |        "    .dataframe thead th {\n",
 777 |        "        text-align: right;\n",
 778 |        "    }\n",
 779 |        "</style>\n",
 780 |        "<table border=\"1\" class=\"dataframe\">\n",
 781 |        "  <thead>\n",
 782 |        "    <tr style=\"text-align: right;\">\n",
 783 |        "      <th></th>\n",
 784 |        "      <th>sex</th>\n",
 785 |        "      <th>Male</th>\n",
 786 |        "      <th>Female</th>\n",
 787 |        "    </tr>\n",
 788 |        "    <tr>\n",
 789 |        "      <th>day</th>\n",
 790 |        "      <th></th>\n",
 791 |        "      <th></th>\n",
 792 |        "      <th></th>\n",
 793 |        "    </tr>\n",
 794 |        "  </thead>\n",
 795 |        "  <tbody>\n",
 796 |        "    <tr>\n",
 797 |        "      <th>Thur</th>\n",
 798 |        "      <th>size</th>\n",
 799 |        "      <td>73</td>\n",
 800 |        "      <td>79</td>\n",
 801 |        "    </tr>\n",
 802 |        "    <tr>\n",
 803 |        "      <th>Fri</th>\n",
 804 |        "      <th>size</th>\n",
 805 |        "      <td>21</td>\n",
 806 |        "      <td>19</td>\n",
 807 |        "    </tr>\n",
 808 |        "    <tr>\n",
 809 |        "      <th>Sat</th>\n",
 810 |        "      <th>size</th>\n",
 811 |        "      <td>156</td>\n",
 812 |        "      <td>63</td>\n",
 813 |        "    </tr>\n",
 814 |        "    <tr>\n",
 815 |        "      <th>Sun</th>\n",
 816 |        "      <th>size</th>\n",
 817 |        "      <td>163</td>\n",
 818 |        "      <td>53</td>\n",
 819 |        "    </tr>\n",
 820 |        "  </tbody>\n",
 821 |        "</table>\n",
 822 |        "</div>"
 823 |       ],
 824 |       "text/plain": [
 825 |        "sex        Male  Female\n",
 826 |        "day                    \n",
 827 |        "Thur size    73      79\n",
 828 |        "Fri  size    21      19\n",
 829 |        "Sat  size   156      63\n",
 830 |        "Sun  size   163      53"
 831 |       ]
 832 |      },
 833 |      "execution_count": 22,
 834 |      "metadata": {},
 835 |      "output_type": "execute_result"
 836 |     }
 837 |    ],
 838 |    "source": [
 839 |     "tips_us.stack(0)"
 840 |    ]
 841 |   },
 842 |   {
 843 |    "cell_type": "markdown",
 844 |    "metadata": {},
 845 |    "source": [
 846 |     "## What about Melting and Pivoting?\n",
 847 |     "\n",
 848 |     "That is about it when it comes to stacking and unstacking. Anything you can do with melting and pivoting can be done with stacking and unstacking. Let's do a single example from the pandas documentation:"
 849 |    ]
 850 |   },
 851 |   {
 852 |    "cell_type": "code",
 853 |    "execution_count": 26,
 854 |    "metadata": {},
 855 |    "outputs": [
 856 |     {
 857 |      "data": {
 858 |       "text/html": [
 859 |        "<div>\n",
 860 |        "<style scoped>\n",
 861 |        "    .dataframe tbody tr th:only-of-type {\n",
 862 |        "        vertical-align: middle;\n",
 863 |        "    }\n",
 864 |        "\n",
 865 |        "    .dataframe tbody tr th {\n",
 866 |        "        vertical-align: top;\n",
 867 |        "    }\n",
 868 |        "\n",
 869 |        "    .dataframe thead th {\n",
 870 |        "        text-align: right;\n",
 871 |        "    }\n",
 872 |        "</style>\n",
 873 |        "<table border=\"1\" class=\"dataframe\">\n",
 874 |        "  <thead>\n",
 875 |        "    <tr style=\"text-align: right;\">\n",
 876 |        "      <th></th>\n",
 877 |        "      <th>first</th>\n",
 878 |        "      <th>last</th>\n",
 879 |        "      <th>height</th>\n",
 880 |        "      <th>weight</th>\n",
 881 |        "    </tr>\n",
 882 |        "  </thead>\n",
 883 |        "  <tbody>\n",
 884 |        "    <tr>\n",
 885 |        "      <th>0</th>\n",
 886 |        "      <td>John</td>\n",
 887 |        "      <td>Doe</td>\n",
 888 |        "      <td>5.5</td>\n",
 889 |        "      <td>130</td>\n",
 890 |        "    </tr>\n",
 891 |        "    <tr>\n",
 892 |        "      <th>1</th>\n",
 893 |        "      <td>Mary</td>\n",
 894 |        "      <td>Bo</td>\n",
 895 |        "      <td>6.0</td>\n",
 896 |        "      <td>150</td>\n",
 897 |        "    </tr>\n",
 898 |        "  </tbody>\n",
 899 |        "</table>\n",
 900 |        "</div>"
 901 |       ],
 902 |       "text/plain": [
 903 |        "  first last  height  weight\n",
 904 |        "0  John  Doe     5.5     130\n",
 905 |        "1  Mary   Bo     6.0     150"
 906 |       ]
 907 |      },
 908 |      "execution_count": 26,
 909 |      "metadata": {},
 910 |      "output_type": "execute_result"
 911 |     }
 912 |    ],
 913 |    "source": [
 914 |     "cheese = pd.DataFrame({'first': ['John', 'Mary'],\n",
 915 |     "                        'last': ['Doe', 'Bo'],\n",
 916 |     "                        'height': [5.5, 6.0],\n",
 917 |     "                        'weight': [130, 150]})\n",
 918 |     "cheese"
 919 |    ]
 920 |   },
 921 |   {
 922 |    "cell_type": "code",
 923 |    "execution_count": 27,
 924 |    "metadata": {},
 925 |    "outputs": [
 926 |     {
 927 |      "data": {
 928 |       "text/html": [
 929 |        "<div>\n",
 930 |        "<style scoped>\n",
 931 |        "    .dataframe tbody tr th:only-of-type {\n",
 932 |        "        vertical-align: middle;\n",
 933 |        "    }\n",
 934 |        "\n",
 935 |        "    .dataframe tbody tr th {\n",
 936 |        "        vertical-align: top;\n",
 937 |        "    }\n",
 938 |        "\n",
 939 |        "    .dataframe thead th {\n",
 940 |        "        text-align: right;\n",
 941 |        "    }\n",
 942 |        "</style>\n",
 943 |        "<table border=\"1\" class=\"dataframe\">\n",
 944 |        "  <thead>\n",
 945 |        "    <tr style=\"text-align: right;\">\n",
 946 |        "      <th></th>\n",
 947 |        "      <th>first</th>\n",
 948 |        "      <th>last</th>\n",
 949 |        "      <th>variable</th>\n",
 950 |        "      <th>value</th>\n",
 951 |        "    </tr>\n",
 952 |        "  </thead>\n",
 953 |        "  <tbody>\n",
 954 |        "    <tr>\n",
 955 |        "      <th>0</th>\n",
 956 |        "      <td>John</td>\n",
 957 |        "      <td>Doe</td>\n",
 958 |        "      <td>height</td>\n",
 959 |        "      <td>5.5</td>\n",
 960 |        "    </tr>\n",
 961 |        "    <tr>\n",
 962 |        "      <th>1</th>\n",
 963 |        "      <td>Mary</td>\n",
 964 |        "      <td>Bo</td>\n",
 965 |        "      <td>height</td>\n",
 966 |        "      <td>6.0</td>\n",
 967 |        "    </tr>\n",
 968 |        "    <tr>\n",
 969 |        "      <th>2</th>\n",
 970 |        "      <td>John</td>\n",
 971 |        "      <td>Doe</td>\n",
 972 |        "      <td>weight</td>\n",
 973 |        "      <td>130.0</td>\n",
 974 |        "    </tr>\n",
 975 |        "    <tr>\n",
 976 |        "      <th>3</th>\n",
 977 |        "      <td>Mary</td>\n",
 978 |        "      <td>Bo</td>\n",
 979 |        "      <td>weight</td>\n",
 980 |        "      <td>150.0</td>\n",
 981 |        "    </tr>\n",
 982 |        "  </tbody>\n",
 983 |        "</table>\n",
 984 |        "</div>"
 985 |       ],
 986 |       "text/plain": [
 987 |        "  first last variable  value\n",
 988 |        "0  John  Doe   height    5.5\n",
 989 |        "1  Mary   Bo   height    6.0\n",
 990 |        "2  John  Doe   weight  130.0\n",
 991 |        "3  Mary   Bo   weight  150.0"
 992 |       ]
 993 |      },
 994 |      "execution_count": 27,
 995 |      "metadata": {},
 996 |      "output_type": "execute_result"
 997 |     }
 998 |    ],
 999 |    "source": [
1000 |     "# melt does stacking in one operation\n",
1001 |     "cheese.melt(id_vars=['first', 'last'])"
1002 |    ]
1003 |   },
1004 |   {
1005 |    "cell_type": "markdown",
1006 |    "metadata": {},
1007 |    "source": [
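1008 |     "To do this with stacking we just need two steps: move the id columns into the index, then stack. The rows come out in a different order and the new columns get default names (`level_2` and `0`), but the data is the same:"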
1009 |    ]
1010 |   },
1011 |   {
1012 |    "cell_type": "code",
1013 |    "execution_count": 28,
1014 |    "metadata": {},
1015 |    "outputs": [
1016 |     {
1017 |      "data": {
1018 |       "text/html": [
1019 |        "<div>\n",
1020 |        "<style scoped>\n",
1021 |        "    .dataframe tbody tr th:only-of-type {\n",
1022 |        "        vertical-align: middle;\n",
1023 |        "    }\n",
1024 |        "\n",
1025 |        "    .dataframe tbody tr th {\n",
1026 |        "        vertical-align: top;\n",
1027 |        "    }\n",
1028 |        "\n",
1029 |        "    .dataframe thead th {\n",
1030 |        "        text-align: right;\n",
1031 |        "    }\n",
1032 |        "</style>\n",
1033 |        "<table border=\"1\" class=\"dataframe\">\n",
1034 |        "  <thead>\n",
1035 |        "    <tr style=\"text-align: right;\">\n",
1036 |        "      <th></th>\n",
1037 |        "      <th>first</th>\n",
1038 |        "      <th>last</th>\n",
1039 |        "      <th>level_2</th>\n",
1040 |        "      <th>0</th>\n",
1041 |        "    </tr>\n",
1042 |        "  </thead>\n",
1043 |        "  <tbody>\n",
1044 |        "    <tr>\n",
1045 |        "      <th>0</th>\n",
1046 |        "      <td>John</td>\n",
1047 |        "      <td>Doe</td>\n",
1048 |        "      <td>height</td>\n",
1049 |        "      <td>5.5</td>\n",
1050 |        "    </tr>\n",
1051 |        "    <tr>\n",
1052 |        "      <th>1</th>\n",
1053 |        "      <td>John</td>\n",
1054 |        "      <td>Doe</td>\n",
1055 |        "      <td>weight</td>\n",
1056 |        "      <td>130.0</td>\n",
1057 |        "    </tr>\n",
1058 |        "    <tr>\n",
1059 |        "      <th>2</th>\n",
1060 |        "      <td>Mary</td>\n",
1061 |        "      <td>Bo</td>\n",
1062 |        "      <td>height</td>\n",
1063 |        "      <td>6.0</td>\n",
1064 |        "    </tr>\n",
1065 |        "    <tr>\n",
1066 |        "      <th>3</th>\n",
1067 |        "      <td>Mary</td>\n",
1068 |        "      <td>Bo</td>\n",
1069 |        "      <td>weight</td>\n",
1070 |        "      <td>150.0</td>\n",
1071 |        "    </tr>\n",
1072 |        "  </tbody>\n",
1073 |        "</table>\n",
1074 |        "</div>"
1075 |       ],
1076 |       "text/plain": [
1077 |        "  first last level_2      0\n",
1078 |        "0  John  Doe  height    5.5\n",
1079 |        "1  John  Doe  weight  130.0\n",
1080 |        "2  Mary   Bo  height    6.0\n",
1081 |        "3  Mary   Bo  weight  150.0"
1082 |       ]
1083 |      },
1084 |      "execution_count": 28,
1085 |      "metadata": {},
1086 |      "output_type": "execute_result"
1087 |     }
1088 |    ],
1089 |    "source": [
1090 |     "cheese.set_index(['first', 'last'], inplace=True)\n",
1091 |     "cheese.stack().reset_index()"
1092 |    ]
1093 |   },
1094 |   {
1095 |    "cell_type": "markdown",
1096 |    "metadata": {},
1097 |    "source": [
1098 |     "I have used melt and pivot before, but after getting a better understanding of stack and unstack I have found them more versatile and stable than the former. So why learn both?"
1099 |    ]
1100 |   },
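     |   {
     |    "cell_type": "markdown",
     |    "metadata": {},
     |    "source": [
     |     "To make the comparison concrete, here is a small sketch (an added illustration, assuming a reasonably recent pandas) of how the two-step stack version can be renamed and sorted so that it lines up with the `melt` result. It rebuilds `cheese` from scratch because the cell above changed its index in place; `id_cols`, `melted` and `stacked` are names made up for this example.\n",
     |     "\n",
     |     "```python\n",
     |     "import pandas as pd\n",
     |     "\n",
     |     "cheese = pd.DataFrame({'first': ['John', 'Mary'],\n",
     |     "                       'last': ['Doe', 'Bo'],\n",
     |     "                       'height': [5.5, 6.0],\n",
     |     "                       'weight': [130, 150]})\n",
     |     "id_cols = ['first', 'last']\n",
     |     "\n",
     |     "melted = cheese.melt(id_vars=id_cols)\n",
     |     "\n",
     |     "# Name the stacked level and the values up front,\n",
     |     "# so the default level_2 / 0 column names never appear\n",
     |     "stacked = (cheese.set_index(id_cols)\n",
     |     "                 .rename_axis(columns='variable')\n",
     |     "                 .stack()\n",
     |     "                 .rename('value')\n",
     |     "                 .reset_index())\n",
     |     "\n",
     |     "# Same rows in a different order: sort both before comparing\n",
     |     "sort_cols = id_cols + ['variable']\n",
     |     "melted_sorted = melted.sort_values(sort_cols).reset_index(drop=True)\n",
     |     "stacked_sorted = stacked.sort_values(sort_cols).reset_index(drop=True)\n",
     |     "```\n",
     |     "\n",
     |     "Comparing `melted_sorted` with `stacked_sorted` row by row is then a quick way to check the claim on your own data."
     |    ]
     |   },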
1101 |   {
1102 |    "cell_type": "markdown",
1103 |    "metadata": {},
1104 |    "source": [
1105 |     "## Dummy Variables\n",
1106 |     "\n",
1107 |     "There is one final way to transform the values in a column into headers, and this is called making dummy variables (well, not quite final: if you are interested in more ways to do it, you can check out my [YouTube video](https://www.youtube.com/watch?v=WRxHfnl-Pcs&t=2s)).\n",
1108 |     "\n",
1109 |     "Making dummy variables takes the `k` distinct values in one column and makes `k` indicator columns out of them.\n",
1110 |     "\n",
1111 |     "Let's look at an example below:"
1112 |    ]
1113 |   },
1114 |   {
1115 |    "cell_type": "code",
1116 |    "execution_count": 31,
1117 |    "metadata": {},
1118 |    "outputs": [
1119 |     {
1120 |      "data": {
1121 |       "text/html": [
1122 |        "<div>\n",
1123 |        "<style scoped>\n",
1124 |        "    .dataframe tbody tr th:only-of-type {\n",
1125 |        "        vertical-align: middle;\n",
1126 |        "    }\n",
1127 |        "\n",
1128 |        "    .dataframe tbody tr th {\n",
1129 |        "        vertical-align: top;\n",
1130 |        "    }\n",
1131 |        "\n",
1132 |        "    .dataframe thead th {\n",
1133 |        "        text-align: right;\n",
1134 |        "    }\n",
1135 |        "</style>\n",
1136 |        "<table border=\"1\" class=\"dataframe\">\n",
1137 |        "  <thead>\n",
1138 |        "    <tr style=\"text-align: right;\">\n",
1139 |        "      <th></th>\n",
1140 |        "      <th>total_bill</th>\n",
1141 |        "      <th>tip</th>\n",
1142 |        "      <th>sex</th>\n",
1143 |        "      <th>smoker</th>\n",
1144 |        "      <th>day</th>\n",
1145 |        "      <th>time</th>\n",
1146 |        "      <th>size</th>\n",
1147 |        "    </tr>\n",
1148 |        "  </thead>\n",
1149 |        "  <tbody>\n",
1150 |        "    <tr>\n",
1151 |        "      <th>0</th>\n",
1152 |        "      <td>16.99</td>\n",
1153 |        "      <td>1.01</td>\n",
1154 |        "      <td>Female</td>\n",
1155 |        "      <td>No</td>\n",
1156 |        "      <td>Sun</td>\n",
1157 |        "      <td>Dinner</td>\n",
1158 |        "      <td>2</td>\n",
1159 |        "    </tr>\n",
1160 |        "    <tr>\n",
1161 |        "      <th>1</th>\n",
1162 |        "      <td>10.34</td>\n",
1163 |        "      <td>1.66</td>\n",
1164 |        "      <td>Male</td>\n",
1165 |        "      <td>No</td>\n",
1166 |        "      <td>Sun</td>\n",
1167 |        "      <td>Dinner</td>\n",
1168 |        "      <td>3</td>\n",
1169 |        "    </tr>\n",
1170 |        "    <tr>\n",
1171 |        "      <th>2</th>\n",
1172 |        "      <td>21.01</td>\n",
1173 |        "      <td>3.50</td>\n",
1174 |        "      <td>Male</td>\n",
1175 |        "      <td>No</td>\n",
1176 |        "      <td>Sun</td>\n",
1177 |        "      <td>Dinner</td>\n",
1178 |        "      <td>3</td>\n",
1179 |        "    </tr>\n",
1180 |        "    <tr>\n",
1181 |        "      <th>3</th>\n",
1182 |        "      <td>23.68</td>\n",
1183 |        "      <td>3.31</td>\n",
1184 |        "      <td>Male</td>\n",
1185 |        "      <td>No</td>\n",
1186 |        "      <td>Sun</td>\n",
1187 |        "      <td>Dinner</td>\n",
1188 |        "      <td>2</td>\n",
1189 |        "    </tr>\n",
1190 |        "    <tr>\n",
1191 |        "      <th>4</th>\n",
1192 |        "      <td>24.59</td>\n",
1193 |        "      <td>3.61</td>\n",
1194 |        "      <td>Female</td>\n",
1195 |        "      <td>No</td>\n",
1196 |        "      <td>Sun</td>\n",
1197 |        "      <td>Dinner</td>\n",
1198 |        "      <td>4</td>\n",
1199 |        "    </tr>\n",
1200 |        "  </tbody>\n",
1201 |        "</table>\n",
1202 |        "</div>"
1203 |       ],
1204 |       "text/plain": [
1205 |        "   total_bill   tip     sex smoker  day    time  size\n",
1206 |        "0       16.99  1.01  Female     No  Sun  Dinner     2\n",
1207 |        "1       10.34  1.66    Male     No  Sun  Dinner     3\n",
1208 |        "2       21.01  3.50    Male     No  Sun  Dinner     3\n",
1209 |        "3       23.68  3.31    Male     No  Sun  Dinner     2\n",
1210 |        "4       24.59  3.61  Female     No  Sun  Dinner     4"
1211 |       ]
1212 |      },
1213 |      "execution_count": 31,
1214 |      "metadata": {},
1215 |      "output_type": "execute_result"
1216 |     }
1217 |    ],
1218 |    "source": [
1219 |     "tips.head()"
1220 |    ]
1221 |   },
1222 |   {
1223 |    "cell_type": "code",
1224 |    "execution_count": 30,
1225 |    "metadata": {},
1226 |    "outputs": [
1227 |     {
1228 |      "data": {
1229 |       "text/html": [
1230 |        "<div>\n",
1231 |        "<style scoped>\n",
1232 |        "    .dataframe tbody tr th:only-of-type {\n",
1233 |        "        vertical-align: middle;\n",
1234 |        "    }\n",
1235 |        "\n",
1236 |        "    .dataframe tbody tr th {\n",
1237 |        "        vertical-align: top;\n",
1238 |        "    }\n",
1239 |        "\n",
1240 |        "    .dataframe thead th {\n",
1241 |        "        text-align: right;\n",
1242 |        "    }\n",
1243 |        "</style>\n",
1244 |        "<table border=\"1\" class=\"dataframe\">\n",
1245 |        "  <thead>\n",
1246 |        "    <tr style=\"text-align: right;\">\n",
1247 |        "      <th></th>\n",
1248 |        "      <th>total_bill</th>\n",
1249 |        "      <th>tip</th>\n",
1250 |        "      <th>smoker</th>\n",
1251 |        "      <th>day</th>\n",
1252 |        "      <th>time</th>\n",
1253 |        "      <th>size</th>\n",
1254 |        "      <th>sex_Male</th>\n",
1255 |        "      <th>sex_Female</th>\n",
1256 |        "    </tr>\n",
1257 |        "  </thead>\n",
1258 |        "  <tbody>\n",
1259 |        "    <tr>\n",
1260 |        "      <th>0</th>\n",
1261 |        "      <td>16.99</td>\n",
1262 |        "      <td>1.01</td>\n",
1263 |        "      <td>No</td>\n",
1264 |        "      <td>Sun</td>\n",
1265 |        "      <td>Dinner</td>\n",
1266 |        "      <td>2</td>\n",
1267 |        "      <td>0</td>\n",
1268 |        "      <td>1</td>\n",
1269 |        "    </tr>\n",
1270 |        "    <tr>\n",
1271 |        "      <th>1</th>\n",
1272 |        "      <td>10.34</td>\n",
1273 |        "      <td>1.66</td>\n",
1274 |        "      <td>No</td>\n",
1275 |        "      <td>Sun</td>\n",
1276 |        "      <td>Dinner</td>\n",
1277 |        "      <td>3</td>\n",
1278 |        "      <td>1</td>\n",
1279 |        "      <td>0</td>\n",
1280 |        "    </tr>\n",
1281 |        "    <tr>\n",
1282 |        "      <th>2</th>\n",
1283 |        "      <td>21.01</td>\n",
1284 |        "      <td>3.50</td>\n",
1285 |        "      <td>No</td>\n",
1286 |        "      <td>Sun</td>\n",
1287 |        "      <td>Dinner</td>\n",
1288 |        "      <td>3</td>\n",
1289 |        "      <td>1</td>\n",
1290 |        "      <td>0</td>\n",
1291 |        "    </tr>\n",
1292 |        "    <tr>\n",
1293 |        "      <th>3</th>\n",
1294 |        "      <td>23.68</td>\n",
1295 |        "      <td>3.31</td>\n",
1296 |        "      <td>No</td>\n",
1297 |        "      <td>Sun</td>\n",
1298 |        "      <td>Dinner</td>\n",
1299 |        "      <td>2</td>\n",
1300 |        "      <td>1</td>\n",
1301 |        "      <td>0</td>\n",
1302 |        "    </tr>\n",
1303 |        "    <tr>\n",
1304 |        "      <th>4</th>\n",
1305 |        "      <td>24.59</td>\n",
1306 |        "      <td>3.61</td>\n",
1307 |        "      <td>No</td>\n",
1308 |        "      <td>Sun</td>\n",
1309 |        "      <td>Dinner</td>\n",
1310 |        "      <td>4</td>\n",
1311 |        "      <td>0</td>\n",
1312 |        "      <td>1</td>\n",
1313 |        "    </tr>\n",
1314 |        "  </tbody>\n",
1315 |        "</table>\n",
1316 |        "</div>"
1317 |       ],
1318 |       "text/plain": [
1319 |        "   total_bill   tip smoker  day    time  size  sex_Male  sex_Female\n",
1320 |        "0       16.99  1.01     No  Sun  Dinner     2         0           1\n",
1321 |        "1       10.34  1.66     No  Sun  Dinner     3         1           0\n",
1322 |        "2       21.01  3.50     No  Sun  Dinner     3         1           0\n",
1323 |        "3       23.68  3.31     No  Sun  Dinner     2         1           0\n",
1324 |        "4       24.59  3.61     No  Sun  Dinner     4         0           1"
1325 |       ]
1326 |      },
1327 |      "execution_count": 30,
1328 |      "metadata": {},
1329 |      "output_type": "execute_result"
1330 |     }
1331 |    ],
1332 |    "source": [
1333 |     "pd.get_dummies(tips.head(), columns=['sex'])"
1334 |    ]
1335 |   },
1336 |   {
1337 |    "cell_type": "markdown",
1338 |    "metadata": {},
1339 |    "source": [
1340 |     "Notice that the `sex` column was split into the `sex_Male` and `sex_Female` columns. When the sex is female, `sex_Female` is 1, and it is 0 otherwise. The `sex_Male` column works the same way.\n",
1341 |     "\n",
1342 |     "This can be very useful for ML models, many of which expect numeric inputs, and for some types of analysis."
1343 |    ]
1344 |   },
1345 |   {
1346 |    "cell_type": "markdown",
1347 |    "metadata": {},
1348 |    "source": [
1349 |     "## Conclusion\n",
1350 |     "\n",
1351 |     "These three ways to transform rows to columns and back again have served me quite well, and I'd be surprised if you needed anything more than these.\n",
1352 |     "\n",
1353 |     "They are pretty intuitive, so you might not need much practice. I actually don't know of a good exercise for these either, so if somebody knows a good one, please send it over."
1354 |    ]
1355 |   },
1356 |   {
1357 |    "cell_type": "code",
1358 |    "execution_count": null,
1359 |    "metadata": {},
1360 |    "outputs": [],
1361 |    "source": []
1362 |   },
1363 |   {
1364 |    "cell_type": "code",
1365 |    "execution_count": null,
1366 |    "metadata": {},
1367 |    "outputs": [],
1368 |    "source": []
1369 |   }
1370 |  ],
1371 |  "metadata": {
1372 |   "kernelspec": {
1373 |    "display_name": "Python 3",
1374 |    "language": "python",
1375 |    "name": "python3"
1376 |   },
1377 |   "language_info": {
1378 |    "codemirror_mode": {
1379 |     "name": "ipython",
1380 |     "version": 3
1381 |    },
1382 |    "file_extension": ".py",
1383 |    "mimetype": "text/x-python",
1384 |    "name": "python",
1385 |    "nbconvert_exporter": "python",
1386 |    "pygments_lexer": "ipython3",
1387 |    "version": "3.7.3"
1388 |   }
1389 |  },
1390 |  "nbformat": 4,
1391 |  "nbformat_minor": 2
1392 | }
1393 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | alabaster==0.7.12
 2 | appnope==0.1.0
 3 | attrs==19.1.0
 4 | Babel==2.6.0
 5 | backcall==0.1.0
 6 | bleach==3.1.0
 7 | certifi==2019.3.9
 8 | chardet==3.0.4
 9 | cycler==0.10.0
10 | decorator==4.4.0
11 | defusedxml==0.5.0
12 | docutils==0.14
13 | entrypoints==0.3
14 | idna==2.8
15 | imagesize==1.1.0
16 | ipykernel==5.1.0
17 | ipyparallel==6.2.3
18 | ipython==7.4.0
19 | ipython-genutils==0.2.0
20 | ipywidgets==7.4.2
21 | jedi==0.13.3
22 | Jinja2==2.10.1
23 | jsonschema==3.0.1
24 | jupyter-client==5.2.4
25 | jupyter-core==4.4.0
26 | kiwisolver==1.0.1
27 | MarkupSafe==1.1.1
28 | matplotlib==3.0.3
29 | mistune==0.8.4
30 | nbconvert==5.4.1
31 | nbformat==4.4.0
32 | nose==1.3.7
33 | notebook==5.7.8
34 | numpy==1.16.2
35 | packaging==19.0
36 | pandas==0.24.2
37 | pandocfilters==1.4.2
38 | parso==0.4.0
39 | pexpect==4.7.0
40 | pickleshare==0.7.5
41 | prometheus-client==0.6.0
42 | prompt-toolkit==2.0.9
43 | ptyprocess==0.6.0
44 | Pygments==2.3.1
45 | pyparsing==2.4.0
46 | pyrsistent==0.14.11
47 | python-dateutil==2.8.0
48 | pytz==2019.1
49 | pyzmq==18.0.1
50 | qtconsole==4.4.3
51 | requests==2.21.0
52 | scipy==1.2.1
53 | seaborn==0.9.0
54 | Send2Trash==1.5.0
55 | six==1.12.0
56 | snowballstemmer==1.2.1
57 | Sphinx==2.0.1
58 | sphinxcontrib-applehelp==1.0.1
59 | sphinxcontrib-devhelp==1.0.1
60 | sphinxcontrib-htmlhelp==1.0.2
61 | sphinxcontrib-jsmath==1.0.1
62 | sphinxcontrib-qthelp==1.0.2
63 | sphinxcontrib-serializinghtml==1.1.3
64 | terminado==0.8.2
65 | testpath==0.4.2
66 | tornado==6.0.2
67 | traitlets==4.3.2
68 | urllib3==1.24.2
69 | wcwidth==0.1.7
70 | webencodings==0.5.1
71 | widgetsnbextension==3.4.2
72 | 


--------------------------------------------------------------------------------