├── README.md
└── business-problem-with-customer-segmentation.ipynb


/README.md:
--------------------------------------------------------------------------------
 1 | # Customer-Segmentation-with-RFM-Analysis
 2 | 
 3 | 
 4 | ## Context
 5 | A real online retail transaction data set covering two years.
 6 | 
 7 | 
 8 | ## Content
 9 | This Online Retail II data set contains all the transactions occurring for a UK-based and registered, non-store online retailer between 01/12/2009 and 09/12/2011. The company mainly sells unique all-occasion gift-ware. Many customers of the company are wholesalers.
10 | 
11 | 
12 | ## Column Descriptors
13 | InvoiceNo: Invoice number. Nominal. A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'C', it indicates a cancellation.
14 | 
15 | StockCode: Product (item) code. Nominal. A 5-digit integral number uniquely assigned to each distinct product.
16 | 
17 | Description: Product (item) name. Nominal.
18 | 
19 | Quantity: The quantities of each product (item) per transaction. Numeric.
20 | 
21 | InvoiceDate: Invoice date and time. Numeric. The day and time when a transaction was generated.
22 | 
23 | UnitPrice: Unit price. Numeric. Product price per unit in sterling (£).
24 | 
25 | CustomerID: Customer number. Nominal. A 5-digit integral number uniquely assigned to each customer.
26 | 
27 | Country: Country name. Nominal. The name of the country where a customer resides.
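
The cancellation rule and the price columns described above can be sketched in pandas. This is a minimal illustration, not the notebook's own code: the three rows are made up, and the helper columns `IsCancellation` and `LineTotal` are hypothetical names. The column names follow the descriptors above; the Excel sheet loaded in the notebook labels the same fields `Invoice`, `Price` and `Customer ID`, so the names may need adjusting.

```python
import pandas as pd

# Made-up rows for illustration only; column names follow the descriptors above.
transactions = pd.DataFrame({
    "InvoiceNo": ["489434", "C489449", "489435"],
    "StockCode": ["85048", "22087", "22350"],
    "Quantity": [12, -12, 6],
    "UnitPrice": [6.95, 2.95, 2.55],
})

# Invoice numbers mix plain digits with a letter prefix, so treat them as strings.
invoice = transactions["InvoiceNo"].astype(str)

# An invoice code that starts with 'C' (checked case-insensitively) is a cancellation.
transactions["IsCancellation"] = invoice.str.upper().str.startswith("C")

# Value of each line in sterling: quantity times unit price.
transactions["LineTotal"] = transactions["Quantity"] * transactions["UnitPrice"]

# Keep only genuine sales for downstream analysis.
sales = transactions[~transactions["IsCancellation"]]
print(sales[["InvoiceNo", "Quantity", "LineTotal"]])
```

Whether cancellations should be dropped or netted against the original sale depends on the analysis; the sketch simply filters them out.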
28 | 
29 | 
30 | ## Acknowledgements
31 | References for the data set:
32 | https://archive.ics.uci.edu/ml/datasets/Online+Retail+II
33 | 
34 | The data set and an example kernel are available on Kaggle:
35 | https://www.kaggle.com/mathchi/business-problem-with-customer-segmentation
36 | 
37 | 
38 | ## Relevant Papers:
39 | Chen, D., Sain, S.L., and Guo, K. (2012), Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining, Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197-208.
40 | 
41 | Chen, D., Guo, K. and Ubakanma, G. (2015), Predicting customer profitability over time based on RFM time series, International Journal of Business Forecasting and Marketing Intelligence, Vol. 2, No. 1, pp. 1-18.
42 | 
43 | Chen, D., Guo, K., and Li, Bo (2019), Predicting Customer Profitability Dynamically over Time: An Experimental Comparative Study, 24th Iberoamerican Congress on Pattern Recognition (CIARP 2019), Havana, Cuba, 28-31 Oct, 2019.
44 | 
45 | Laha Ale, Ning Zhang, Huici Wu, Dajiang Chen, and Tao Han, Online Proactive Caching in Mobile Edge Computing Using Bidirectional Deep Recurrent Neural Network, IEEE Internet of Things Journal, Vol. 6, Issue 3, pp. 5520-5530, 2019.
46 | 
47 | Rina Singh, Jeffrey A. Graves, Douglas A. Talbert, William Eberle, Prefix and Suffix Sequential Pattern Mining, Industrial Conference on Data Mining 2018: Advances in Data Mining. Applications and Theoretical Aspects, pp. 309-324. 2018.
48 | 
49 | 
50 | ## Inspiration
51 | Data set characteristics: Multivariate, Sequential, Time-Series, Text
52 | 


--------------------------------------------------------------------------------
/business-problem-with-customer-segmentation.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "markdown",
   5 |    "execution_count": null,
   6 |    "metadata": {},
   7 |    "source": [
   8 |     "# Business Problem with Customer Segmentation\n",
   9 |     "\n",
  10 |     "\n",
  11 |     "An e-commerce company wants to segment its customers and determine marketing strategies according to these segments.\n",
  12 |     "\n",
  13 |     "For this purpose, we will define the behavior of customers and we will form groups according to clustering.\n",
  14 |     "\n",
  15 |     "In other words, we will take those who exhibit common behaviors into the same groups and we will try to develop sales and marketing techniques specific to these groups.\n",
  16 |     "\n",
  17 |     "\n",
  18 |     "\n",
  19 |     "### Data Set Story:\n",
  20 |     "\n",
  21 |     "https://archive.ics.uci.edu/ml/datasets/Online+Retail+II\n",
  22 |     "\n",
  23 |     "This Online Retail II data set contains all the transactions occurring for a UK-based and registered, non-store online retailer between 01/12/2009 and 09/12/2011.\n",
  24 |     "\n",
  25 |     "The company mainly sells unique all-occasion gift-ware. \n",
  26 |     "\n",
  27 |     "Many customers of the company are wholesalers.\n",
  28 |     "\n",
  29 |     "\n",
  30 |     "\n",
  31 |     "\n",
  32 |     "### Attribute Information:\n",
  33 |     "\n",
  34 |     "- InvoiceNo: Invoice number. Nominal. A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'C', it indicates a cancellation.\n",
  35 |     "- StockCode: Product (item) code. Nominal. A 5-digit integral number uniquely assigned to each distinct product.\n",
  36 |     "- Description: Product (item) name. Nominal.\n",
  37 |     "- Quantity: The quantities of each product (item) per transaction. Numeric.\n",
  38 |     "- InvoiceDate: Invoice date and time. Numeric. The day and time when a transaction was generated.\n",
  39 |     "- UnitPrice: Unit price. Numeric. Product price per unit in sterling (£).\n",
  40 |     "- CustomerID: Customer number. Nominal. A 5-digit integral number uniquely assigned to each customer.\n",
  41 |     "- Country: Country name. Nominal. The name of the country where a customer resides.\n",
  42 |     "\n"
  43 |    ]
  44 |   },
  45 |   {
  46 |    "cell_type": "markdown",
  47 |    "execution_count": null,
  48 |    "metadata": {},
  49 |    "source": [
  50 |     "# Questions from data set\n",
  51 |     "\n",
  52 |     "\n",
  53 |     "All questions refer to the 2009-2010 data.\n",
  54 |     "\n",
  55 |     "1. What is the number of unique products?\n",
  56 |     "2. Which products are there, and how often does each appear?\n",
  57 |     "3. Which product is the most ordered?\n",
  58 |     "4. How do we rank this output?\n",
  59 |     "5. How many invoices have been issued?\n",
  60 |     "6. How much money has been earned per invoice?\n",
  61 |     "7. Which are the most expensive products?\n",
  62 |     "8. How many orders came from which country?\n",
  63 |     "9. Which country gained how much?\n",
  64 |     "10. Which product is the most returned?\n",
  65 |     "11. What should we do for customer segmentation with RFM?\n",
  66 |     "12. Scoring for RFM.\n",
  67 |     "13. Finally, create an Excel file named New Customer."
  68 |    ]
  69 |   },
  70 |   {
  71 |    "cell_type": "markdown",
  72 |    "execution_count": null,
  73 |    "metadata": {},
  74 |    "source": [
  75 |     "# Data Understanding "
  76 |    ]
  77 |   },
  78 |   {
  79 |    "cell_type": "code",
  80 |    "execution_count": 1,
  81 |    "metadata": {},
  82 |    "outputs": [],
  83 |    "source": [
  84 |     "import pandas as pd\n",
  85 |     "import numpy as np\n",
  86 |     "import seaborn as sns\n",
  87 |     "\n",
  88 |     "# to display all columns and rows:\n",
  89 |     "pd.set_option('display.max_columns', None); pd.set_option('display.max_rows', None);\n"
  90 |    ]
  91 |   },
  92 |   {
  93 |    "cell_type": "markdown",
  94 |    "execution_count": null,
  95 |    "metadata": {},
  96 |    "source": [
  97 |     "Set the number of digits shown after the decimal point. For variables such as 'Price', the option below is set to 0, so floats are displayed rounded to whole numbers."
  98 |    ]
  99 |   },
 100 |   {
 101 |    "cell_type": "code",
 102 |    "execution_count": 2,
 103 |    "metadata": {},
 104 |    "outputs": [],
 105 |    "source": [
 106 |     "pd.set_option('display.float_format', lambda x: '%.0f' % x)\n",
 107 |     "import matplotlib.pyplot as plt"
 108 |    ]
 109 |   },
 110 |   {
 111 |    "cell_type": "code",
 112 |    "execution_count": 3,
 113 |    "metadata": {},
 114 |    "outputs": [],
 115 |    "source": [
 116 |     "df_2009_2010 = pd.read_excel(\"../input/online-retail-ii-data-set-from-ml-repository/online_retail_II.xlsx\", sheet_name = \"Year 2009-2010\")"
 117 |    ]
 118 |   },
 119 |   {
 120 |    "cell_type": "code",
 121 |    "execution_count": 4,
 122 |    "metadata": {},
 123 |    "outputs": [],
 124 |    "source": [
 125 |     "df = df_2009_2010.copy()"
 126 |    ]
 127 |   },
 128 |   {
 129 |    "cell_type": "markdown",
 130 |    "execution_count": null,
 131 |    "metadata": {},
 132 |    "source": [
 133 |     "Try to understand the data with the pandas functions commonly used for a first look at a data set."
 134 |    ]
 135 |   },
 136 |   {
 137 |    "cell_type": "markdown",
 138 |    "execution_count": null,
 139 |    "metadata": {},
 140 |    "source": [
 141 |     "## 1. What is the number of unique products?"
 142 |    ]
 143 |   },
 144 |   {
 145 |    "cell_type": "code",
 146 |    "execution_count": 5,
 147 |    "metadata": {},
 148 |    "outputs": [
 149 |     {
 150 |      "data": {
 151 |       "text/plain": [
 152 |        "4681"
 153 |       ]
 154 |      },
 155 |      "execution_count": 5,
 156 |      "metadata": {},
 157 |      "output_type": "execute_result"
 158 |     }
 159 |    ],
 160 |    "source": [
 161 |     "df[\"Description\"].nunique()"
 162 |    ]
 163 |   },
 164 |   {
 165 |    "cell_type": "markdown",
 166 |    "execution_count": null,
 167 |    "metadata": {},
 168 |    "source": [
 169 |     "## 2. Which products are there, and how often does each appear?"
 170 |    ]
 171 |   },
 172 |   {
 173 |    "cell_type": "code",
 174 |    "execution_count": 6,
 175 |    "metadata": {},
 176 |    "outputs": [
 177 |     {
 178 |      "data": {
 179 |       "text/plain": [
 180 |        "WHITE HANGING HEART T-LIGHT HOLDER    3549\n",
 181 |        "REGENCY CAKESTAND 3 TIER              2212\n",
 182 |        "STRAWBERRY CERAMIC TRINKET BOX        1843\n",
 183 |        "PACK OF 72 RETRO SPOT CAKE CASES      1466\n",
 184 |        "ASSORTED COLOUR BIRD ORNAMENT         1457\n",
 185 |        "Name: Description, dtype: int64"
 186 |       ]
 187 |      },
 188 |      "execution_count": 6,
 189 |      "metadata": {},
 190 |      "output_type": "execute_result"
 191 |     }
 192 |    ],
 193 |    "source": [
 194 |     "df[\"Description\"].value_counts().head()"
 195 |    ]
 196 |   },
 197 |   {
 198 |    "cell_type": "markdown",
 199 |    "execution_count": null,
 200 |    "metadata": {},
 201 |    "source": [
 202 |     "## 3. Which product is the most ordered?"
 203 |    ]
 204 |   },
 205 |   {
 206 |    "cell_type": "code",
 207 |    "execution_count": 7,
 208 |    "metadata": {},
 209 |    "outputs": [
 210 |     {
 211 |      "data": {
 212 |       "text/html": [
 213 |        "<div>\n",
 214 |        "<style scoped>\n",
 215 |        "    .dataframe tbody tr th:only-of-type {\n",
 216 |        "        vertical-align: middle;\n",
 217 |        "    }\n",
 218 |        "\n",
 219 |        "    .dataframe tbody tr th {\n",
 220 |        "        vertical-align: top;\n",
 221 |        "    }\n",
 222 |        "\n",
 223 |        "    .dataframe thead th {\n",
 224 |        "        text-align: right;\n",
 225 |        "    }\n",
 226 |        "</style>\n",
 227 |        "<table border=\"1\" class=\"dataframe\">\n",
 228 |        "  <thead>\n",
 229 |        "    <tr style=\"text-align: right;\">\n",
 230 |        "      <th></th>\n",
 231 |        "      <th>Quantity</th>\n",
 232 |        "    </tr>\n",
 233 |        "    <tr>\n",
 234 |        "      <th>Description</th>\n",
 235 |        "      <th></th>\n",
 236 |        "    </tr>\n",
 237 |        "  </thead>\n",
 238 |        "  <tbody>\n",
 239 |        "    <tr>\n",
 240 |        "      <th>21494</th>\n",
 241 |        "      <td>-720</td>\n",
 242 |        "    </tr>\n",
 243 |        "    <tr>\n",
 244 |        "      <th>22467</th>\n",
 245 |        "      <td>-2</td>\n",
 246 |        "    </tr>\n",
 247 |        "    <tr>\n",
 248 |        "      <th>22719</th>\n",
 249 |        "      <td>2</td>\n",
 250 |        "    </tr>\n",
 251 |        "    <tr>\n",
 252 |        "      <th>DOORMAT UNION JACK GUNS AND ROSES</th>\n",
 253 |        "      <td>179</td>\n",
 254 |        "    </tr>\n",
 255 |        "    <tr>\n",
 256 |        "      <th>3 STRIPEY MICE FELTCRAFT</th>\n",
 257 |        "      <td>690</td>\n",
 258 |        "    </tr>\n",
 259 |        "  </tbody>\n",
 260 |        "</table>\n",
 261 |        "</div>"
 262 |       ],
 263 |       "text/plain": [
 264 |        "                                     Quantity\n",
 265 |        "Description                                  \n",
 266 |        "21494                                    -720\n",
 267 |        "22467                                      -2\n",
 268 |        "22719                                       2\n",
 269 |        "  DOORMAT UNION JACK GUNS AND ROSES       179\n",
 270 |        " 3 STRIPEY MICE FELTCRAFT                 690"
 271 |       ]
 272 |      },
 273 |      "execution_count": 7,
 274 |      "metadata": {},
 275 |      "output_type": "execute_result"
 276 |     }
 277 |    ],
 278 |    "source": [
 279 |     "df.groupby(\"Description\").agg({\"Quantity\":\"sum\"}).head()"
 280 |    ]
 281 |   },
 282 |   {
 283 |    "cell_type": "markdown",
 284 |    "execution_count": null,
 285 |    "metadata": {},
 286 |    "source": [
 287 |     "## 4. How do we rank this output?"
 288 |    ]
 289 |   },
 290 |   {
 291 |    "cell_type": "code",
 292 |    "execution_count": 8,
 293 |    "metadata": {},
 294 |    "outputs": [
 295 |     {
 296 |      "data": {
 297 |       "text/html": [
 298 |        "<div>\n",
 299 |        "<style scoped>\n",
 300 |        "    .dataframe tbody tr th:only-of-type {\n",
 301 |        "        vertical-align: middle;\n",
 302 |        "    }\n",
 303 |        "\n",
 304 |        "    .dataframe tbody tr th {\n",
 305 |        "        vertical-align: top;\n",
 306 |        "    }\n",
 307 |        "\n",
 308 |        "    .dataframe thead th {\n",
 309 |        "        text-align: right;\n",
 310 |        "    }\n",
 311 |        "</style>\n",
 312 |        "<table border=\"1\" class=\"dataframe\">\n",
 313 |        "  <thead>\n",
 314 |        "    <tr style=\"text-align: right;\">\n",
 315 |        "      <th></th>\n",
 316 |        "      <th>Quantity</th>\n",
 317 |        "    </tr>\n",
 318 |        "    <tr>\n",
 319 |        "      <th>Description</th>\n",
 320 |        "      <th></th>\n",
 321 |        "    </tr>\n",
 322 |        "  </thead>\n",
 323 |        "  <tbody>\n",
 324 |        "    <tr>\n",
 325 |        "      <th>WHITE HANGING HEART T-LIGHT HOLDER</th>\n",
 326 |        "      <td>57733</td>\n",
 327 |        "    </tr>\n",
 328 |        "    <tr>\n",
 329 |        "      <th>WORLD WAR 2 GLIDERS ASSTD DESIGNS</th>\n",
 330 |        "      <td>54698</td>\n",
 331 |        "    </tr>\n",
 332 |        "    <tr>\n",
 333 |        "      <th>BROCADE RING PURSE</th>\n",
 334 |        "      <td>47647</td>\n",
 335 |        "    </tr>\n",
 336 |        "    <tr>\n",
 337 |        "      <th>PACK OF 72 RETRO SPOT CAKE CASES</th>\n",
 338 |        "      <td>46106</td>\n",
 339 |        "    </tr>\n",
 340 |        "    <tr>\n",
 341 |        "      <th>ASSORTED COLOUR BIRD ORNAMENT</th>\n",
 342 |        "      <td>44925</td>\n",
 343 |        "    </tr>\n",
 344 |        "  </tbody>\n",
 345 |        "</table>\n",
 346 |        "</div>"
 347 |       ],
 348 |       "text/plain": [
 349 |        "                                    Quantity\n",
 350 |        "Description                                 \n",
 351 |        "WHITE HANGING HEART T-LIGHT HOLDER     57733\n",
 352 |        "WORLD WAR 2 GLIDERS ASSTD DESIGNS      54698\n",
 353 |        "BROCADE RING PURSE                     47647\n",
 354 |        "PACK OF 72 RETRO SPOT CAKE CASES       46106\n",
 355 |        "ASSORTED COLOUR BIRD ORNAMENT          44925"
 356 |       ]
 357 |      },
 358 |      "execution_count": 8,
 359 |      "metadata": {},
 360 |      "output_type": "execute_result"
 361 |     }
 362 |    ],
 363 |    "source": [
 364 |     "df.groupby(\"Description\").agg({\"Quantity\":\"sum\"}).sort_values(\"Quantity\", ascending = False).head()"
 365 |    ]
 366 |   },
 367 |   {
 368 |    "cell_type": "markdown",
 369 |    "execution_count": null,
 370 |    "metadata": {},
 371 |    "source": [
 372 |     "## 5. How many invoices have been issued?"
 373 |    ]
 374 |   },
 375 |   {
 376 |    "cell_type": "code",
 377 |    "execution_count": 9,
 378 |    "metadata": {},
 379 |    "outputs": [
 380 |     {
 381 |      "data": {
 382 |       "text/plain": [
 383 |        "28816"
 384 |       ]
 385 |      },
 386 |      "execution_count": 9,
 387 |      "metadata": {},
 388 |      "output_type": "execute_result"
 389 |     }
 390 |    ],
 391 |    "source": [
 392 |     "df[\"Invoice\"].nunique()"
 393 |    ]
 394 |   },
 395 |   {
 396 |    "cell_type": "markdown",
 397 |    "execution_count": null,
 398 |    "metadata": {},
 399 |    "source": [
 400 |     "## 6. How much money has been earned per invoice?"
 401 |    ]
 402 |   },
 403 |   {
 404 |    "cell_type": "code",
 405 |    "execution_count": 10,
 406 |    "metadata": {},
 407 |    "outputs": [],
 408 |    "source": [
 409 |     "# Revenue per line item: create a new variable as quantity times unit price\n",
 410 |     "\n",
 411 |     "df[\"TotalPrice\"] = df[\"Quantity\"]*df[\"Price\"]"
 412 |    ]
 413 |   },
 414 |   {
 415 |    "cell_type": "code",
 416 |    "execution_count": 11,
 417 |    "metadata": {},
 418 |    "outputs": [
 419 |     {
 420 |      "data": {
 421 |       "text/html": [
 422 |        "<div>\n",
 423 |        "<style scoped>\n",
 424 |        "    .dataframe tbody tr th:only-of-type {\n",
 425 |        "        vertical-align: middle;\n",
 426 |        "    }\n",
 427 |        "\n",
 428 |        "    .dataframe tbody tr th {\n",
 429 |        "        vertical-align: top;\n",
 430 |        "    }\n",
 431 |        "\n",
 432 |        "    .dataframe thead th {\n",
 433 |        "        text-align: right;\n",
 434 |        "    }\n",
 435 |        "</style>\n",
 436 |        "<table border=\"1\" class=\"dataframe\">\n",
 437 |        "  <thead>\n",
 438 |        "    <tr style=\"text-align: right;\">\n",
 439 |        "      <th></th>\n",
 440 |        "      <th>Invoice</th>\n",
 441 |        "      <th>StockCode</th>\n",
 442 |        "      <th>Description</th>\n",
 443 |        "      <th>Quantity</th>\n",
 444 |        "      <th>InvoiceDate</th>\n",
 445 |        "      <th>Price</th>\n",
 446 |        "      <th>Customer ID</th>\n",
 447 |        "      <th>Country</th>\n",
 448 |        "      <th>TotalPrice</th>\n",
 449 |        "    </tr>\n",
 450 |        "  </thead>\n",
 451 |        "  <tbody>\n",
 452 |        "    <tr>\n",
 453 |        "      <th>0</th>\n",
 454 |        "      <td>489434</td>\n",
 455 |        "      <td>85048</td>\n",
 456 |        "      <td>15CM CHRISTMAS GLASS BALL 20 LIGHTS</td>\n",
 457 |        "      <td>12</td>\n",
 458 |        "      <td>2009-12-01 07:45:00</td>\n",
 459 |        "      <td>7</td>\n",
 460 |        "      <td>13085</td>\n",
 461 |        "      <td>United Kingdom</td>\n",
 462 |        "      <td>83</td>\n",
 463 |        "    </tr>\n",
 464 |        "    <tr>\n",
 465 |        "      <th>1</th>\n",
 466 |        "      <td>489434</td>\n",
 467 |        "      <td>79323P</td>\n",
 468 |        "      <td>PINK CHERRY LIGHTS</td>\n",
 469 |        "      <td>12</td>\n",
 470 |        "      <td>2009-12-01 07:45:00</td>\n",
 471 |        "      <td>7</td>\n",
 472 |        "      <td>13085</td>\n",
 473 |        "      <td>United Kingdom</td>\n",
 474 |        "      <td>81</td>\n",
 475 |        "    </tr>\n",
 476 |        "    <tr>\n",
 477 |        "      <th>2</th>\n",
 478 |        "      <td>489434</td>\n",
 479 |        "      <td>79323W</td>\n",
 480 |        "      <td>WHITE CHERRY LIGHTS</td>\n",
 481 |        "      <td>12</td>\n",
 482 |        "      <td>2009-12-01 07:45:00</td>\n",
 483 |        "      <td>7</td>\n",
 484 |        "      <td>13085</td>\n",
 485 |        "      <td>United Kingdom</td>\n",
 486 |        "      <td>81</td>\n",
 487 |        "    </tr>\n",
 488 |        "    <tr>\n",
 489 |        "      <th>3</th>\n",
 490 |        "      <td>489434</td>\n",
 491 |        "      <td>22041</td>\n",
 492 |        "      <td>RECORD FRAME 7\" SINGLE SIZE</td>\n",
 493 |        "      <td>48</td>\n",
 494 |        "      <td>2009-12-01 07:45:00</td>\n",
 495 |        "      <td>2</td>\n",
 496 |        "      <td>13085</td>\n",
 497 |        "      <td>United Kingdom</td>\n",
 498 |        "      <td>101</td>\n",
 499 |        "    </tr>\n",
 500 |        "    <tr>\n",
 501 |        "      <th>4</th>\n",
 502 |        "      <td>489434</td>\n",
 503 |        "      <td>21232</td>\n",
 504 |        "      <td>STRAWBERRY CERAMIC TRINKET BOX</td>\n",
 505 |        "      <td>24</td>\n",
 506 |        "      <td>2009-12-01 07:45:00</td>\n",
 507 |        "      <td>1</td>\n",
 508 |        "      <td>13085</td>\n",
 509 |        "      <td>United Kingdom</td>\n",
 510 |        "      <td>30</td>\n",
 511 |        "    </tr>\n",
 512 |        "  </tbody>\n",
 513 |        "</table>\n",
 514 |        "</div>"
 515 |       ],
 516 |       "text/plain": [
 517 |        "  Invoice StockCode                          Description  Quantity  \\\n",
 518 |        "0  489434     85048  15CM CHRISTMAS GLASS BALL 20 LIGHTS        12   \n",
 519 |        "1  489434    79323P                   PINK CHERRY LIGHTS        12   \n",
 520 |        "2  489434    79323W                  WHITE CHERRY LIGHTS        12   \n",
 521 |        "3  489434     22041         RECORD FRAME 7\" SINGLE SIZE         48   \n",
 522 |        "4  489434     21232       STRAWBERRY CERAMIC TRINKET BOX        24   \n",
 523 |        "\n",
 524 |        "          InvoiceDate  Price  Customer ID         Country  TotalPrice  \n",
 525 |        "0 2009-12-01 07:45:00      7        13085  United Kingdom          83  \n",
 526 |        "1 2009-12-01 07:45:00      7        13085  United Kingdom          81  \n",
 527 |        "2 2009-12-01 07:45:00      7        13085  United Kingdom          81  \n",
 528 |        "3 2009-12-01 07:45:00      2        13085  United Kingdom         101  \n",
 529 |        "4 2009-12-01 07:45:00      1        13085  United Kingdom          30  "
 530 |       ]
 531 |      },
 532 |      "execution_count": 11,
 533 |      "metadata": {},
 534 |      "output_type": "execute_result"
 535 |     }
 536 |    ],
 537 |    "source": [
 538 |     "df.head()"
 539 |    ]
 540 |   },
 541 |   {
 542 |    "cell_type": "code",
 543 |    "execution_count": 12,
 544 |    "metadata": {},
 545 |    "outputs": [
 546 |     {
 547 |      "data": {
 548 |       "text/html": [
 549 |        "<div>\n",
 550 |        "<style scoped>\n",
 551 |        "    .dataframe tbody tr th:only-of-type {\n",
 552 |        "        vertical-align: middle;\n",
 553 |        "    }\n",
 554 |        "\n",
 555 |        "    .dataframe tbody tr th {\n",
 556 |        "        vertical-align: top;\n",
 557 |        "    }\n",
 558 |        "\n",
 559 |        "    .dataframe thead th {\n",
 560 |        "        text-align: right;\n",
 561 |        "    }\n",
 562 |        "</style>\n",
 563 |        "<table border=\"1\" class=\"dataframe\">\n",
 564 |        "  <thead>\n",
 565 |        "    <tr style=\"text-align: right;\">\n",
 566 |        "      <th></th>\n",
 567 |        "      <th>TotalPrice</th>\n",
 568 |        "    </tr>\n",
 569 |        "    <tr>\n",
 570 |        "      <th>Invoice</th>\n",
 571 |        "      <th></th>\n",
 572 |        "    </tr>\n",
 573 |        "  </thead>\n",
 574 |        "  <tbody>\n",
 575 |        "    <tr>\n",
 576 |        "      <th>489434</th>\n",
 577 |        "      <td>505</td>\n",
 578 |        "    </tr>\n",
 579 |        "    <tr>\n",
 580 |        "      <th>489435</th>\n",
 581 |        "      <td>146</td>\n",
 582 |        "    </tr>\n",
 583 |        "    <tr>\n",
 584 |        "      <th>489436</th>\n",
 585 |        "      <td>630</td>\n",
 586 |        "    </tr>\n",
 587 |        "    <tr>\n",
 588 |        "      <th>489437</th>\n",
 589 |        "      <td>311</td>\n",
 590 |        "    </tr>\n",
 591 |        "    <tr>\n",
 592 |        "      <th>489438</th>\n",
 593 |        "      <td>2286</td>\n",
 594 |        "    </tr>\n",
 595 |        "  </tbody>\n",
 596 |        "</table>\n",
 597 |        "</div>"
 598 |       ],
 599 |       "text/plain": [
 600 |        "         TotalPrice\n",
 601 |        "Invoice            \n",
 602 |        "489434          505\n",
 603 |        "489435          146\n",
 604 |        "489436          630\n",
 605 |        "489437          311\n",
 606 |        "489438         2286"
 607 |       ]
 608 |      },
 609 |      "execution_count": 12,
 610 |      "metadata": {},
 611 |      "output_type": "execute_result"
 612 |     }
 613 |    ],
 614 |    "source": [
 615 |     "df.groupby(\"Invoice\").agg({\"TotalPrice\":\"sum\"}).head()"
 616 |    ]
 617 |   },
 618 |   {
 619 |    "cell_type": "markdown",
 620 |    "execution_count": null,
 621 |    "metadata": {},
 622 |    "source": [
 623 |     "## 7. Which are the most expensive products?"
 624 |    ]
 625 |   },
 626 |   {
 627 |    "cell_type": "code",
 628 |    "execution_count": 13,
 629 |    "metadata": {},
 630 |    "outputs": [
 631 |     {
 632 |      "data": {
 633 |       "text/html": [
 634 |        "<div>\n",
 635 |        "<style scoped>\n",
 636 |        "    .dataframe tbody tr th:only-of-type {\n",
 637 |        "        vertical-align: middle;\n",
 638 |        "    }\n",
 639 |        "\n",
 640 |        "    .dataframe tbody tr th {\n",
 641 |        "        vertical-align: top;\n",
 642 |        "    }\n",
 643 |        "\n",
 644 |        "    .dataframe thead th {\n",
 645 |        "        text-align: right;\n",
 646 |        "    }\n",
 647 |        "</style>\n",
 648 |        "<table border=\"1\" class=\"dataframe\">\n",
 649 |        "  <thead>\n",
 650 |        "    <tr style=\"text-align: right;\">\n",
 651 |        "      <th></th>\n",
 652 |        "      <th>Invoice</th>\n",
 653 |        "      <th>StockCode</th>\n",
 654 |        "      <th>Description</th>\n",
 655 |        "      <th>Quantity</th>\n",
 656 |        "      <th>InvoiceDate</th>\n",
 657 |        "      <th>Price</th>\n",
 658 |        "      <th>Customer ID</th>\n",
 659 |        "      <th>Country</th>\n",
 660 |        "      <th>TotalPrice</th>\n",
 661 |        "    </tr>\n",
 662 |        "  </thead>\n",
 663 |        "  <tbody>\n",
 664 |        "    <tr>\n",
 665 |        "      <th>241824</th>\n",
 666 |        "      <td>C512770</td>\n",
 667 |        "      <td>M</td>\n",
 668 |        "      <td>Manual</td>\n",
 669 |        "      <td>-1</td>\n",
 670 |        "      <td>2010-06-17 16:52:00</td>\n",
 671 |        "      <td>25111</td>\n",
 672 |        "      <td>17399</td>\n",
 673 |        "      <td>United Kingdom</td>\n",
 674 |        "      <td>-25111</td>\n",
 675 |        "    </tr>\n",
 676 |        "    <tr>\n",
 677 |        "      <th>241827</th>\n",
 678 |        "      <td>512771</td>\n",
 679 |        "      <td>M</td>\n",
 680 |        "      <td>Manual</td>\n",
 681 |        "      <td>1</td>\n",
 682 |        "      <td>2010-06-17 16:53:00</td>\n",
 683 |        "      <td>25111</td>\n",
 684 |        "      <td>nan</td>\n",
 685 |        "      <td>United Kingdom</td>\n",
 686 |        "      <td>25111</td>\n",
 687 |        "    </tr>\n",
 688 |        "    <tr>\n",
 689 |        "      <th>320581</th>\n",
 690 |        "      <td>C520667</td>\n",
 691 |        "      <td>BANK CHARGES</td>\n",
 692 |        "      <td>Bank Charges</td>\n",
 693 |        "      <td>-1</td>\n",
 694 |        "      <td>2010-08-27 13:42:00</td>\n",
 695 |        "      <td>18911</td>\n",
 696 |        "      <td>nan</td>\n",
 697 |        "      <td>United Kingdom</td>\n",
 698 |        "      <td>-18911</td>\n",
 699 |        "    </tr>\n",
 700 |        "    <tr>\n",
 701 |        "      <th>517953</th>\n",
 702 |        "      <td>C537630</td>\n",
 703 |        "      <td>AMAZONFEE</td>\n",
 704 |        "      <td>AMAZON FEE</td>\n",
 705 |        "      <td>-1</td>\n",
 706 |        "      <td>2010-12-07 15:04:00</td>\n",
 707 |        "      <td>13541</td>\n",
 708 |        "      <td>nan</td>\n",
 709 |        "      <td>United Kingdom</td>\n",
 710 |        "      <td>-13541</td>\n",
 711 |        "    </tr>\n",
 712 |        "    <tr>\n",
 713 |        "      <th>519294</th>\n",
 714 |        "      <td>C537651</td>\n",
 715 |        "      <td>AMAZONFEE</td>\n",
 716 |        "      <td>AMAZON FEE</td>\n",
 717 |        "      <td>-1</td>\n",
 718 |        "      <td>2010-12-07 15:49:00</td>\n",
 719 |        "      <td>13541</td>\n",
 720 |        "      <td>nan</td>\n",
 721 |        "      <td>United Kingdom</td>\n",
 722 |        "      <td>-13541</td>\n",
 723 |        "    </tr>\n",
 724 |        "  </tbody>\n",
 725 |        "</table>\n",
 726 |        "</div>"
 727 |       ],
 728 |       "text/plain": [
 729 |        "        Invoice     StockCode   Description  Quantity         InvoiceDate  \\\n",
 730 |        "241824  C512770             M        Manual        -1 2010-06-17 16:52:00   \n",
 731 |        "241827   512771             M        Manual         1 2010-06-17 16:53:00   \n",
 732 |        "320581  C520667  BANK CHARGES  Bank Charges        -1 2010-08-27 13:42:00   \n",
 733 |        "517953  C537630     AMAZONFEE    AMAZON FEE        -1 2010-12-07 15:04:00   \n",
 734 |        "519294  C537651     AMAZONFEE    AMAZON FEE        -1 2010-12-07 15:49:00   \n",
 735 |        "\n",
 736 |        "        Price  Customer ID         Country  TotalPrice  \n",
 737 |        "241824  25111        17399  United Kingdom      -25111  \n",
 738 |        "241827  25111          nan  United Kingdom       25111  \n",
 739 |        "320581  18911          nan  United Kingdom      -18911  \n",
 740 |        "517953  13541          nan  United Kingdom      -13541  \n",
 741 |        "519294  13541          nan  United Kingdom      -13541  "
 742 |       ]
 743 |      },
 744 |      "execution_count": 13,
 745 |      "metadata": {},
 746 |      "output_type": "execute_result"
 747 |     }
 748 |    ],
 749 |    "source": [
 750 |     "df.sort_values(\"Price\", ascending = False).head()"
 751 |    ]
 752 |   },
 753 |   {
 754 |    "cell_type": "markdown",
 755 |    "execution_count": null,
 756 |    "metadata": {},
 757 |    "source": [
 758 |     "## 8. How many orders came from which country?"
 759 |    ]
 760 |   },
 761 |   {
 762 |    "cell_type": "code",
 763 |    "execution_count": 14,
 764 |    "metadata": {},
 765 |    "outputs": [
 766 |     {
 767 |      "data": {
 768 |       "text/plain": [
 769 |        "United Kingdom          485852\n",
 770 |        "EIRE                      9670\n",
 771 |        "Germany                   8129\n",
 772 |        "France                    5772\n",
 773 |        "Netherlands               2769\n",
 774 |        "Spain                     1278\n",
 775 |        "Switzerland               1187\n",
 776 |        "Portugal                  1101\n",
 777 |        "Belgium                   1054\n",
 778 |        "Channel Islands            906\n",
 779 |        "Sweden                     902\n",
 780 |        "Italy                      731\n",
 781 |        "Australia                  654\n",
 782 |        "Cyprus                     554\n",
 783 |        "Austria                    537\n",
 784 |        "Greece                     517\n",
 785 |        "United Arab Emirates       432\n",
 786 |        "Denmark                    428\n",
 787 |        "Norway                     369\n",
 788 |        "Finland                    354\n",
 789 |        "Unspecified                310\n",
 790 |        "USA                        244\n",
 791 |        "Japan                      224\n",
 792 |        "Poland                     194\n",
 793 |        "Malta                      172\n",
 794 |        "Lithuania                  154\n",
 795 |        "Singapore                  117\n",
 796 |        "RSA                        111\n",
 797 |        "Bahrain                    107\n",
 798 |        "Canada                      77\n",
 799 |        "Hong Kong                   76\n",
 800 |        "Thailand                    76\n",
 801 |        "Israel                      74\n",
 802 |        "Iceland                     71\n",
 803 |        "Korea                       63\n",
 804 |        "Brazil                      62\n",
 805 |        "West Indies                 54\n",
 806 |        "Bermuda                     34\n",
 807 |        "Nigeria                     32\n",
 808 |        "Lebanon                     13\n",
 809 |        "Name: Country, dtype: int64"
 810 |       ]
 811 |      },
 812 |      "execution_count": 14,
 813 |      "metadata": {},
 814 |      "output_type": "execute_result"
 815 |     }
 816 |    ],
 817 |    "source": [
 818 |     "df[\"Country\"].value_counts()"
 819 |    ]
 820 |   },
 821 |   {
 822 |    "cell_type": "markdown",
 823 |    "execution_count": null,
 824 |    "metadata": {},
 825 |    "source": [
 826 |     "## 9. How much revenue did each country generate?"
 827 |    ]
 828 |   },
 829 |   {
 830 |    "cell_type": "code",
 831 |    "execution_count": 15,
 832 |    "metadata": {},
 833 |    "outputs": [
 834 |     {
 835 |      "data": {
 836 |       "text/html": [
 837 |        "<div>\n",
 838 |        "<style scoped>\n",
 839 |        "    .dataframe tbody tr th:only-of-type {\n",
 840 |        "        vertical-align: middle;\n",
 841 |        "    }\n",
 842 |        "\n",
 843 |        "    .dataframe tbody tr th {\n",
 844 |        "        vertical-align: top;\n",
 845 |        "    }\n",
 846 |        "\n",
 847 |        "    .dataframe thead th {\n",
 848 |        "        text-align: right;\n",
 849 |        "    }\n",
 850 |        "</style>\n",
 851 |        "<table border=\"1\" class=\"dataframe\">\n",
 852 |        "  <thead>\n",
 853 |        "    <tr style=\"text-align: right;\">\n",
 854 |        "      <th></th>\n",
 855 |        "      <th>TotalPrice</th>\n",
 856 |        "    </tr>\n",
 857 |        "    <tr>\n",
 858 |        "      <th>Country</th>\n",
 859 |        "      <th></th>\n",
 860 |        "    </tr>\n",
 861 |        "  </thead>\n",
 862 |        "  <tbody>\n",
 863 |        "    <tr>\n",
 864 |        "      <th>United Kingdom</th>\n",
 865 |        "      <td>8194778</td>\n",
 866 |        "    </tr>\n",
 867 |        "    <tr>\n",
 868 |        "      <th>EIRE</th>\n",
 869 |        "      <td>352243</td>\n",
 870 |        "    </tr>\n",
 871 |        "    <tr>\n",
 872 |        "      <th>Netherlands</th>\n",
 873 |        "      <td>263863</td>\n",
 874 |        "    </tr>\n",
 875 |        "    <tr>\n",
 876 |        "      <th>Germany</th>\n",
 877 |        "      <td>196290</td>\n",
 878 |        "    </tr>\n",
 879 |        "    <tr>\n",
 880 |        "      <th>France</th>\n",
 881 |        "      <td>130770</td>\n",
 882 |        "    </tr>\n",
 883 |        "  </tbody>\n",
 884 |        "</table>\n",
 885 |        "</div>"
 886 |       ],
 887 |       "text/plain": [
 888 |        "                TotalPrice\n",
 889 |        "Country                   \n",
 890 |        "United Kingdom     8194778\n",
 891 |        "EIRE                352243\n",
 892 |        "Netherlands         263863\n",
 893 |        "Germany             196290\n",
 894 |        "France              130770"
 895 |       ]
 896 |      },
 897 |      "execution_count": 15,
 898 |      "metadata": {},
 899 |      "output_type": "execute_result"
 900 |     }
 901 |    ],
 902 |    "source": [
 903 |     "df.groupby(\"Country\").agg({\"TotalPrice\":\"sum\"}).sort_values(\"TotalPrice\", ascending = False).head()"
 904 |    ]
 905 |   },
 906 |   {
 907 |    "cell_type": "markdown",
 908 |    "execution_count": null,
 909 |    "metadata": {},
 910 |    "source": [
 911 |     "## 10. Which product was returned the most?"
 912 |    ]
 913 |   },
 914 |   {
 915 |    "cell_type": "code",
 916 |    "execution_count": 16,
 917 |    "metadata": {},
 918 |    "outputs": [
 919 |     {
 920 |      "data": {
 921 |       "text/html": [
 922 |        "<div>\n",
 923 |        "<style scoped>\n",
 924 |        "    .dataframe tbody tr th:only-of-type {\n",
 925 |        "        vertical-align: middle;\n",
 926 |        "    }\n",
 927 |        "\n",
 928 |        "    .dataframe tbody tr th {\n",
 929 |        "        vertical-align: top;\n",
 930 |        "    }\n",
 931 |        "\n",
 932 |        "    .dataframe thead th {\n",
 933 |        "        text-align: right;\n",
 934 |        "    }\n",
 935 |        "</style>\n",
 936 |        "<table border=\"1\" class=\"dataframe\">\n",
 937 |        "  <thead>\n",
 938 |        "    <tr style=\"text-align: right;\">\n",
 939 |        "      <th></th>\n",
 940 |        "      <th>Invoice</th>\n",
 941 |        "      <th>StockCode</th>\n",
 942 |        "      <th>Description</th>\n",
 943 |        "      <th>Quantity</th>\n",
 944 |        "      <th>InvoiceDate</th>\n",
 945 |        "      <th>Price</th>\n",
 946 |        "      <th>Customer ID</th>\n",
 947 |        "      <th>Country</th>\n",
 948 |        "      <th>TotalPrice</th>\n",
 949 |        "    </tr>\n",
 950 |        "  </thead>\n",
 951 |        "  <tbody>\n",
 952 |        "    <tr>\n",
 953 |        "      <th>507225</th>\n",
 954 |        "      <td>C536757</td>\n",
 955 |        "      <td>84347</td>\n",
 956 |        "      <td>ROTATING SILVER ANGELS T-LIGHT HLDR</td>\n",
 957 |        "      <td>-9360</td>\n",
 958 |        "      <td>2010-12-02 14:23:00</td>\n",
 959 |        "      <td>0</td>\n",
 960 |        "      <td>15838</td>\n",
 961 |        "      <td>United Kingdom</td>\n",
 962 |        "      <td>-281</td>\n",
 963 |        "    </tr>\n",
 964 |        "    <tr>\n",
 965 |        "      <th>359669</th>\n",
 966 |        "      <td>C524235</td>\n",
 967 |        "      <td>21088</td>\n",
 968 |        "      <td>SET/6 FRUIT SALAD PAPER CUPS</td>\n",
 969 |        "      <td>-7128</td>\n",
 970 |        "      <td>2010-09-28 11:02:00</td>\n",
 971 |        "      <td>0</td>\n",
 972 |        "      <td>14277</td>\n",
 973 |        "      <td>France</td>\n",
 974 |        "      <td>-570</td>\n",
 975 |        "    </tr>\n",
 976 |        "    <tr>\n",
 977 |        "      <th>359670</th>\n",
 978 |        "      <td>C524235</td>\n",
 979 |        "      <td>21096</td>\n",
 980 |        "      <td>SET/6 FRUIT SALAD  PAPER PLATES</td>\n",
 981 |        "      <td>-7008</td>\n",
 982 |        "      <td>2010-09-28 11:02:00</td>\n",
 983 |        "      <td>0</td>\n",
 984 |        "      <td>14277</td>\n",
 985 |        "      <td>France</td>\n",
 986 |        "      <td>-911</td>\n",
 987 |        "    </tr>\n",
 988 |        "    <tr>\n",
 989 |        "      <th>359630</th>\n",
 990 |        "      <td>C524235</td>\n",
 991 |        "      <td>16047</td>\n",
 992 |        "      <td>POP ART PEN CASE &amp; PENS</td>\n",
 993 |        "      <td>-5184</td>\n",
 994 |        "      <td>2010-09-28 11:02:00</td>\n",
 995 |        "      <td>0</td>\n",
 996 |        "      <td>14277</td>\n",
 997 |        "      <td>France</td>\n",
 998 |        "      <td>-415</td>\n",
 999 |        "    </tr>\n",
1000 |        "    <tr>\n",
1001 |        "      <th>359636</th>\n",
1002 |        "      <td>C524235</td>\n",
1003 |        "      <td>37340</td>\n",
1004 |        "      <td>MULTICOLOUR SPRING FLOWER MUG</td>\n",
1005 |        "      <td>-4992</td>\n",
1006 |        "      <td>2010-09-28 11:02:00</td>\n",
1007 |        "      <td>0</td>\n",
1008 |        "      <td>14277</td>\n",
1009 |        "      <td>France</td>\n",
1010 |        "      <td>-499</td>\n",
1011 |        "    </tr>\n",
1012 |        "  </tbody>\n",
1013 |        "</table>\n",
1014 |        "</div>"
1015 |       ],
1016 |       "text/plain": [
1017 |        "        Invoice StockCode                          Description  Quantity  \\\n",
1018 |        "507225  C536757     84347  ROTATING SILVER ANGELS T-LIGHT HLDR     -9360   \n",
1019 |        "359669  C524235     21088         SET/6 FRUIT SALAD PAPER CUPS     -7128   \n",
1020 |        "359670  C524235     21096      SET/6 FRUIT SALAD  PAPER PLATES     -7008   \n",
1021 |        "359630  C524235     16047              POP ART PEN CASE & PENS     -5184   \n",
1022 |        "359636  C524235     37340        MULTICOLOUR SPRING FLOWER MUG     -4992   \n",
1023 |        "\n",
1024 |        "               InvoiceDate  Price  Customer ID         Country  TotalPrice  \n",
1025 |        "507225 2010-12-02 14:23:00      0        15838  United Kingdom        -281  \n",
1026 |        "359669 2010-09-28 11:02:00      0        14277          France        -570  \n",
1027 |        "359670 2010-09-28 11:02:00      0        14277          France        -911  \n",
1028 |        "359630 2010-09-28 11:02:00      0        14277          France        -415  \n",
1029 |        "359636 2010-09-28 11:02:00      0        14277          France        -499  "
1030 |       ]
1031 |      },
1032 |      "execution_count": 16,
1033 |      "metadata": {},
1034 |      "output_type": "execute_result"
1035 |     }
1036 |    ],
1037 |    "source": [
1038 |     "df[df['Invoice'].str.startswith(\"C\", na=False)].sort_values(\"Quantity\", ascending = True).head()"
1039 |    ]
1040 |   },
1041 |   {
1042 |    "cell_type": "markdown",
1043 |    "execution_count": null,
1044 |    "metadata": {},
1045 |    "source": [
1046 |     "# Data Preparation"
1047 |    ]
1048 |   },
1049 |   {
1050 |    "cell_type": "code",
1051 |    "execution_count": 17,
1052 |    "metadata": {},
1053 |    "outputs": [
1054 |     {
1055 |      "data": {
1056 |       "text/plain": [
1057 |        "Invoice             0\n",
1058 |        "StockCode           0\n",
1059 |        "Description      2928\n",
1060 |        "Quantity            0\n",
1061 |        "InvoiceDate         0\n",
1062 |        "Price               0\n",
1063 |        "Customer ID    107927\n",
1064 |        "Country             0\n",
1065 |        "TotalPrice          0\n",
1066 |        "dtype: int64"
1067 |       ]
1068 |      },
1069 |      "execution_count": 17,
1070 |      "metadata": {},
1071 |      "output_type": "execute_result"
1072 |     }
1073 |    ],
1074 |    "source": [
1075 |     "df.isnull().sum()"
1076 |    ]
1077 |   },
1078 |   {
1079 |    "cell_type": "code",
1080 |    "execution_count": 18,
1081 |    "metadata": {},
1082 |    "outputs": [],
1083 |    "source": [
1084 |     "df.dropna(inplace = True)"
1085 |    ]
1086 |   },
1087 |   {
1088 |    "cell_type": "code",
1089 |    "execution_count": 19,
1090 |    "metadata": {},
1091 |    "outputs": [
1092 |     {
1093 |      "data": {
1094 |       "text/plain": [
1095 |        "(417534, 9)"
1096 |       ]
1097 |      },
1098 |      "execution_count": 19,
1099 |      "metadata": {},
1100 |      "output_type": "execute_result"
1101 |     }
1102 |    ],
1103 |    "source": [
1104 |     "df.shape"
1105 |    ]
1106 |   },
1107 |   {
1108 |    "cell_type": "code",
1109 |    "execution_count": 20,
1110 |    "metadata": {},
1111 |    "outputs": [
1112 |     {
1113 |      "data": {
1114 |       "text/html": [
1115 |        "<div>\n",
1116 |        "<style scoped>\n",
1117 |        "    .dataframe tbody tr th:only-of-type {\n",
1118 |        "        vertical-align: middle;\n",
1119 |        "    }\n",
1120 |        "\n",
1121 |        "    .dataframe tbody tr th {\n",
1122 |        "        vertical-align: top;\n",
1123 |        "    }\n",
1124 |        "\n",
1125 |        "    .dataframe thead th {\n",
1126 |        "        text-align: right;\n",
1127 |        "    }\n",
1128 |        "</style>\n",
1129 |        "<table border=\"1\" class=\"dataframe\">\n",
1130 |        "  <thead>\n",
1131 |        "    <tr style=\"text-align: right;\">\n",
1132 |        "      <th></th>\n",
1133 |        "      <th>count</th>\n",
1134 |        "      <th>mean</th>\n",
1135 |        "      <th>std</th>\n",
1136 |        "      <th>min</th>\n",
1137 |        "      <th>1%</th>\n",
1138 |        "      <th>5%</th>\n",
1139 |        "      <th>10%</th>\n",
1140 |        "      <th>25%</th>\n",
1141 |        "      <th>50%</th>\n",
1142 |        "      <th>75%</th>\n",
1143 |        "      <th>90%</th>\n",
1144 |        "      <th>95%</th>\n",
1145 |        "      <th>99%</th>\n",
1146 |        "      <th>max</th>\n",
1147 |        "    </tr>\n",
1148 |        "  </thead>\n",
1149 |        "  <tbody>\n",
1150 |        "    <tr>\n",
1151 |        "      <th>Quantity</th>\n",
1152 |        "      <td>417534</td>\n",
1153 |        "      <td>13</td>\n",
1154 |        "      <td>101</td>\n",
1155 |        "      <td>-9360</td>\n",
1156 |        "      <td>-2</td>\n",
1157 |        "      <td>1</td>\n",
1158 |        "      <td>1</td>\n",
1159 |        "      <td>2</td>\n",
1160 |        "      <td>4</td>\n",
1161 |        "      <td>12</td>\n",
1162 |        "      <td>24</td>\n",
1163 |        "      <td>36</td>\n",
1164 |        "      <td>144</td>\n",
1165 |        "      <td>19152</td>\n",
1166 |        "    </tr>\n",
1167 |        "    <tr>\n",
1168 |        "      <th>Price</th>\n",
1169 |        "      <td>417534</td>\n",
1170 |        "      <td>4</td>\n",
1171 |        "      <td>71</td>\n",
1172 |        "      <td>0</td>\n",
1173 |        "      <td>0</td>\n",
1174 |        "      <td>0</td>\n",
1175 |        "      <td>1</td>\n",
1176 |        "      <td>1</td>\n",
1177 |        "      <td>2</td>\n",
1178 |        "      <td>4</td>\n",
1179 |        "      <td>7</td>\n",
1180 |        "      <td>8</td>\n",
1181 |        "      <td>15</td>\n",
1182 |        "      <td>25111</td>\n",
1183 |        "    </tr>\n",
1184 |        "    <tr>\n",
1185 |        "      <th>Customer ID</th>\n",
1186 |        "      <td>417534</td>\n",
1187 |        "      <td>15361</td>\n",
1188 |        "      <td>1681</td>\n",
1189 |        "      <td>12346</td>\n",
1190 |        "      <td>12435</td>\n",
1191 |        "      <td>12725</td>\n",
1192 |        "      <td>13042</td>\n",
1193 |        "      <td>13983</td>\n",
1194 |        "      <td>15311</td>\n",
1195 |        "      <td>16799</td>\n",
1196 |        "      <td>17706</td>\n",
1197 |        "      <td>17913</td>\n",
1198 |        "      <td>18196</td>\n",
1199 |        "      <td>18287</td>\n",
1200 |        "    </tr>\n",
1201 |        "    <tr>\n",
1202 |        "      <th>TotalPrice</th>\n",
1203 |        "      <td>417534</td>\n",
1204 |        "      <td>20</td>\n",
1205 |        "      <td>100</td>\n",
1206 |        "      <td>-25111</td>\n",
1207 |        "      <td>-11</td>\n",
1208 |        "      <td>1</td>\n",
1209 |        "      <td>2</td>\n",
1210 |        "      <td>4</td>\n",
1211 |        "      <td>11</td>\n",
1212 |        "      <td>19</td>\n",
1213 |        "      <td>35</td>\n",
1214 |        "      <td>65</td>\n",
1215 |        "      <td>196</td>\n",
1216 |        "      <td>15818</td>\n",
1217 |        "    </tr>\n",
1218 |        "  </tbody>\n",
1219 |        "</table>\n",
1220 |        "</div>"
1221 |       ],
1222 |       "text/plain": [
1223 |        "             count  mean  std    min    1%    5%   10%   25%   50%   75%  \\\n",
1224 |        "Quantity    417534    13  101  -9360    -2     1     1     2     4    12   \n",
1225 |        "Price       417534     4   71      0     0     0     1     1     2     4   \n",
1226 |        "Customer ID 417534 15361 1681  12346 12435 12725 13042 13983 15311 16799   \n",
1227 |        "TotalPrice  417534    20  100 -25111   -11     1     2     4    11    19   \n",
1228 |        "\n",
1229 |        "              90%   95%   99%   max  \n",
1230 |        "Quantity       24    36   144 19152  \n",
1231 |        "Price           7     8    15 25111  \n",
1232 |        "Customer ID 17706 17913 18196 18287  \n",
1233 |        "TotalPrice     35    65   196 15818  "
1234 |       ]
1235 |      },
1236 |      "execution_count": 20,
1237 |      "metadata": {},
1238 |      "output_type": "execute_result"
1239 |     }
1240 |    ],
1241 |    "source": [
1242 |     "df.describe([0.01,0.05,0.10,0.25,0.50,0.75,0.90,0.95, 0.99]).T"
1243 |    ]
1244 |   },
1245 |   {
1246 |    "cell_type": "code",
1247 |    "execution_count": 21,
1248 |    "metadata": {},
1249 |    "outputs": [
1250 |     {
1251 |      "name": "stdout",
1252 |      "output_type": "stream",
1253 |      "text": [
1254 |       "Quantity yes\n",
1255 |       "1063\n",
1256 |       "Price yes\n",
1257 |       "953\n",
1258 |       "TotalPrice yes\n",
1259 |       "1150\n"
1260 |      ]
1261 |     }
1262 |    ],
1263 |    "source": [
1264 |     "for feature in [\"Quantity\",\"Price\",\"TotalPrice\"]:\n",
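1265 |     "    # Q1 and Q3 are the 1st and 99th percentiles here (not quartiles), so only extreme values are flagged.\n",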
1266 |     "    Q1 = df[feature].quantile(0.01)\n",
1267 |     "    Q3 = df[feature].quantile(0.99)\n",
1268 |     "    IQR = Q3-Q1\n",
1269 |     "    upper = Q3 + 1.5*IQR\n",
1270 |     "    lower = Q1 - 1.5*IQR\n",
1271 |     "    # Report whether any rows fall outside [lower, upper], and how many.\n",
1272 |     "    if df[(df[feature] > upper) | (df[feature] < lower)].any(axis=None):\n",
1273 |     "        print(feature,\"yes\")\n",
1274 |     "        print(df[(df[feature] > upper) | (df[feature] < lower)].shape[0])\n",
1275 |     "    else:\n",
1276 |     "        print(feature, \"no\")"
1277 |    ]
1278 |   },
1279 |   {
1280 |    "cell_type": "markdown",
1281 |    "execution_count": null,
1282 |    "metadata": {},
1283 |    "source": [
1284 |     "# Customer Segmentation with RFM Scores\n",
1285 |     "\n",
1286 |     "RFM stands for Recency, Frequency, and Monetary.\n",
1287 |     "\n",
1288 |     "It is a technique that helps determine marketing and sales strategies based on customers' buying habits.\n",
1289 |     "\n",
1290 |     "- Recency: Time since the customer's last purchase.\n",
1291 |     "\n",
1292 |     "     -- In other words, it is the “time since the last contact of the customer”.\n",
1293 |     "\n",
1294 |     "     -- Recency = today's date - date of last purchase\n",
1295 |     "\n",
1296 |     "     -- For example, if we run this analysis today, recency is the number of days between today and the customer's last purchase.\n",
1297 |     "\n",
1298 |     "     -- The result might be, for example, 20 or 100. The customer with a recency of 20 is \"hotter\": they have been in contact with us more recently.\n",
1299 |     "\n",
1300 |     "- Frequency: Total number of purchases.\n",
1301 |     "\n",
1302 |     "- Monetary (Monetary Value): Total spending by the customer.\n"
1303 |    ]
1304 |   },
1305 |   {
1306 |    "cell_type": "code",
1307 |    "execution_count": 22,
1308 |    "metadata": {},
1309 |    "outputs": [
1310 |     {
1311 |      "data": {
1312 |       "text/html": [
1313 |        "<div>\n",
1314 |        "<style scoped>\n",
1315 |        "    .dataframe tbody tr th:only-of-type {\n",
1316 |        "        vertical-align: middle;\n",
1317 |        "    }\n",
1318 |        "\n",
1319 |        "    .dataframe tbody tr th {\n",
1320 |        "        vertical-align: top;\n",
1321 |        "    }\n",
1322 |        "\n",
1323 |        "    .dataframe thead th {\n",
1324 |        "        text-align: right;\n",
1325 |        "    }\n",
1326 |        "</style>\n",
1327 |        "<table border=\"1\" class=\"dataframe\">\n",
1328 |        "  <thead>\n",
1329 |        "    <tr style=\"text-align: right;\">\n",
1330 |        "      <th></th>\n",
1331 |        "      <th>Invoice</th>\n",
1332 |        "      <th>StockCode</th>\n",
1333 |        "      <th>Description</th>\n",
1334 |        "      <th>Quantity</th>\n",
1335 |        "      <th>InvoiceDate</th>\n",
1336 |        "      <th>Price</th>\n",
1337 |        "      <th>Customer ID</th>\n",
1338 |        "      <th>Country</th>\n",
1339 |        "      <th>TotalPrice</th>\n",
1340 |        "    </tr>\n",
1341 |        "  </thead>\n",
1342 |        "  <tbody>\n",
1343 |        "    <tr>\n",
1344 |        "      <th>0</th>\n",
1345 |        "      <td>489434</td>\n",
1346 |        "      <td>85048</td>\n",
1347 |        "      <td>15CM CHRISTMAS GLASS BALL 20 LIGHTS</td>\n",
1348 |        "      <td>12</td>\n",
1349 |        "      <td>2009-12-01 07:45:00</td>\n",
1350 |        "      <td>7</td>\n",
1351 |        "      <td>13085</td>\n",
1352 |        "      <td>United Kingdom</td>\n",
1353 |        "      <td>83</td>\n",
1354 |        "    </tr>\n",
1355 |        "    <tr>\n",
1356 |        "      <th>1</th>\n",
1357 |        "      <td>489434</td>\n",
1358 |        "      <td>79323P</td>\n",
1359 |        "      <td>PINK CHERRY LIGHTS</td>\n",
1360 |        "      <td>12</td>\n",
1361 |        "      <td>2009-12-01 07:45:00</td>\n",
1362 |        "      <td>7</td>\n",
1363 |        "      <td>13085</td>\n",
1364 |        "      <td>United Kingdom</td>\n",
1365 |        "      <td>81</td>\n",
1366 |        "    </tr>\n",
1367 |        "    <tr>\n",
1368 |        "      <th>2</th>\n",
1369 |        "      <td>489434</td>\n",
1370 |        "      <td>79323W</td>\n",
1371 |        "      <td>WHITE CHERRY LIGHTS</td>\n",
1372 |        "      <td>12</td>\n",
1373 |        "      <td>2009-12-01 07:45:00</td>\n",
1374 |        "      <td>7</td>\n",
1375 |        "      <td>13085</td>\n",
1376 |        "      <td>United Kingdom</td>\n",
1377 |        "      <td>81</td>\n",
1378 |        "    </tr>\n",
1379 |        "    <tr>\n",
1380 |        "      <th>3</th>\n",
1381 |        "      <td>489434</td>\n",
1382 |        "      <td>22041</td>\n",
1383 |        "      <td>RECORD FRAME 7\" SINGLE SIZE</td>\n",
1384 |        "      <td>48</td>\n",
1385 |        "      <td>2009-12-01 07:45:00</td>\n",
1386 |        "      <td>2</td>\n",
1387 |        "      <td>13085</td>\n",
1388 |        "      <td>United Kingdom</td>\n",
1389 |        "      <td>101</td>\n",
1390 |        "    </tr>\n",
1391 |        "    <tr>\n",
1392 |        "      <th>4</th>\n",
1393 |        "      <td>489434</td>\n",
1394 |        "      <td>21232</td>\n",
1395 |        "      <td>STRAWBERRY CERAMIC TRINKET BOX</td>\n",
1396 |        "      <td>24</td>\n",
1397 |        "      <td>2009-12-01 07:45:00</td>\n",
1398 |        "      <td>1</td>\n",
1399 |        "      <td>13085</td>\n",
1400 |        "      <td>United Kingdom</td>\n",
1401 |        "      <td>30</td>\n",
1402 |        "    </tr>\n",
1403 |        "  </tbody>\n",
1404 |        "</table>\n",
1405 |        "</div>"
1406 |       ],
1407 |       "text/plain": [
1408 |        "  Invoice StockCode                          Description  Quantity  \\\n",
1409 |        "0  489434     85048  15CM CHRISTMAS GLASS BALL 20 LIGHTS        12   \n",
1410 |        "1  489434    79323P                   PINK CHERRY LIGHTS        12   \n",
1411 |        "2  489434    79323W                  WHITE CHERRY LIGHTS        12   \n",
1412 |        "3  489434     22041         RECORD FRAME 7\" SINGLE SIZE         48   \n",
1413 |        "4  489434     21232       STRAWBERRY CERAMIC TRINKET BOX        24   \n",
1414 |        "\n",
1415 |        "          InvoiceDate  Price  Customer ID         Country  TotalPrice  \n",
1416 |        "0 2009-12-01 07:45:00      7        13085  United Kingdom          83  \n",
1417 |        "1 2009-12-01 07:45:00      7        13085  United Kingdom          81  \n",
1418 |        "2 2009-12-01 07:45:00      7        13085  United Kingdom          81  \n",
1419 |        "3 2009-12-01 07:45:00      2        13085  United Kingdom         101  \n",
1420 |        "4 2009-12-01 07:45:00      1        13085  United Kingdom          30  "
1421 |       ]
1422 |      },
1423 |      "execution_count": 22,
1424 |      "metadata": {},
1425 |      "output_type": "execute_result"
1426 |     }
1427 |    ],
1428 |    "source": [
1429 |     "df.head()"
1430 |    ]
1431 |   },
1432 |   {
1433 |    "cell_type": "code",
1434 |    "execution_count": 23,
1435 |    "metadata": {},
1436 |    "outputs": [
1437 |     {
1438 |      "name": "stdout",
1439 |      "output_type": "stream",
1440 |      "text": [
1441 |       "<class 'pandas.core.frame.DataFrame'>\n",
1442 |       "Int64Index: 417534 entries, 0 to 525460\n",
1443 |       "Data columns (total 9 columns):\n",
1444 |       " #   Column       Non-Null Count   Dtype         \n",
1445 |       "---  ------       --------------   -----         \n",
1446 |       " 0   Invoice      417534 non-null  object        \n",
1447 |       " 1   StockCode    417534 non-null  object        \n",
1448 |       " 2   Description  417534 non-null  object        \n",
1449 |       " 3   Quantity     417534 non-null  int64         \n",
1450 |       " 4   InvoiceDate  417534 non-null  datetime64[ns]\n",
1451 |       " 5   Price        417534 non-null  float64       \n",
1452 |       " 6   Customer ID  417534 non-null  float64       \n",
1453 |       " 7   Country      417534 non-null  object        \n",
1454 |       " 8   TotalPrice   417534 non-null  float64       \n",
1455 |       "dtypes: datetime64[ns](1), float64(3), int64(1), object(4)\n",
1456 |       "memory usage: 31.9+ MB\n"
1457 |      ]
1458 |     }
1459 |    ],
1460 |    "source": [
1461 |     "df.info()"
1462 |    ]
1463 |   },
1464 |   {
1465 |    "cell_type": "code",
1466 |    "execution_count": 24,
1467 |    "metadata": {},
1468 |    "outputs": [
1469 |     {
1470 |      "data": {
1471 |       "text/plain": [
1472 |        "Timestamp('2009-12-01 07:45:00')"
1473 |       ]
1474 |      },
1475 |      "execution_count": 24,
1476 |      "metadata": {},
1477 |      "output_type": "execute_result"
1478 |     }
1479 |    ],
1480 |    "source": [
1481 |     "df[\"InvoiceDate\"].min()"
1482 |    ]
1483 |   },
1484 |   {
1485 |    "cell_type": "code",
1486 |    "execution_count": 25,
1487 |    "metadata": {},
1488 |    "outputs": [
1489 |     {
1490 |      "data": {
1491 |       "text/plain": [
1492 |        "Timestamp('2010-12-09 20:01:00')"
1493 |       ]
1494 |      },
1495 |      "execution_count": 25,
1496 |      "metadata": {},
1497 |      "output_type": "execute_result"
1498 |     }
1499 |    ],
1500 |    "source": [
1501 |     "df[\"InvoiceDate\"].max()"
1502 |    ]
1503 |   },
1504 |   {
1505 |    "cell_type": "markdown",
1506 |    "execution_count": null,
1507 |    "metadata": {},
1508 |    "source": [
1509 |     "What should count as \"today\"? The data ends in December 2010, so using the actual current date would inflate every recency value.\n",
1510 |     "\n",
1511 |     "For this reason, let us define our own \"today\" based on this data set.\n",
1512 |     "\n",
1513 |     "We can set it to the latest date in the data set.\n",
1514 |     "\n",
1515 |     "We can then segment customers relative to the date of the last record."
1516 |    ]
1517 |   },
1518 |   {
1519 |    "cell_type": "code",
1520 |    "execution_count": 26,
1521 |    "metadata": {},
1522 |    "outputs": [],
1523 |    "source": [
1524 |     "import datetime as dt\n",
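1525 |     "# Reference date for recency; the latest InvoiceDate is 2010-12-09 20:01, so same-day purchases get a slightly negative recency.\n",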
1526 |     "today_date = dt.datetime(2010,12,9)"
1527 |    ]
1528 |   },
1529 |   {
1530 |    "cell_type": "code",
1531 |    "execution_count": 27,
1532 |    "metadata": {},
1533 |    "outputs": [
1534 |     {
1535 |      "data": {
1536 |       "text/plain": [
1537 |        "datetime.datetime(2010, 12, 9, 0, 0)"
1538 |       ]
1539 |      },
1540 |      "execution_count": 27,
1541 |      "metadata": {},
1542 |      "output_type": "execute_result"
1543 |     }
1544 |    ],
1545 |    "source": [
1546 |     "today_date"
1547 |    ]
1548 |   },
1549 |   {
1550 |    "cell_type": "markdown",
1551 |    "execution_count": null,
1552 |    "metadata": {},
1553 |    "source": [
1554 |     "## 11. Show the last purchase date of each customer."
1555 |    ]
1556 |   },
1557 |   {
1558 |    "cell_type": "code",
1559 |    "execution_count": 28,
1560 |    "metadata": {},
1561 |    "outputs": [
1562 |     {
1563 |      "data": {
1564 |       "text/html": [
1565 |        "<div>\n",
1566 |        "<style scoped>\n",
1567 |        "    .dataframe tbody tr th:only-of-type {\n",
1568 |        "        vertical-align: middle;\n",
1569 |        "    }\n",
1570 |        "\n",
1571 |        "    .dataframe tbody tr th {\n",
1572 |        "        vertical-align: top;\n",
1573 |        "    }\n",
1574 |        "\n",
1575 |        "    .dataframe thead th {\n",
1576 |        "        text-align: right;\n",
1577 |        "    }\n",
1578 |        "</style>\n",
1579 |        "<table border=\"1\" class=\"dataframe\">\n",
1580 |        "  <thead>\n",
1581 |        "    <tr style=\"text-align: right;\">\n",
1582 |        "      <th></th>\n",
1583 |        "      <th>InvoiceDate</th>\n",
1584 |        "    </tr>\n",
1585 |        "    <tr>\n",
1586 |        "      <th>Customer ID</th>\n",
1587 |        "      <th></th>\n",
1588 |        "    </tr>\n",
1589 |        "  </thead>\n",
1590 |        "  <tbody>\n",
1591 |        "    <tr>\n",
1592 |        "      <th>12346</th>\n",
1593 |        "      <td>2010-10-04 16:33:00</td>\n",
1594 |        "    </tr>\n",
1595 |        "    <tr>\n",
1596 |        "      <th>12347</th>\n",
1597 |        "      <td>2010-12-07 14:57:00</td>\n",
1598 |        "    </tr>\n",
1599 |        "    <tr>\n",
1600 |        "      <th>12348</th>\n",
1601 |        "      <td>2010-09-27 14:59:00</td>\n",
1602 |        "    </tr>\n",
1603 |        "    <tr>\n",
1604 |        "      <th>12349</th>\n",
1605 |        "      <td>2010-10-28 08:23:00</td>\n",
1606 |        "    </tr>\n",
1607 |        "    <tr>\n",
1608 |        "      <th>12351</th>\n",
1609 |        "      <td>2010-11-29 15:23:00</td>\n",
1610 |        "    </tr>\n",
1611 |        "  </tbody>\n",
1612 |        "</table>\n",
1613 |        "</div>"
1614 |       ],
1615 |       "text/plain": [
1616 |        "                    InvoiceDate\n",
1617 |        "Customer ID                    \n",
1618 |        "12346       2010-10-04 16:33:00\n",
1619 |        "12347       2010-12-07 14:57:00\n",
1620 |        "12348       2010-09-27 14:59:00\n",
1621 |        "12349       2010-10-28 08:23:00\n",
1622 |        "12351       2010-11-29 15:23:00"
1623 |       ]
1624 |      },
1625 |      "execution_count": 28,
1626 |      "metadata": {},
1627 |      "output_type": "execute_result"
1628 |     }
1629 |    ],
1630 |    "source": [
1631 |     "df.groupby(\"Customer ID\").agg({\"InvoiceDate\":\"max\"}).head()"
1632 |    ]
1633 |   },
1634 |   {
1635 |    "cell_type": "markdown",
1636 |    "execution_count": null,
1637 |    "metadata": {},
1638 |    "source": [
1639 |     "Now we have each customer's last purchase date. Next, let's convert \"Customer ID\" from float to integer."
1640 |    ]
1641 |   },
1642 |   {
1643 |    "cell_type": "code",
1644 |    "execution_count": 29,
1645 |    "metadata": {},
1646 |    "outputs": [],
1647 |    "source": [
1648 |     "df[\"Customer ID\"] = df[\"Customer ID\"].astype(int)"
1649 |    ]
1650 |   },
1651 |   {
1652 |    "cell_type": "markdown",
1653 |    "execution_count": null,
1654 |    "metadata": {},
1655 |    "source": [
1656 |     "## 12. What should we do for customer segmentation with RFM?"
1657 |    ]
1658 |   },
1659 |   {
1660 |    "cell_type": "markdown",
1661 |    "execution_count": null,
1662 |    "metadata": {},
1663 |    "source": [
1664 |     "For each customer, we need to subtract the customer's last purchase date from today's date.\n",
1665 |     "\n",
1666 |     "This gives a single recency value per customer."
1667 |    ]
1668 |   },
1669 |   {
1670 |    "cell_type": "code",
1671 |    "execution_count": 30,
1672 |    "metadata": {},
1673 |    "outputs": [
1674 |     {
1675 |      "data": {
1676 |       "text/html": [
1677 |        "<div>\n",
1678 |        "<style scoped>\n",
1679 |        "    .dataframe tbody tr th:only-of-type {\n",
1680 |        "        vertical-align: middle;\n",
1681 |        "    }\n",
1682 |        "\n",
1683 |        "    .dataframe tbody tr th {\n",
1684 |        "        vertical-align: top;\n",
1685 |        "    }\n",
1686 |        "\n",
1687 |        "    .dataframe thead th {\n",
1688 |        "        text-align: right;\n",
1689 |        "    }\n",
1690 |        "</style>\n",
1691 |        "<table border=\"1\" class=\"dataframe\">\n",
1692 |        "  <thead>\n",
1693 |        "    <tr style=\"text-align: right;\">\n",
1694 |        "      <th></th>\n",
1695 |        "      <th>InvoiceDate</th>\n",
1696 |        "    </tr>\n",
1697 |        "    <tr>\n",
1698 |        "      <th>Customer ID</th>\n",
1699 |        "      <th></th>\n",
1700 |        "    </tr>\n",
1701 |        "  </thead>\n",
1702 |        "  <tbody>\n",
1703 |        "    <tr>\n",
1704 |        "      <th>12346</th>\n",
1705 |        "      <td>65 days 07:27:00</td>\n",
1706 |        "    </tr>\n",
1707 |        "    <tr>\n",
1708 |        "      <th>12347</th>\n",
1709 |        "      <td>1 days 09:03:00</td>\n",
1710 |        "    </tr>\n",
1711 |        "    <tr>\n",
1712 |        "      <th>12348</th>\n",
1713 |        "      <td>72 days 09:01:00</td>\n",
1714 |        "    </tr>\n",
1715 |        "    <tr>\n",
1716 |        "      <th>12349</th>\n",
1717 |        "      <td>41 days 15:37:00</td>\n",
1718 |        "    </tr>\n",
1719 |        "    <tr>\n",
1720 |        "      <th>12351</th>\n",
1721 |        "      <td>9 days 08:37:00</td>\n",
1722 |        "    </tr>\n",
1723 |        "  </tbody>\n",
1724 |        "</table>\n",
1725 |        "</div>"
1726 |       ],
1727 |       "text/plain": [
1728 |        "                 InvoiceDate\n",
1729 |        "Customer ID                 \n",
1730 |        "12346       65 days 07:27:00\n",
1731 |        "12347        1 days 09:03:00\n",
1732 |        "12348       72 days 09:01:00\n",
1733 |        "12349       41 days 15:37:00\n",
1734 |        "12351        9 days 08:37:00"
1735 |       ]
1736 |      },
1737 |      "execution_count": 30,
1738 |      "metadata": {},
1739 |      "output_type": "execute_result"
1740 |     }
1741 |    ],
1742 |    "source": [
1743 |     "(today_date - df.groupby(\"Customer ID\").agg({\"InvoiceDate\":\"max\"})).head()"
1744 |    ]
1745 |   },
1746 |   {
1747 |    "cell_type": "code",
1748 |    "execution_count": 31,
1749 |    "metadata": {},
1750 |    "outputs": [],
1751 |    "source": [
1752 |     "temp_df = (today_date - df.groupby(\"Customer ID\").agg({\"InvoiceDate\":\"max\"}))"
1753 |    ]
1754 |   },
1755 |   {
1756 |    "cell_type": "code",
1757 |    "execution_count": 32,
1758 |    "metadata": {},
1759 |    "outputs": [],
1760 |    "source": [
1761 |     "temp_df.rename(columns={\"InvoiceDate\": \"Recency\"}, inplace = True)"
1762 |    ]
1763 |   },
1764 |   {
1765 |    "cell_type": "code",
1766 |    "execution_count": 33,
1767 |    "metadata": {},
1768 |    "outputs": [
1769 |     {
1770 |      "data": {
1771 |       "text/html": [
1772 |        "<div>\n",
1773 |        "<style scoped>\n",
1774 |        "    .dataframe tbody tr th:only-of-type {\n",
1775 |        "        vertical-align: middle;\n",
1776 |        "    }\n",
1777 |        "\n",
1778 |        "    .dataframe tbody tr th {\n",
1779 |        "        vertical-align: top;\n",
1780 |        "    }\n",
1781 |        "\n",
1782 |        "    .dataframe thead th {\n",
1783 |        "        text-align: right;\n",
1784 |        "    }\n",
1785 |        "</style>\n",
1786 |        "<table border=\"1\" class=\"dataframe\">\n",
1787 |        "  <thead>\n",
1788 |        "    <tr style=\"text-align: right;\">\n",
1789 |        "      <th></th>\n",
1790 |        "      <th>Recency</th>\n",
1791 |        "    </tr>\n",
1792 |        "    <tr>\n",
1793 |        "      <th>Customer ID</th>\n",
1794 |        "      <th></th>\n",
1795 |        "    </tr>\n",
1796 |        "  </thead>\n",
1797 |        "  <tbody>\n",
1798 |        "    <tr>\n",
1799 |        "      <th>12346</th>\n",
1800 |        "      <td>65 days 07:27:00</td>\n",
1801 |        "    </tr>\n",
1802 |        "    <tr>\n",
1803 |        "      <th>12347</th>\n",
1804 |        "      <td>1 days 09:03:00</td>\n",
1805 |        "    </tr>\n",
1806 |        "    <tr>\n",
1807 |        "      <th>12348</th>\n",
1808 |        "      <td>72 days 09:01:00</td>\n",
1809 |        "    </tr>\n",
1810 |        "    <tr>\n",
1811 |        "      <th>12349</th>\n",
1812 |        "      <td>41 days 15:37:00</td>\n",
1813 |        "    </tr>\n",
1814 |        "    <tr>\n",
1815 |        "      <th>12351</th>\n",
1816 |        "      <td>9 days 08:37:00</td>\n",
1817 |        "    </tr>\n",
1818 |        "  </tbody>\n",
1819 |        "</table>\n",
1820 |        "</div>"
1821 |       ],
1822 |       "text/plain": [
1823 |        "                     Recency\n",
1824 |        "Customer ID                 \n",
1825 |        "12346       65 days 07:27:00\n",
1826 |        "12347        1 days 09:03:00\n",
1827 |        "12348       72 days 09:01:00\n",
1828 |        "12349       41 days 15:37:00\n",
1829 |        "12351        9 days 08:37:00"
1830 |       ]
1831 |      },
1832 |      "execution_count": 33,
1833 |      "metadata": {},
1834 |      "output_type": "execute_result"
1835 |     }
1836 |    ],
1837 |    "source": [
1838 |     "temp_df.head()"
1839 |    ]
1840 |   },
1841 |   {
1842 |    "cell_type": "code",
1843 |    "execution_count": 34,
1844 |    "metadata": {},
1845 |    "outputs": [],
1846 |    "source": [
1847 |     "recency_df = temp_df[\"Recency\"].dt.days"
1848 |    ]
1849 |   },
1850 |   {
1851 |    "cell_type": "code",
1852 |    "execution_count": 35,
1853 |    "metadata": {},
1854 |    "outputs": [
1855 |     {
1856 |      "data": {
1857 |       "text/plain": [
1858 |        "Customer ID\n",
1859 |        "12346    65\n",
1860 |        "12347     1\n",
1861 |        "12348    72\n",
1862 |        "12349    41\n",
1863 |        "12351     9\n",
1864 |        "Name: Recency, dtype: int64"
1865 |       ]
1866 |      },
1867 |      "execution_count": 35,
1868 |      "metadata": {},
1869 |      "output_type": "execute_result"
1870 |     }
1871 |    ],
1872 |    "source": [
1873 |     "recency_df.head()"
1874 |    ]
1875 |   },
1876 |   {
1877 |    "cell_type": "code",
1878 |    "execution_count": 36,
1879 |    "metadata": {},
1880 |    "outputs": [],
1881 |    "source": [
1882 |     "#df.groupby(\"Customer ID\").agg({\"InvoiceDate\": lambda x: (today_date - x.max()).days}).head()"
1883 |    ]
1884 |   },
1885 |   {
1886 |    "cell_type": "markdown",
1887 |    "execution_count": null,
1888 |    "metadata": {},
1889 |    "source": [
1890 |     "# Frequency"
1891 |    ]
1892 |   },
1893 |   {
1894 |    "cell_type": "code",
1895 |    "execution_count": 37,
1896 |    "metadata": {},
1897 |    "outputs": [],
1898 |    "source": [
1899 |     "temp_df = df.groupby([\"Customer ID\",\"Invoice\"]).agg({\"Invoice\":\"count\"})"
1900 |    ]
1901 |   },
1902 |   {
1903 |    "cell_type": "code",
1904 |    "execution_count": 38,
1905 |    "metadata": {},
1906 |    "outputs": [
1907 |     {
1908 |      "data": {
1909 |       "text/html": [
1910 |        "<div>\n",
1911 |        "<style scoped>\n",
1912 |        "    .dataframe tbody tr th:only-of-type {\n",
1913 |        "        vertical-align: middle;\n",
1914 |        "    }\n",
1915 |        "\n",
1916 |        "    .dataframe tbody tr th {\n",
1917 |        "        vertical-align: top;\n",
1918 |        "    }\n",
1919 |        "\n",
1920 |        "    .dataframe thead th {\n",
1921 |        "        text-align: right;\n",
1922 |        "    }\n",
1923 |        "</style>\n",
1924 |        "<table border=\"1\" class=\"dataframe\">\n",
1925 |        "  <thead>\n",
1926 |        "    <tr style=\"text-align: right;\">\n",
1927 |        "      <th></th>\n",
1928 |        "      <th></th>\n",
1929 |        "      <th>Invoice</th>\n",
1930 |        "    </tr>\n",
1931 |        "    <tr>\n",
1932 |        "      <th>Customer ID</th>\n",
1933 |        "      <th>Invoice</th>\n",
1934 |        "      <th></th>\n",
1935 |        "    </tr>\n",
1936 |        "  </thead>\n",
1937 |        "  <tbody>\n",
1938 |        "    <tr>\n",
1939 |        "      <th rowspan=\"5\" valign=\"top\">12346</th>\n",
1940 |        "      <th>491725</th>\n",
1941 |        "      <td>1</td>\n",
1942 |        "    </tr>\n",
1943 |        "    <tr>\n",
1944 |        "      <th>491742</th>\n",
1945 |        "      <td>1</td>\n",
1946 |        "    </tr>\n",
1947 |        "    <tr>\n",
1948 |        "      <th>491744</th>\n",
1949 |        "      <td>1</td>\n",
1950 |        "    </tr>\n",
1951 |        "    <tr>\n",
1952 |        "      <th>492718</th>\n",
1953 |        "      <td>1</td>\n",
1954 |        "    </tr>\n",
1955 |        "    <tr>\n",
1956 |        "      <th>492722</th>\n",
1957 |        "      <td>1</td>\n",
1958 |        "    </tr>\n",
1959 |        "  </tbody>\n",
1960 |        "</table>\n",
1961 |        "</div>"
1962 |       ],
1963 |       "text/plain": [
1964 |        "                     Invoice\n",
1965 |        "Customer ID Invoice         \n",
1966 |        "12346       491725         1\n",
1967 |        "            491742         1\n",
1968 |        "            491744         1\n",
1969 |        "            492718         1\n",
1970 |        "            492722         1"
1971 |       ]
1972 |      },
1973 |      "execution_count": 38,
1974 |      "metadata": {},
1975 |      "output_type": "execute_result"
1976 |     }
1977 |    ],
1978 |    "source": [
1979 |     "temp_df.head()"
1980 |    ]
1981 |   },
1982 |   {
1983 |    "cell_type": "code",
1984 |    "execution_count": 39,
1985 |    "metadata": {},
1986 |    "outputs": [
1987 |     {
1988 |      "data": {
1989 |       "text/html": [
1990 |        "<div>\n",
1991 |        "<style scoped>\n",
1992 |        "    .dataframe tbody tr th:only-of-type {\n",
1993 |        "        vertical-align: middle;\n",
1994 |        "    }\n",
1995 |        "\n",
1996 |        "    .dataframe tbody tr th {\n",
1997 |        "        vertical-align: top;\n",
1998 |        "    }\n",
1999 |        "\n",
2000 |        "    .dataframe thead th {\n",
2001 |        "        text-align: right;\n",
2002 |        "    }\n",
2003 |        "</style>\n",
2004 |        "<table border=\"1\" class=\"dataframe\">\n",
2005 |        "  <thead>\n",
2006 |        "    <tr style=\"text-align: right;\">\n",
2007 |        "      <th></th>\n",
2008 |        "      <th>Invoice</th>\n",
2009 |        "    </tr>\n",
2010 |        "    <tr>\n",
2011 |        "      <th>Customer ID</th>\n",
2012 |        "      <th></th>\n",
2013 |        "    </tr>\n",
2014 |        "  </thead>\n",
2015 |        "  <tbody>\n",
2016 |        "    <tr>\n",
2017 |        "      <th>12346</th>\n",
2018 |        "      <td>15</td>\n",
2019 |        "    </tr>\n",
2020 |        "    <tr>\n",
2021 |        "      <th>12347</th>\n",
2022 |        "      <td>2</td>\n",
2023 |        "    </tr>\n",
2024 |        "    <tr>\n",
2025 |        "      <th>12348</th>\n",
2026 |        "      <td>1</td>\n",
2027 |        "    </tr>\n",
2028 |        "    <tr>\n",
2029 |        "      <th>12349</th>\n",
2030 |        "      <td>4</td>\n",
2031 |        "    </tr>\n",
2032 |        "    <tr>\n",
2033 |        "      <th>12351</th>\n",
2034 |        "      <td>1</td>\n",
2035 |        "    </tr>\n",
2036 |        "  </tbody>\n",
2037 |        "</table>\n",
2038 |        "</div>"
2039 |       ],
2040 |       "text/plain": [
2041 |        "             Invoice\n",
2042 |        "Customer ID         \n",
2043 |        "12346             15\n",
2044 |        "12347              2\n",
2045 |        "12348              1\n",
2046 |        "12349              4\n",
2047 |        "12351              1"
2048 |       ]
2049 |      },
2050 |      "execution_count": 39,
2051 |      "metadata": {},
2052 |      "output_type": "execute_result"
2053 |     }
2054 |    ],
2055 |    "source": [
2056 |     "temp_df.groupby(\"Customer ID\").agg({\"Invoice\":\"count\"}).head()"
2057 |    ]
2058 |   },
2059 |   {
2060 |    "cell_type": "code",
2061 |    "execution_count": 40,
2062 |    "metadata": {},
2063 |    "outputs": [
2064 |     {
2065 |      "data": {
2066 |       "text/html": [
2067 |        "<div>\n",
2068 |        "<style scoped>\n",
2069 |        "    .dataframe tbody tr th:only-of-type {\n",
2070 |        "        vertical-align: middle;\n",
2071 |        "    }\n",
2072 |        "\n",
2073 |        "    .dataframe tbody tr th {\n",
2074 |        "        vertical-align: top;\n",
2075 |        "    }\n",
2076 |        "\n",
2077 |        "    .dataframe thead th {\n",
2078 |        "        text-align: right;\n",
2079 |        "    }\n",
2080 |        "</style>\n",
2081 |        "<table border=\"1\" class=\"dataframe\">\n",
2082 |        "  <thead>\n",
2083 |        "    <tr style=\"text-align: right;\">\n",
2084 |        "      <th></th>\n",
2085 |        "      <th>Frequency</th>\n",
2086 |        "    </tr>\n",
2087 |        "    <tr>\n",
2088 |        "      <th>Customer ID</th>\n",
2089 |        "      <th></th>\n",
2090 |        "    </tr>\n",
2091 |        "  </thead>\n",
2092 |        "  <tbody>\n",
2093 |        "    <tr>\n",
2094 |        "      <th>12346</th>\n",
2095 |        "      <td>46</td>\n",
2096 |        "    </tr>\n",
2097 |        "    <tr>\n",
2098 |        "      <th>12347</th>\n",
2099 |        "      <td>71</td>\n",
2100 |        "    </tr>\n",
2101 |        "    <tr>\n",
2102 |        "      <th>12348</th>\n",
2103 |        "      <td>20</td>\n",
2104 |        "    </tr>\n",
2105 |        "    <tr>\n",
2106 |        "      <th>12349</th>\n",
2107 |        "      <td>107</td>\n",
2108 |        "    </tr>\n",
2109 |        "    <tr>\n",
2110 |        "      <th>12351</th>\n",
2111 |        "      <td>21</td>\n",
2112 |        "    </tr>\n",
2113 |        "  </tbody>\n",
2114 |        "</table>\n",
2115 |        "</div>"
2116 |       ],
2117 |       "text/plain": [
2118 |        "             Frequency\n",
2119 |        "Customer ID           \n",
2120 |        "12346               46\n",
2121 |        "12347               71\n",
2122 |        "12348               20\n",
2123 |        "12349              107\n",
2124 |        "12351               21"
2125 |       ]
2126 |      },
2127 |      "execution_count": 40,
2128 |      "metadata": {},
2129 |      "output_type": "execute_result"
2130 |     }
2131 |    ],
2132 |    "source": [
2133 |     "freq_df = temp_df.groupby(\"Customer ID\").agg({\"Invoice\":\"sum\"})  # sums per-invoice line counts: total invoice lines per customer, not distinct invoices\n",
2134 |     "freq_df.rename(columns={\"Invoice\": \"Frequency\"}, inplace = True)\n",
2135 |     "freq_df.head()"
2136 |    ]
2137 |   },
2138 |   {
2139 |    "cell_type": "markdown",
2140 |    "execution_count": null,
2141 |    "metadata": {},
2142 |    "source": [
2143 |     "# Monetary"
2144 |    ]
2145 |   },
2146 |   {
2147 |    "cell_type": "code",
2148 |    "execution_count": 41,
2149 |    "metadata": {},
2150 |    "outputs": [],
2151 |    "source": [
2152 |     "monetary_df = df.groupby(\"Customer ID\").agg({\"TotalPrice\":\"sum\"})"
2153 |    ]
2154 |   },
2155 |   {
2156 |    "cell_type": "code",
2157 |    "execution_count": 42,
2158 |    "metadata": {},
2159 |    "outputs": [
2160 |     {
2161 |      "data": {
2162 |       "text/html": [
2163 |        "<div>\n",
2164 |        "<style scoped>\n",
2165 |        "    .dataframe tbody tr th:only-of-type {\n",
2166 |        "        vertical-align: middle;\n",
2167 |        "    }\n",
2168 |        "\n",
2169 |        "    .dataframe tbody tr th {\n",
2170 |        "        vertical-align: top;\n",
2171 |        "    }\n",
2172 |        "\n",
2173 |        "    .dataframe thead th {\n",
2174 |        "        text-align: right;\n",
2175 |        "    }\n",
2176 |        "</style>\n",
2177 |        "<table border=\"1\" class=\"dataframe\">\n",
2178 |        "  <thead>\n",
2179 |        "    <tr style=\"text-align: right;\">\n",
2180 |        "      <th></th>\n",
2181 |        "      <th>TotalPrice</th>\n",
2182 |        "    </tr>\n",
2183 |        "    <tr>\n",
2184 |        "      <th>Customer ID</th>\n",
2185 |        "      <th></th>\n",
2186 |        "    </tr>\n",
2187 |        "  </thead>\n",
2188 |        "  <tbody>\n",
2189 |        "    <tr>\n",
2190 |        "      <th>12346</th>\n",
2191 |        "      <td>-65</td>\n",
2192 |        "    </tr>\n",
2193 |        "    <tr>\n",
2194 |        "      <th>12347</th>\n",
2195 |        "      <td>1323</td>\n",
2196 |        "    </tr>\n",
2197 |        "    <tr>\n",
2198 |        "      <th>12348</th>\n",
2199 |        "      <td>222</td>\n",
2200 |        "    </tr>\n",
2201 |        "    <tr>\n",
2202 |        "      <th>12349</th>\n",
2203 |        "      <td>2647</td>\n",
2204 |        "    </tr>\n",
2205 |        "    <tr>\n",
2206 |        "      <th>12351</th>\n",
2207 |        "      <td>301</td>\n",
2208 |        "    </tr>\n",
2209 |        "  </tbody>\n",
2210 |        "</table>\n",
2211 |        "</div>"
2212 |       ],
2213 |       "text/plain": [
2214 |        "             TotalPrice\n",
2215 |        "Customer ID            \n",
2216 |        "12346               -65\n",
2217 |        "12347              1323\n",
2218 |        "12348               222\n",
2219 |        "12349              2647\n",
2220 |        "12351               301"
2221 |       ]
2222 |      },
2223 |      "execution_count": 42,
2224 |      "metadata": {},
2225 |      "output_type": "execute_result"
2226 |     }
2227 |    ],
2228 |    "source": [
2229 |     "monetary_df.head()"
2230 |    ]
2231 |   },
2232 |   {
2233 |    "cell_type": "code",
2234 |    "execution_count": 43,
2235 |    "metadata": {},
2236 |    "outputs": [],
2237 |    "source": [
2238 |     "# Rename the column to match the RFM terminology\n",
2239 |     "\n",
2240 |     "monetary_df.rename(columns={\"TotalPrice\": \"Monetary\"}, inplace = True)"
2241 |    ]
2242 |   },
2243 |   {
2244 |    "cell_type": "code",
2245 |    "execution_count": 44,
2246 |    "metadata": {},
2247 |    "outputs": [
2248 |     {
2249 |      "name": "stdout",
2250 |      "output_type": "stream",
2251 |      "text": [
2252 |       "(4383,) (4383, 1) (4383, 1)\n"
2253 |      ]
2254 |     }
2255 |    ],
2256 |    "source": [
2257 |     "print(recency_df.shape, freq_df.shape, monetary_df.shape)"
2258 |    ]
2259 |   },
2260 |   {
2261 |    "cell_type": "code",
2262 |    "execution_count": 45,
2263 |    "metadata": {},
2264 |    "outputs": [],
2265 |    "source": [
2266 |     "rfm = pd.concat([recency_df, freq_df, monetary_df], axis=1)"
2267 |    ]
2268 |   },
2269 |   {
2270 |    "cell_type": "code",
2271 |    "execution_count": 46,
2272 |    "metadata": {},
2273 |    "outputs": [
2274 |     {
2275 |      "data": {
2276 |       "text/html": [
2277 |        "<div>\n",
2278 |        "<style scoped>\n",
2279 |        "    .dataframe tbody tr th:only-of-type {\n",
2280 |        "        vertical-align: middle;\n",
2281 |        "    }\n",
2282 |        "\n",
2283 |        "    .dataframe tbody tr th {\n",
2284 |        "        vertical-align: top;\n",
2285 |        "    }\n",
2286 |        "\n",
2287 |        "    .dataframe thead th {\n",
2288 |        "        text-align: right;\n",
2289 |        "    }\n",
2290 |        "</style>\n",
2291 |        "<table border=\"1\" class=\"dataframe\">\n",
2292 |        "  <thead>\n",
2293 |        "    <tr style=\"text-align: right;\">\n",
2294 |        "      <th></th>\n",
2295 |        "      <th>Recency</th>\n",
2296 |        "      <th>Frequency</th>\n",
2297 |        "      <th>Monetary</th>\n",
2298 |        "    </tr>\n",
2299 |        "    <tr>\n",
2300 |        "      <th>Customer ID</th>\n",
2301 |        "      <th></th>\n",
2302 |        "      <th></th>\n",
2303 |        "      <th></th>\n",
2304 |        "    </tr>\n",
2305 |        "  </thead>\n",
2306 |        "  <tbody>\n",
2307 |        "    <tr>\n",
2308 |        "      <th>12346</th>\n",
2309 |        "      <td>65</td>\n",
2310 |        "      <td>46</td>\n",
2311 |        "      <td>-65</td>\n",
2312 |        "    </tr>\n",
2313 |        "    <tr>\n",
2314 |        "      <th>12347</th>\n",
2315 |        "      <td>1</td>\n",
2316 |        "      <td>71</td>\n",
2317 |        "      <td>1323</td>\n",
2318 |        "    </tr>\n",
2319 |        "    <tr>\n",
2320 |        "      <th>12348</th>\n",
2321 |        "      <td>72</td>\n",
2322 |        "      <td>20</td>\n",
2323 |        "      <td>222</td>\n",
2324 |        "    </tr>\n",
2325 |        "    <tr>\n",
2326 |        "      <th>12349</th>\n",
2327 |        "      <td>41</td>\n",
2328 |        "      <td>107</td>\n",
2329 |        "      <td>2647</td>\n",
2330 |        "    </tr>\n",
2331 |        "    <tr>\n",
2332 |        "      <th>12351</th>\n",
2333 |        "      <td>9</td>\n",
2334 |        "      <td>21</td>\n",
2335 |        "      <td>301</td>\n",
2336 |        "    </tr>\n",
2337 |        "  </tbody>\n",
2338 |        "</table>\n",
2339 |        "</div>"
2340 |       ],
2341 |       "text/plain": [
2342 |        "             Recency  Frequency  Monetary\n",
2343 |        "Customer ID                              \n",
2344 |        "12346             65         46       -65\n",
2345 |        "12347              1         71      1323\n",
2346 |        "12348             72         20       222\n",
2347 |        "12349             41        107      2647\n",
2348 |        "12351              9         21       301"
2349 |       ]
2350 |      },
2351 |      "execution_count": 46,
2352 |      "metadata": {},
2353 |      "output_type": "execute_result"
2354 |     }
2355 |    ],
2356 |    "source": [
2357 |     "rfm.head()"
2358 |    ]
2359 |   },
2360 |   {
2361 |    "cell_type": "markdown",
2362 |    "execution_count": null,
2363 |    "metadata": {},
2364 |    "source": [
2365 |     "## Now we need to score each customer on how recently they purchased (Recency), how often they purchased (Frequency), and how much they spent (Monetary)."
2366 |    ]
2367 |   },
2368 |   {
2369 |    "cell_type": "markdown",
2370 |    "execution_count": null,
2371 |    "metadata": {},
2372 |    "source": [
2373 |     "## 13. Scoring for RFM\n",
2374 |     "\n",
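2375 |     "- Let's start with Recency and score it from 1 to 5 using the 'qcut' method, which splits customers into five quantile-based bins. The most recent buyers get the highest score."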
2376 |    ]
2377 |   },
2378 |   {
2379 |    "cell_type": "code",
2380 |    "execution_count": 47,
2381 |    "metadata": {},
2382 |    "outputs": [],
2383 |    "source": [
2384 |     "rfm[\"RecencyScore\"] = pd.qcut(rfm['Recency'], 5, labels = [5, 4, 3, 2, 1])"
2385 |    ]
2386 |   },
2387 |   {
2388 |    "cell_type": "code",
2389 |    "execution_count": 48,
2390 |    "metadata": {},
2391 |    "outputs": [],
2392 |    "source": [
2393 |     "rfm[\"FrequencyScore\"] = pd.qcut(rfm['Frequency'].rank(method = \"first\"), 5, labels = [1, 2, 3, 4, 5])"
2394 |    ]
2395 |   },
2396 |   {
2397 |    "cell_type": "code",
2398 |    "execution_count": 49,
2399 |    "metadata": {},
2400 |    "outputs": [],
2401 |    "source": [
2402 |     "rfm[\"MonetaryScore\"] = pd.qcut(rfm['Monetary'], 5, labels = [1, 2, 3, 4, 5])"
2403 |    ]
2404 |   },
2405 |   {
2406 |    "cell_type": "code",
2407 |    "execution_count": 50,
2408 |    "metadata": {},
2409 |    "outputs": [
2410 |     {
2411 |      "data": {
2412 |       "text/html": [
2413 |        "<div>\n",
2414 |        "<style scoped>\n",
2415 |        "    .dataframe tbody tr th:only-of-type {\n",
2416 |        "        vertical-align: middle;\n",
2417 |        "    }\n",
2418 |        "\n",
2419 |        "    .dataframe tbody tr th {\n",
2420 |        "        vertical-align: top;\n",
2421 |        "    }\n",
2422 |        "\n",
2423 |        "    .dataframe thead th {\n",
2424 |        "        text-align: right;\n",
2425 |        "    }\n",
2426 |        "</style>\n",
2427 |        "<table border=\"1\" class=\"dataframe\">\n",
2428 |        "  <thead>\n",
2429 |        "    <tr style=\"text-align: right;\">\n",
2430 |        "      <th></th>\n",
2431 |        "      <th>Recency</th>\n",
2432 |        "      <th>Frequency</th>\n",
2433 |        "      <th>Monetary</th>\n",
2434 |        "      <th>RecencyScore</th>\n",
2435 |        "      <th>FrequencyScore</th>\n",
2436 |        "      <th>MonetaryScore</th>\n",
2437 |        "    </tr>\n",
2438 |        "    <tr>\n",
2439 |        "      <th>Customer ID</th>\n",
2440 |        "      <th></th>\n",
2441 |        "      <th></th>\n",
2442 |        "      <th></th>\n",
2443 |        "      <th></th>\n",
2444 |        "      <th></th>\n",
2445 |        "      <th></th>\n",
2446 |        "    </tr>\n",
2447 |        "  </thead>\n",
2448 |        "  <tbody>\n",
2449 |        "    <tr>\n",
2450 |        "      <th>12346</th>\n",
2451 |        "      <td>65</td>\n",
2452 |        "      <td>46</td>\n",
2453 |        "      <td>-65</td>\n",
2454 |        "      <td>3</td>\n",
2455 |        "      <td>3</td>\n",
2456 |        "      <td>1</td>\n",
2457 |        "    </tr>\n",
2458 |        "    <tr>\n",
2459 |        "      <th>12347</th>\n",
2460 |        "      <td>1</td>\n",
2461 |        "      <td>71</td>\n",
2462 |        "      <td>1323</td>\n",
2463 |        "      <td>5</td>\n",
2464 |        "      <td>4</td>\n",
2465 |        "      <td>4</td>\n",
2466 |        "    </tr>\n",
2467 |        "    <tr>\n",
2468 |        "      <th>12348</th>\n",
2469 |        "      <td>72</td>\n",
2470 |        "      <td>20</td>\n",
2471 |        "      <td>222</td>\n",
2472 |        "      <td>2</td>\n",
2473 |        "      <td>2</td>\n",
2474 |        "      <td>1</td>\n",
2475 |        "    </tr>\n",
2476 |        "    <tr>\n",
2477 |        "      <th>12349</th>\n",
2478 |        "      <td>41</td>\n",
2479 |        "      <td>107</td>\n",
2480 |        "      <td>2647</td>\n",
2481 |        "      <td>3</td>\n",
2482 |        "      <td>4</td>\n",
2483 |        "      <td>5</td>\n",
2484 |        "    </tr>\n",
2485 |        "    <tr>\n",
2486 |        "      <th>12351</th>\n",
2487 |        "      <td>9</td>\n",
2488 |        "      <td>21</td>\n",
2489 |        "      <td>301</td>\n",
2490 |        "      <td>5</td>\n",
2491 |        "      <td>2</td>\n",
2492 |        "      <td>2</td>\n",
2493 |        "    </tr>\n",
2494 |        "  </tbody>\n",
2495 |        "</table>\n",
2496 |        "</div>"
2497 |       ],
2498 |       "text/plain": [
2499 |        "             Recency  Frequency  Monetary RecencyScore FrequencyScore  \\\n",
2500 |        "Customer ID                                                             \n",
2501 |        "12346             65         46       -65            3              3   \n",
2502 |        "12347              1         71      1323            5              4   \n",
2503 |        "12348             72         20       222            2              2   \n",
2504 |        "12349             41        107      2647            3              4   \n",
2505 |        "12351              9         21       301            5              2   \n",
2506 |        "\n",
2507 |        "            MonetaryScore  \n",
2508 |        "Customer ID                \n",
2509 |        "12346                   1  \n",
2510 |        "12347                   4  \n",
2511 |        "12348                   1  \n",
2512 |        "12349                   5  \n",
2513 |        "12351                   2  "
2514 |       ]
2515 |      },
2516 |      "execution_count": 50,
2517 |      "metadata": {},
2518 |      "output_type": "execute_result"
2519 |     }
2520 |    ],
2521 |    "source": [
2522 |     "rfm.head()"
2523 |    ]
2524 |   },
2525 |   {
2526 |    "cell_type": "markdown",
2527 |    "execution_count": null,
2528 |    "metadata": {},
2529 |    "source": [
2530 |     "Let's concatenate the three scores into a single RFM score string."
2531 |    ]
2532 |   },
2533 |   {
2534 |    "cell_type": "code",
2535 |    "execution_count": 51,
2536 |    "metadata": {},
2537 |    "outputs": [
2538 |     {
2539 |      "data": {
2540 |       "text/plain": [
2541 |        "Customer ID\n",
2542 |        "12346    331\n",
2543 |        "12347    544\n",
2544 |        "12348    221\n",
2545 |        "12349    345\n",
2546 |        "12351    522\n",
2547 |        "dtype: object"
2548 |       ]
2549 |      },
2550 |      "execution_count": 51,
2551 |      "metadata": {},
2552 |      "output_type": "execute_result"
2553 |     }
2554 |    ],
2555 |    "source": [
2556 |     "(rfm['RecencyScore'].astype(str) + \n",
2557 |     " rfm['FrequencyScore'].astype(str) + \n",
2558 |     " rfm['MonetaryScore'].astype(str)).head()"
2559 |    ]
2560 |   },
2561 |   {
2562 |    "cell_type": "code",
2563 |    "execution_count": 52,
2564 |    "metadata": {},
2565 |    "outputs": [],
2566 |    "source": [
2567 |     "rfm[\"RFM_SCORE\"] = rfm['RecencyScore'].astype(str) + rfm['FrequencyScore'].astype(str) + rfm['MonetaryScore'].astype(str)"
2568 |    ]
2569 |   },
2570 |   {
2571 |    "cell_type": "code",
2572 |    "execution_count": 53,
2573 |    "metadata": {},
2574 |    "outputs": [
2575 |     {
2576 |      "data": {
2577 |       "text/html": [
2578 |        "<div>\n",
2579 |        "<style scoped>\n",
2580 |        "    .dataframe tbody tr th:only-of-type {\n",
2581 |        "        vertical-align: middle;\n",
2582 |        "    }\n",
2583 |        "\n",
2584 |        "    .dataframe tbody tr th {\n",
2585 |        "        vertical-align: top;\n",
2586 |        "    }\n",
2587 |        "\n",
2588 |        "    .dataframe thead th {\n",
2589 |        "        text-align: right;\n",
2590 |        "    }\n",
2591 |        "</style>\n",
2592 |        "<table border=\"1\" class=\"dataframe\">\n",
2593 |        "  <thead>\n",
2594 |        "    <tr style=\"text-align: right;\">\n",
2595 |        "      <th></th>\n",
2596 |        "      <th>Recency</th>\n",
2597 |        "      <th>Frequency</th>\n",
2598 |        "      <th>Monetary</th>\n",
2599 |        "      <th>RecencyScore</th>\n",
2600 |        "      <th>FrequencyScore</th>\n",
2601 |        "      <th>MonetaryScore</th>\n",
2602 |        "      <th>RFM_SCORE</th>\n",
2603 |        "    </tr>\n",
2604 |        "    <tr>\n",
2605 |        "      <th>Customer ID</th>\n",
2606 |        "      <th></th>\n",
2607 |        "      <th></th>\n",
2608 |        "      <th></th>\n",
2609 |        "      <th></th>\n",
2610 |        "      <th></th>\n",
2611 |        "      <th></th>\n",
2612 |        "      <th></th>\n",
2613 |        "    </tr>\n",
2614 |        "  </thead>\n",
2615 |        "  <tbody>\n",
2616 |        "    <tr>\n",
2617 |        "      <th>12346</th>\n",
2618 |        "      <td>65</td>\n",
2619 |        "      <td>46</td>\n",
2620 |        "      <td>-65</td>\n",
2621 |        "      <td>3</td>\n",
2622 |        "      <td>3</td>\n",
2623 |        "      <td>1</td>\n",
2624 |        "      <td>331</td>\n",
2625 |        "    </tr>\n",
2626 |        "    <tr>\n",
2627 |        "      <th>12347</th>\n",
2628 |        "      <td>1</td>\n",
2629 |        "      <td>71</td>\n",
2630 |        "      <td>1323</td>\n",
2631 |        "      <td>5</td>\n",
2632 |        "      <td>4</td>\n",
2633 |        "      <td>4</td>\n",
2634 |        "      <td>544</td>\n",
2635 |        "    </tr>\n",
2636 |        "    <tr>\n",
2637 |        "      <th>12348</th>\n",
2638 |        "      <td>72</td>\n",
2639 |        "      <td>20</td>\n",
2640 |        "      <td>222</td>\n",
2641 |        "      <td>2</td>\n",
2642 |        "      <td>2</td>\n",
2643 |        "      <td>1</td>\n",
2644 |        "      <td>221</td>\n",
2645 |        "    </tr>\n",
2646 |        "    <tr>\n",
2647 |        "      <th>12349</th>\n",
2648 |        "      <td>41</td>\n",
2649 |        "      <td>107</td>\n",
2650 |        "      <td>2647</td>\n",
2651 |        "      <td>3</td>\n",
2652 |        "      <td>4</td>\n",
2653 |        "      <td>5</td>\n",
2654 |        "      <td>345</td>\n",
2655 |        "    </tr>\n",
2656 |        "    <tr>\n",
2657 |        "      <th>12351</th>\n",
2658 |        "      <td>9</td>\n",
2659 |        "      <td>21</td>\n",
2660 |        "      <td>301</td>\n",
2661 |        "      <td>5</td>\n",
2662 |        "      <td>2</td>\n",
2663 |        "      <td>2</td>\n",
2664 |        "      <td>522</td>\n",
2665 |        "    </tr>\n",
2666 |        "  </tbody>\n",
2667 |        "</table>\n",
2668 |        "</div>"
2669 |       ],
2670 |       "text/plain": [
2671 |        "             Recency  Frequency  Monetary RecencyScore FrequencyScore  \\\n",
2672 |        "Customer ID                                                             \n",
2673 |        "12346             65         46       -65            3              3   \n",
2674 |        "12347              1         71      1323            5              4   \n",
2675 |        "12348             72         20       222            2              2   \n",
2676 |        "12349             41        107      2647            3              4   \n",
2677 |        "12351              9         21       301            5              2   \n",
2678 |        "\n",
2679 |        "            MonetaryScore RFM_SCORE  \n",
2680 |        "Customer ID                          \n",
2681 |        "12346                   1       331  \n",
2682 |        "12347                   4       544  \n",
2683 |        "12348                   1       221  \n",
2684 |        "12349                   5       345  \n",
2685 |        "12351                   2       522  "
2686 |       ]
2687 |      },
2688 |      "execution_count": 53,
2689 |      "metadata": {},
2690 |      "output_type": "execute_result"
2691 |     }
2692 |    ],
2693 |    "source": [
2694 |     "rfm.head()"
2695 |    ]
2696 |   },
2697 |   {
2698 |    "cell_type": "code",
2699 |    "execution_count": 54,
2700 |    "metadata": {},
2701 |    "outputs": [
2702 |     {
2703 |      "data": {
2704 |       "text/html": [
2705 |        "<div>\n",
2706 |        "<style scoped>\n",
2707 |        "    .dataframe tbody tr th:only-of-type {\n",
2708 |        "        vertical-align: middle;\n",
2709 |        "    }\n",
2710 |        "\n",
2711 |        "    .dataframe tbody tr th {\n",
2712 |        "        vertical-align: top;\n",
2713 |        "    }\n",
2714 |        "\n",
2715 |        "    .dataframe thead th {\n",
2716 |        "        text-align: right;\n",
2717 |        "    }\n",
2718 |        "</style>\n",
2719 |        "<table border=\"1\" class=\"dataframe\">\n",
2720 |        "  <thead>\n",
2721 |        "    <tr style=\"text-align: right;\">\n",
2722 |        "      <th></th>\n",
2723 |        "      <th>count</th>\n",
2724 |        "      <th>mean</th>\n",
2725 |        "      <th>std</th>\n",
2726 |        "      <th>min</th>\n",
2727 |        "      <th>25%</th>\n",
2728 |        "      <th>50%</th>\n",
2729 |        "      <th>75%</th>\n",
2730 |        "      <th>max</th>\n",
2731 |        "    </tr>\n",
2732 |        "  </thead>\n",
2733 |        "  <tbody>\n",
2734 |        "    <tr>\n",
2735 |        "      <th>Recency</th>\n",
2736 |        "      <td>4383</td>\n",
2737 |        "      <td>89</td>\n",
2738 |        "      <td>98</td>\n",
2739 |        "      <td>-1</td>\n",
2740 |        "      <td>15</td>\n",
2741 |        "      <td>50</td>\n",
2742 |        "      <td>136</td>\n",
2743 |        "      <td>372</td>\n",
2744 |        "    </tr>\n",
2745 |        "    <tr>\n",
2746 |        "      <th>Frequency</th>\n",
2747 |        "      <td>4383</td>\n",
2748 |        "      <td>95</td>\n",
2749 |        "      <td>205</td>\n",
2750 |        "      <td>1</td>\n",
2751 |        "      <td>18</td>\n",
2752 |        "      <td>44</td>\n",
2753 |        "      <td>103</td>\n",
2754 |        "      <td>5710</td>\n",
2755 |        "    </tr>\n",
2756 |        "    <tr>\n",
2757 |        "      <th>Monetary</th>\n",
2758 |        "      <td>4383</td>\n",
2759 |        "      <td>1905</td>\n",
2760 |        "      <td>8519</td>\n",
2761 |        "      <td>-25111</td>\n",
2762 |        "      <td>285</td>\n",
2763 |        "      <td>656</td>\n",
2764 |        "      <td>1646</td>\n",
2765 |        "      <td>341777</td>\n",
2766 |        "    </tr>\n",
2767 |        "  </tbody>\n",
2768 |        "</table>\n",
2769 |        "</div>"
2770 |       ],
2771 |       "text/plain": [
2772 |        "           count  mean  std    min  25%  50%  75%    max\n",
2773 |        "Recency     4383    89   98     -1   15   50  136    372\n",
2774 |        "Frequency   4383    95  205      1   18   44  103   5710\n",
2775 |        "Monetary    4383  1905 8519 -25111  285  656 1646 341777"
2776 |       ]
2777 |      },
2778 |      "execution_count": 54,
2779 |      "metadata": {},
2780 |      "output_type": "execute_result"
2781 |     }
2782 |    ],
2783 |    "source": [
2784 |     "rfm.describe().T"
2785 |    ]
2786 |   },
2787 |   {
2788 |    "cell_type": "markdown",
2789 |    "execution_count": null,
2790 |    "metadata": {},
2791 |    "source": [
2792 |     "Customers who score 5 on all three metrics (RFM_SCORE 555) are the champions."
2793 |    ]
2794 |   },
2795 |   {
2796 |    "cell_type": "code",
2797 |    "execution_count": 55,
2798 |    "metadata": {},
2799 |    "outputs": [
2800 |     {
2801 |      "data": {
2802 |       "text/html": [
2803 |        "<div>\n",
2804 |        "<style scoped>\n",
2805 |        "    .dataframe tbody tr th:only-of-type {\n",
2806 |        "        vertical-align: middle;\n",
2807 |        "    }\n",
2808 |        "\n",
2809 |        "    .dataframe tbody tr th {\n",
2810 |        "        vertical-align: top;\n",
2811 |        "    }\n",
2812 |        "\n",
2813 |        "    .dataframe thead th {\n",
2814 |        "        text-align: right;\n",
2815 |        "    }\n",
2816 |        "</style>\n",
2817 |        "<table border=\"1\" class=\"dataframe\">\n",
2818 |        "  <thead>\n",
2819 |        "    <tr style=\"text-align: right;\">\n",
2820 |        "      <th></th>\n",
2821 |        "      <th>Recency</th>\n",
2822 |        "      <th>Frequency</th>\n",
2823 |        "      <th>Monetary</th>\n",
2824 |        "      <th>RecencyScore</th>\n",
2825 |        "      <th>FrequencyScore</th>\n",
2826 |        "      <th>MonetaryScore</th>\n",
2827 |        "      <th>RFM_SCORE</th>\n",
2828 |        "    </tr>\n",
2829 |        "    <tr>\n",
2830 |        "      <th>Customer ID</th>\n",
2831 |        "      <th></th>\n",
2832 |        "      <th></th>\n",
2833 |        "      <th></th>\n",
2834 |        "      <th></th>\n",
2835 |        "      <th></th>\n",
2836 |        "      <th></th>\n",
2837 |        "      <th></th>\n",
2838 |        "    </tr>\n",
2839 |        "  </thead>\n",
2840 |        "  <tbody>\n",
2841 |        "    <tr>\n",
2842 |        "      <th>12415</th>\n",
2843 |        "      <td>9</td>\n",
2844 |        "      <td>212</td>\n",
2845 |        "      <td>19544</td>\n",
2846 |        "      <td>5</td>\n",
2847 |        "      <td>5</td>\n",
2848 |        "      <td>5</td>\n",
2849 |        "      <td>555</td>\n",
2850 |        "    </tr>\n",
2851 |        "    <tr>\n",
2852 |        "      <th>12431</th>\n",
2853 |        "      <td>7</td>\n",
2854 |        "      <td>173</td>\n",
2855 |        "      <td>4303</td>\n",
2856 |        "      <td>5</td>\n",
2857 |        "      <td>5</td>\n",
2858 |        "      <td>5</td>\n",
2859 |        "      <td>555</td>\n",
2860 |        "    </tr>\n",
2861 |        "    <tr>\n",
2862 |        "      <th>12433</th>\n",
2863 |        "      <td>0</td>\n",
2864 |        "      <td>287</td>\n",
2865 |        "      <td>7053</td>\n",
2866 |        "      <td>5</td>\n",
2867 |        "      <td>5</td>\n",
2868 |        "      <td>5</td>\n",
2869 |        "      <td>555</td>\n",
2870 |        "    </tr>\n",
2871 |        "    <tr>\n",
2872 |        "      <th>12471</th>\n",
2873 |        "      <td>6</td>\n",
2874 |        "      <td>767</td>\n",
2875 |        "      <td>19208</td>\n",
2876 |        "      <td>5</td>\n",
2877 |        "      <td>5</td>\n",
2878 |        "      <td>5</td>\n",
2879 |        "      <td>555</td>\n",
2880 |        "    </tr>\n",
2881 |        "    <tr>\n",
2882 |        "      <th>12472</th>\n",
2883 |        "      <td>3</td>\n",
2884 |        "      <td>658</td>\n",
2885 |        "      <td>10727</td>\n",
2886 |        "      <td>5</td>\n",
2887 |        "      <td>5</td>\n",
2888 |        "      <td>5</td>\n",
2889 |        "      <td>555</td>\n",
2890 |        "    </tr>\n",
2891 |        "  </tbody>\n",
2892 |        "</table>\n",
2893 |        "</div>"
2894 |       ],
2895 |       "text/plain": [
2896 |        "             Recency  Frequency  Monetary RecencyScore FrequencyScore  \\\n",
2897 |        "Customer ID                                                             \n",
2898 |        "12415              9        212     19544            5              5   \n",
2899 |        "12431              7        173      4303            5              5   \n",
2900 |        "12433              0        287      7053            5              5   \n",
2901 |        "12471              6        767     19208            5              5   \n",
2902 |        "12472              3        658     10727            5              5   \n",
2903 |        "\n",
2904 |        "            MonetaryScore RFM_SCORE  \n",
2905 |        "Customer ID                          \n",
2906 |        "12415                   5       555  \n",
2907 |        "12431                   5       555  \n",
2908 |        "12433                   5       555  \n",
2909 |        "12471                   5       555  \n",
2910 |        "12472                   5       555  "
2911 |       ]
2912 |      },
2913 |      "execution_count": 55,
2914 |      "metadata": {},
2915 |      "output_type": "execute_result"
2916 |     }
2917 |    ],
2918 |    "source": [
2919 |     "rfm[rfm[\"RFM_SCORE\"] == \"555\"].head()"
2920 |    ]
2921 |   },
2922 |   {
2923 |    "cell_type": "markdown",
2924 |    "execution_count": null,
2925 |    "metadata": {},
2926 |    "source": [
2927 |     "Customers who score 1 on all three metrics (RFM_SCORE 111) are the lowest-ranked group."
2928 |    ]
2929 |   },
2930 |   {
2931 |    "cell_type": "code",
2932 |    "execution_count": 56,
2933 |    "metadata": {},
2934 |    "outputs": [
2935 |     {
2936 |      "data": {
2937 |       "text/html": [
2938 |        "<div>\n",
2939 |        "<style scoped>\n",
2940 |        "    .dataframe tbody tr th:only-of-type {\n",
2941 |        "        vertical-align: middle;\n",
2942 |        "    }\n",
2943 |        "\n",
2944 |        "    .dataframe tbody tr th {\n",
2945 |        "        vertical-align: top;\n",
2946 |        "    }\n",
2947 |        "\n",
2948 |        "    .dataframe thead th {\n",
2949 |        "        text-align: right;\n",
2950 |        "    }\n",
2951 |        "</style>\n",
2952 |        "<table border=\"1\" class=\"dataframe\">\n",
2953 |        "  <thead>\n",
2954 |        "    <tr style=\"text-align: right;\">\n",
2955 |        "      <th></th>\n",
2956 |        "      <th>Recency</th>\n",
2957 |        "      <th>Frequency</th>\n",
2958 |        "      <th>Monetary</th>\n",
2959 |        "      <th>RecencyScore</th>\n",
2960 |        "      <th>FrequencyScore</th>\n",
2961 |        "      <th>MonetaryScore</th>\n",
2962 |        "      <th>RFM_SCORE</th>\n",
2963 |        "    </tr>\n",
2964 |        "    <tr>\n",
2965 |        "      <th>Customer ID</th>\n",
2966 |        "      <th></th>\n",
2967 |        "      <th></th>\n",
2968 |        "      <th></th>\n",
2969 |        "      <th></th>\n",
2970 |        "      <th></th>\n",
2971 |        "      <th></th>\n",
2972 |        "      <th></th>\n",
2973 |        "    </tr>\n",
2974 |        "  </thead>\n",
2975 |        "  <tbody>\n",
2976 |        "    <tr>\n",
2977 |        "      <th>12362</th>\n",
2978 |        "      <td>372</td>\n",
2979 |        "      <td>1</td>\n",
2980 |        "      <td>130</td>\n",
2981 |        "      <td>1</td>\n",
2982 |        "      <td>1</td>\n",
2983 |        "      <td>1</td>\n",
2984 |        "      <td>111</td>\n",
2985 |        "    </tr>\n",
2986 |        "    <tr>\n",
2987 |        "      <th>12382</th>\n",
2988 |        "      <td>316</td>\n",
2989 |        "      <td>1</td>\n",
2990 |        "      <td>-18</td>\n",
2991 |        "      <td>1</td>\n",
2992 |        "      <td>1</td>\n",
2993 |        "      <td>1</td>\n",
2994 |        "      <td>111</td>\n",
2995 |        "    </tr>\n",
2996 |        "    <tr>\n",
2997 |        "      <th>12404</th>\n",
2998 |        "      <td>316</td>\n",
2999 |        "      <td>1</td>\n",
3000 |        "      <td>63</td>\n",
3001 |        "      <td>1</td>\n",
3002 |        "      <td>1</td>\n",
3003 |        "      <td>1</td>\n",
3004 |        "      <td>111</td>\n",
3005 |        "    </tr>\n",
3006 |        "    <tr>\n",
3007 |        "      <th>12416</th>\n",
3008 |        "      <td>290</td>\n",
3009 |        "      <td>11</td>\n",
3010 |        "      <td>203</td>\n",
3011 |        "      <td>1</td>\n",
3012 |        "      <td>1</td>\n",
3013 |        "      <td>1</td>\n",
3014 |        "      <td>111</td>\n",
3015 |        "    </tr>\n",
3016 |        "    <tr>\n",
3017 |        "      <th>12466</th>\n",
3018 |        "      <td>316</td>\n",
3019 |        "      <td>1</td>\n",
3020 |        "      <td>57</td>\n",
3021 |        "      <td>1</td>\n",
3022 |        "      <td>1</td>\n",
3023 |        "      <td>1</td>\n",
3024 |        "      <td>111</td>\n",
3025 |        "    </tr>\n",
3026 |        "  </tbody>\n",
3027 |        "</table>\n",
3028 |        "</div>"
3029 |       ],
3030 |       "text/plain": [
3031 |        "             Recency  Frequency  Monetary RecencyScore FrequencyScore  \\\n",
3032 |        "Customer ID                                                             \n",
3033 |        "12362            372          1       130            1              1   \n",
3034 |        "12382            316          1       -18            1              1   \n",
3035 |        "12404            316          1        63            1              1   \n",
3036 |        "12416            290         11       203            1              1   \n",
3037 |        "12466            316          1        57            1              1   \n",
3038 |        "\n",
3039 |        "            MonetaryScore RFM_SCORE  \n",
3040 |        "Customer ID                          \n",
3041 |        "12362                   1       111  \n",
3042 |        "12382                   1       111  \n",
3043 |        "12404                   1       111  \n",
3044 |        "12416                   1       111  \n",
3045 |        "12466                   1       111  "
3046 |       ]
3047 |      },
3048 |      "execution_count": 56,
3049 |      "metadata": {},
3050 |      "output_type": "execute_result"
3051 |     }
3052 |    ],
3053 |    "source": [
3054 |     "rfm[rfm[\"RFM_SCORE\"] == \"111\"].head()"
3055 |    ]
3056 |   },
3057 |   {
3058 |    "cell_type": "markdown",
3059 |    "execution_count": null,
3060 |    "metadata": {},
3061 |    "source": [
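3062 |     "Let's segment the customers with regular expressions. The segments ignore the Monetary score and use only the Recency (R) and Frequency (F) scores, concatenated into a two-character string.\n",
3063 |     "\n",
3064 |     "Example: if R is 1 or 2 and F is 1 or 2, the customer is labelled 'Hibernating'."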
3065 |    ]
3066 |   },
3067 |   {
3068 |    "cell_type": "code",
3069 |    "execution_count": 57,
3070 |    "metadata": {},
3071 |    "outputs": [],
3072 |    "source": [
3073 |     "seg_map = {\n",
3074 |     "    r'[1-2][1-2]': 'Hibernating',\n",
3075 |     "    r'[1-2][3-4]': 'At Risk',\n",
3076 |     "    r'[1-2]5': 'Can\\'t Loose',\n",
3077 |     "    r'3[1-2]': 'About to Sleep',\n",
3078 |     "    r'33': 'Need Attention',\n",
3079 |     "    r'[3-4][4-5]': 'Loyal Customers',\n",
3080 |     "    r'41': 'Promising',\n",
3081 |     "    r'51': 'New Customers',\n",
3082 |     "    r'[4-5][2-3]': 'Potential Loyalists',\n",
3083 |     "    r'5[4-5]': 'Champions'\n",
3084 |     "}"
3085 |    ]
3086 |   },
3087 |   {
3088 |    "cell_type": "code",
3089 |    "execution_count": 58,
3090 |    "metadata": {},
3091 |    "outputs": [
3092 |     {
3093 |      "data": {
3094 |       "text/html": [
3095 |        "<div>\n",
3096 |        "<style scoped>\n",
3097 |        "    .dataframe tbody tr th:only-of-type {\n",
3098 |        "        vertical-align: middle;\n",
3099 |        "    }\n",
3100 |        "\n",
3101 |        "    .dataframe tbody tr th {\n",
3102 |        "        vertical-align: top;\n",
3103 |        "    }\n",
3104 |        "\n",
3105 |        "    .dataframe thead th {\n",
3106 |        "        text-align: right;\n",
3107 |        "    }\n",
3108 |        "</style>\n",
3109 |        "<table border=\"1\" class=\"dataframe\">\n",
3110 |        "  <thead>\n",
3111 |        "    <tr style=\"text-align: right;\">\n",
3112 |        "      <th></th>\n",
3113 |        "      <th>Recency</th>\n",
3114 |        "      <th>Frequency</th>\n",
3115 |        "      <th>Monetary</th>\n",
3116 |        "      <th>RecencyScore</th>\n",
3117 |        "      <th>FrequencyScore</th>\n",
3118 |        "      <th>MonetaryScore</th>\n",
3119 |        "      <th>RFM_SCORE</th>\n",
3120 |        "      <th>Segment</th>\n",
3121 |        "    </tr>\n",
3122 |        "    <tr>\n",
3123 |        "      <th>Customer ID</th>\n",
3124 |        "      <th></th>\n",
3125 |        "      <th></th>\n",
3126 |        "      <th></th>\n",
3127 |        "      <th></th>\n",
3128 |        "      <th></th>\n",
3129 |        "      <th></th>\n",
3130 |        "      <th></th>\n",
3131 |        "      <th></th>\n",
3132 |        "    </tr>\n",
3133 |        "  </thead>\n",
3134 |        "  <tbody>\n",
3135 |        "    <tr>\n",
3136 |        "      <th>12346</th>\n",
3137 |        "      <td>65</td>\n",
3138 |        "      <td>46</td>\n",
3139 |        "      <td>-65</td>\n",
3140 |        "      <td>3</td>\n",
3141 |        "      <td>3</td>\n",
3142 |        "      <td>1</td>\n",
3143 |        "      <td>331</td>\n",
3144 |        "      <td>Need Attention</td>\n",
3145 |        "    </tr>\n",
3146 |        "    <tr>\n",
3147 |        "      <th>12347</th>\n",
3148 |        "      <td>1</td>\n",
3149 |        "      <td>71</td>\n",
3150 |        "      <td>1323</td>\n",
3151 |        "      <td>5</td>\n",
3152 |        "      <td>4</td>\n",
3153 |        "      <td>4</td>\n",
3154 |        "      <td>544</td>\n",
3155 |        "      <td>Champions</td>\n",
3156 |        "    </tr>\n",
3157 |        "    <tr>\n",
3158 |        "      <th>12348</th>\n",
3159 |        "      <td>72</td>\n",
3160 |        "      <td>20</td>\n",
3161 |        "      <td>222</td>\n",
3162 |        "      <td>2</td>\n",
3163 |        "      <td>2</td>\n",
3164 |        "      <td>1</td>\n",
3165 |        "      <td>221</td>\n",
3166 |        "      <td>Hibernating</td>\n",
3167 |        "    </tr>\n",
3168 |        "    <tr>\n",
3169 |        "      <th>12349</th>\n",
3170 |        "      <td>41</td>\n",
3171 |        "      <td>107</td>\n",
3172 |        "      <td>2647</td>\n",
3173 |        "      <td>3</td>\n",
3174 |        "      <td>4</td>\n",
3175 |        "      <td>5</td>\n",
3176 |        "      <td>345</td>\n",
3177 |        "      <td>Loyal Customers</td>\n",
3178 |        "    </tr>\n",
3179 |        "    <tr>\n",
3180 |        "      <th>12351</th>\n",
3181 |        "      <td>9</td>\n",
3182 |        "      <td>21</td>\n",
3183 |        "      <td>301</td>\n",
3184 |        "      <td>5</td>\n",
3185 |        "      <td>2</td>\n",
3186 |        "      <td>2</td>\n",
3187 |        "      <td>522</td>\n",
3188 |        "      <td>Potential Loyalists</td>\n",
3189 |        "    </tr>\n",
3190 |        "  </tbody>\n",
3191 |        "</table>\n",
3192 |        "</div>"
3193 |       ],
3194 |       "text/plain": [
3195 |        "             Recency  Frequency  Monetary RecencyScore FrequencyScore  \\\n",
3196 |        "Customer ID                                                             \n",
3197 |        "12346             65         46       -65            3              3   \n",
3198 |        "12347              1         71      1323            5              4   \n",
3199 |        "12348             72         20       222            2              2   \n",
3200 |        "12349             41        107      2647            3              4   \n",
3201 |        "12351              9         21       301            5              2   \n",
3202 |        "\n",
3203 |        "            MonetaryScore RFM_SCORE              Segment  \n",
3204 |        "Customer ID                                               \n",
3205 |        "12346                   1       331       Need Attention  \n",
3206 |        "12347                   4       544            Champions  \n",
3207 |        "12348                   1       221          Hibernating  \n",
3208 |        "12349                   5       345      Loyal Customers  \n",
3209 |        "12351                   2       522  Potential Loyalists  "
3210 |       ]
3211 |      },
3212 |      "execution_count": 58,
3213 |      "metadata": {},
3214 |      "output_type": "execute_result"
3215 |     }
3216 |    ],
3217 |    "source": [
3218 |     "rfm['Segment'] = rfm['RecencyScore'].astype(str) + rfm['FrequencyScore'].astype(str)\n",
3219 |     "rfm['Segment'] = rfm['Segment'].replace(seg_map, regex=True)\n",
3220 |     "rfm.head()"
3221 |    ]
3222 |   },
3223 |   {
3224 |    "cell_type": "code",
3225 |    "execution_count": 59,
3226 |    "metadata": {},
3227 |    "outputs": [
3228 |     {
3229 |      "data": {
3230 |       "text/html": [
3231 |        "<div>\n",
3232 |        "<style scoped>\n",
3233 |        "    .dataframe tbody tr th:only-of-type {\n",
3234 |        "        vertical-align: middle;\n",
3235 |        "    }\n",
3236 |        "\n",
3237 |        "    .dataframe tbody tr th {\n",
3238 |        "        vertical-align: top;\n",
3239 |        "    }\n",
3240 |        "\n",
3241 |        "    .dataframe thead tr th {\n",
3242 |        "        text-align: left;\n",
3243 |        "    }\n",
3244 |        "\n",
3245 |        "    .dataframe thead tr:last-of-type th {\n",
3246 |        "        text-align: right;\n",
3247 |        "    }\n",
3248 |        "</style>\n",
3249 |        "<table border=\"1\" class=\"dataframe\">\n",
3250 |        "  <thead>\n",
3251 |        "    <tr>\n",
3252 |        "      <th></th>\n",
3253 |        "      <th colspan=\"2\" halign=\"left\">Recency</th>\n",
3254 |        "      <th colspan=\"2\" halign=\"left\">Frequency</th>\n",
3255 |        "      <th colspan=\"2\" halign=\"left\">Monetary</th>\n",
3256 |        "    </tr>\n",
3257 |        "    <tr>\n",
3258 |        "      <th></th>\n",
3259 |        "      <th>mean</th>\n",
3260 |        "      <th>count</th>\n",
3261 |        "      <th>mean</th>\n",
3262 |        "      <th>count</th>\n",
3263 |        "      <th>mean</th>\n",
3264 |        "      <th>count</th>\n",
3265 |        "    </tr>\n",
3266 |        "    <tr>\n",
3267 |        "      <th>Segment</th>\n",
3268 |        "      <th></th>\n",
3269 |        "      <th></th>\n",
3270 |        "      <th></th>\n",
3271 |        "      <th></th>\n",
3272 |        "      <th></th>\n",
3273 |        "      <th></th>\n",
3274 |        "    </tr>\n",
3275 |        "  </thead>\n",
3276 |        "  <tbody>\n",
3277 |        "    <tr>\n",
3278 |        "      <th>About to Sleep</th>\n",
3279 |        "      <td>51</td>\n",
3280 |        "      <td>346</td>\n",
3281 |        "      <td>15</td>\n",
3282 |        "      <td>346</td>\n",
3283 |        "      <td>383</td>\n",
3284 |        "      <td>346</td>\n",
3285 |        "    </tr>\n",
3286 |        "    <tr>\n",
3287 |        "      <th>At Risk</th>\n",
3288 |        "      <td>161</td>\n",
3289 |        "      <td>620</td>\n",
3290 |        "      <td>59</td>\n",
3291 |        "      <td>620</td>\n",
3292 |        "      <td>1062</td>\n",
3293 |        "      <td>620</td>\n",
3294 |        "    </tr>\n",
3295 |        "    <tr>\n",
3296 |        "      <th>Can't Loose</th>\n",
3297 |        "      <td>121</td>\n",
3298 |        "      <td>94</td>\n",
3299 |        "      <td>228</td>\n",
3300 |        "      <td>94</td>\n",
3301 |        "      <td>2876</td>\n",
3302 |        "      <td>94</td>\n",
3303 |        "    </tr>\n",
3304 |        "    <tr>\n",
3305 |        "      <th>Champions</th>\n",
3306 |        "      <td>5</td>\n",
3307 |        "      <td>667</td>\n",
3308 |        "      <td>272</td>\n",
3309 |        "      <td>667</td>\n",
3310 |        "      <td>6534</td>\n",
3311 |        "      <td>667</td>\n",
3312 |        "    </tr>\n",
3313 |        "    <tr>\n",
3314 |        "      <th>Hibernating</th>\n",
3315 |        "      <td>209</td>\n",
3316 |        "      <td>1024</td>\n",
3317 |        "      <td>13</td>\n",
3318 |        "      <td>1024</td>\n",
3319 |        "      <td>276</td>\n",
3320 |        "      <td>1024</td>\n",
3321 |        "    </tr>\n",
3322 |        "    <tr>\n",
3323 |        "      <th>Loyal Customers</th>\n",
3324 |        "      <td>35</td>\n",
3325 |        "      <td>768</td>\n",
3326 |        "      <td>170</td>\n",
3327 |        "      <td>768</td>\n",
3328 |        "      <td>2533</td>\n",
3329 |        "      <td>768</td>\n",
3330 |        "    </tr>\n",
3331 |        "    <tr>\n",
3332 |        "      <th>Need Attention</th>\n",
3333 |        "      <td>50</td>\n",
3334 |        "      <td>167</td>\n",
3335 |        "      <td>46</td>\n",
3336 |        "      <td>167</td>\n",
3337 |        "      <td>857</td>\n",
3338 |        "      <td>167</td>\n",
3339 |        "    </tr>\n",
3340 |        "    <tr>\n",
3341 |        "      <th>New Customers</th>\n",
3342 |        "      <td>6</td>\n",
3343 |        "      <td>65</td>\n",
3344 |        "      <td>7</td>\n",
3345 |        "      <td>65</td>\n",
3346 |        "      <td>441</td>\n",
3347 |        "      <td>65</td>\n",
3348 |        "    </tr>\n",
3349 |        "    <tr>\n",
3350 |        "      <th>Potential Loyalists</th>\n",
3351 |        "      <td>16</td>\n",
3352 |        "      <td>534</td>\n",
3353 |        "      <td>37</td>\n",
3354 |        "      <td>534</td>\n",
3355 |        "      <td>910</td>\n",
3356 |        "      <td>534</td>\n",
3357 |        "    </tr>\n",
3358 |        "    <tr>\n",
3359 |        "      <th>Promising</th>\n",
3360 |        "      <td>23</td>\n",
3361 |        "      <td>98</td>\n",
3362 |        "      <td>8</td>\n",
3363 |        "      <td>98</td>\n",
3364 |        "      <td>436</td>\n",
3365 |        "      <td>98</td>\n",
3366 |        "    </tr>\n",
3367 |        "  </tbody>\n",
3368 |        "</table>\n",
3369 |        "</div>"
3370 |       ],
3371 |       "text/plain": [
3372 |        "                    Recency       Frequency       Monetary      \n",
3373 |        "                       mean count      mean count     mean count\n",
3374 |        "Segment                                                         \n",
3375 |        "About to Sleep           51   346        15   346      383   346\n",
3376 |        "At Risk                 161   620        59   620     1062   620\n",
3377 |        "Can't Loose             121    94       228    94     2876    94\n",
3378 |        "Champions                 5   667       272   667     6534   667\n",
3379 |        "Hibernating             209  1024        13  1024      276  1024\n",
3380 |        "Loyal Customers          35   768       170   768     2533   768\n",
3381 |        "Need Attention           50   167        46   167      857   167\n",
3382 |        "New Customers             6    65         7    65      441    65\n",
3383 |        "Potential Loyalists      16   534        37   534      910   534\n",
3384 |        "Promising                23    98         8    98      436    98"
3385 |       ]
3386 |      },
3387 |      "execution_count": 59,
3388 |      "metadata": {},
3389 |      "output_type": "execute_result"
3390 |     }
3391 |    ],
3392 |    "source": [
3393 |     "rfm[[\"Segment\", \"Recency\",\"Frequency\",\"Monetary\"]].groupby(\"Segment\").agg([\"mean\",\"count\"])"
3394 |    ]
3395 |   },
3396 |   {
3397 |    "cell_type": "markdown",
3398 |    "execution_count": null,
3399 |    "metadata": {},
3400 |    "source": [
3401 |     "## As an example, let's interpret the Champions segment.\n",
3402 |     "\n",
3403 |     "- Recency: the 667 customers in this segment have an average Recency of 5, the lowest of any segment, meaning they purchased very recently,\n",
3404 |     "- Frequency: they have an average Frequency of 272, the highest of any segment,\n",
3405 |     "- Monetary: they have an average Monetary value of 6534, also the highest of any segment."
3406 |    ]
3407 |   },
3408 |   {
3409 |    "cell_type": "markdown",
3410 |    "execution_count": null,
3411 |    "metadata": {},
3412 |    "source": [
3413 |     "Now, let's select the 'Need Attention' segment.\n",
3414 |     "As a strategy, you can export their \"Customer ID\" values to an Excel file and send it to the sales department, which can then prepare a targeted campaign for these customers."
3415 |    ]
3416 |   },
3417 |   {
3418 |    "cell_type": "code",
3419 |    "execution_count": 60,
3420 |    "metadata": {},
3421 |    "outputs": [
3422 |     {
3423 |      "data": {
3424 |       "text/html": [
3425 |        "<div>\n",
3426 |        "<style scoped>\n",
3427 |        "    .dataframe tbody tr th:only-of-type {\n",
3428 |        "        vertical-align: middle;\n",
3429 |        "    }\n",
3430 |        "\n",
3431 |        "    .dataframe tbody tr th {\n",
3432 |        "        vertical-align: top;\n",
3433 |        "    }\n",
3434 |        "\n",
3435 |        "    .dataframe thead th {\n",
3436 |        "        text-align: right;\n",
3437 |        "    }\n",
3438 |        "</style>\n",
3439 |        "<table border=\"1\" class=\"dataframe\">\n",
3440 |        "  <thead>\n",
3441 |        "    <tr style=\"text-align: right;\">\n",
3442 |        "      <th></th>\n",
3443 |        "      <th>Recency</th>\n",
3444 |        "      <th>Frequency</th>\n",
3445 |        "      <th>Monetary</th>\n",
3446 |        "      <th>RecencyScore</th>\n",
3447 |        "      <th>FrequencyScore</th>\n",
3448 |        "      <th>MonetaryScore</th>\n",
3449 |        "      <th>RFM_SCORE</th>\n",
3450 |        "      <th>Segment</th>\n",
3451 |        "    </tr>\n",
3452 |        "    <tr>\n",
3453 |        "      <th>Customer ID</th>\n",
3454 |        "      <th></th>\n",
3455 |        "      <th></th>\n",
3456 |        "      <th></th>\n",
3457 |        "      <th></th>\n",
3458 |        "      <th></th>\n",
3459 |        "      <th></th>\n",
3460 |        "      <th></th>\n",
3461 |        "      <th></th>\n",
3462 |        "    </tr>\n",
3463 |        "  </thead>\n",
3464 |        "  <tbody>\n",
3465 |        "    <tr>\n",
3466 |        "      <th>12346</th>\n",
3467 |        "      <td>65</td>\n",
3468 |        "      <td>46</td>\n",
3469 |        "      <td>-65</td>\n",
3470 |        "      <td>3</td>\n",
3471 |        "      <td>3</td>\n",
3472 |        "      <td>1</td>\n",
3473 |        "      <td>331</td>\n",
3474 |        "      <td>Need Attention</td>\n",
3475 |        "    </tr>\n",
3476 |        "    <tr>\n",
3477 |        "      <th>12374</th>\n",
3478 |        "      <td>55</td>\n",
3479 |        "      <td>50</td>\n",
3480 |        "      <td>2246</td>\n",
3481 |        "      <td>3</td>\n",
3482 |        "      <td>3</td>\n",
3483 |        "      <td>5</td>\n",
3484 |        "      <td>335</td>\n",
3485 |        "      <td>Need Attention</td>\n",
3486 |        "    </tr>\n",
3487 |        "    <tr>\n",
3488 |        "      <th>12379</th>\n",
3489 |        "      <td>56</td>\n",
3490 |        "      <td>41</td>\n",
3491 |        "      <td>768</td>\n",
3492 |        "      <td>3</td>\n",
3493 |        "      <td>3</td>\n",
3494 |        "      <td>3</td>\n",
3495 |        "      <td>333</td>\n",
3496 |        "      <td>Need Attention</td>\n",
3497 |        "    </tr>\n",
3498 |        "    <tr>\n",
3499 |        "      <th>12389</th>\n",
3500 |        "      <td>36</td>\n",
3501 |        "      <td>49</td>\n",
3502 |        "      <td>1433</td>\n",
3503 |        "      <td>3</td>\n",
3504 |        "      <td>3</td>\n",
3505 |        "      <td>4</td>\n",
3506 |        "      <td>334</td>\n",
3507 |        "      <td>Need Attention</td>\n",
3508 |        "    </tr>\n",
3509 |        "    <tr>\n",
3510 |        "      <th>12425</th>\n",
3511 |        "      <td>64</td>\n",
3512 |        "      <td>59</td>\n",
3513 |        "      <td>904</td>\n",
3514 |        "      <td>3</td>\n",
3515 |        "      <td>3</td>\n",
3516 |        "      <td>3</td>\n",
3517 |        "      <td>333</td>\n",
3518 |        "      <td>Need Attention</td>\n",
3519 |        "    </tr>\n",
3520 |        "  </tbody>\n",
3521 |        "</table>\n",
3522 |        "</div>"
3523 |       ],
3524 |       "text/plain": [
3525 |        "             Recency  Frequency  Monetary RecencyScore FrequencyScore  \\\n",
3526 |        "Customer ID                                                             \n",
3527 |        "12346             65         46       -65            3              3   \n",
3528 |        "12374             55         50      2246            3              3   \n",
3529 |        "12379             56         41       768            3              3   \n",
3530 |        "12389             36         49      1433            3              3   \n",
3531 |        "12425             64         59       904            3              3   \n",
3532 |        "\n",
3533 |        "            MonetaryScore RFM_SCORE         Segment  \n",
3534 |        "Customer ID                                          \n",
3535 |        "12346                   1       331  Need Attention  \n",
3536 |        "12374                   5       335  Need Attention  \n",
3537 |        "12379                   3       333  Need Attention  \n",
3538 |        "12389                   4       334  Need Attention  \n",
3539 |        "12425                   3       333  Need Attention  "
3540 |       ]
3541 |      },
3542 |      "execution_count": 60,
3543 |      "metadata": {},
3544 |      "output_type": "execute_result"
3545 |     }
3546 |    ],
3547 |    "source": [
3548 |     "rfm[rfm[\"Segment\"] == \"Need Attention\"].head()"
3549 |    ]
3550 |   },
3551 |   {
3552 |    "cell_type": "markdown",
3553 |    "execution_count": null,
3554 |    "metadata": {},
3555 |    "source": [
3556 |     "## 14. Finally, export the New Customers' IDs to a CSV file named new_customers.csv."
3557 |    ]
3558 |   },
3559 |   {
3560 |    "cell_type": "code",
3561 |    "execution_count": 61,
3562 |    "metadata": {},
3563 |    "outputs": [
3564 |     {
3565 |      "data": {
3566 |       "text/plain": [
3567 |        "Int64Index([12386, 12427, 12441, 12538, 12686, 12738, 13010, 13011, 13029,\n",
3568 |        "            13094, 13145, 13254, 13258, 13270, 13369, 13747, 13848, 14119,\n",
3569 |        "            14213, 14306, 14491, 14576, 14589, 14865, 14987, 15018, 15181,\n",
3570 |        "            15212, 15299, 15304, 15649, 15728, 15899, 15914, 15922, 15973,\n",
3571 |        "            16194, 16473, 16545, 16552, 16711, 16752, 16988, 16995, 17026,\n",
3572 |        "            17170, 17181, 17262, 17281, 17339, 17378, 17468, 17556, 17616,\n",
3573 |        "            17674, 17723, 17857, 17870, 17924, 17925, 17951, 18084, 18113,\n",
3574 |        "            18161, 18269],\n",
3575 |        "           dtype='int64', name='Customer ID')"
3576 |       ]
3577 |      },
3578 |      "execution_count": 61,
3579 |      "metadata": {},
3580 |      "output_type": "execute_result"
3581 |     }
3582 |    ],
3583 |    "source": [
3584 |     "rfm[rfm[\"Segment\"] == \"New Customers\"].index"
3585 |    ]
3586 |   },
3587 |   {
3588 |    "cell_type": "code",
3589 |    "execution_count": 62,
3590 |    "metadata": {},
3591 |    "outputs": [],
3592 |    "source": [
3593 |     "new_df = pd.DataFrame()\n",
3594 |     "new_df[\"NewCustomerID\"] = rfm[rfm[\"Segment\"] == \"New Customers\"].index"
3595 |    ]
3596 |   },
3597 |   {
3598 |    "cell_type": "code",
3599 |    "execution_count": 63,
3600 |    "metadata": {},
3601 |    "outputs": [
3602 |     {
3603 |      "data": {
3604 |       "text/html": [
3605 |        "<div>\n",
3606 |        "<style scoped>\n",
3607 |        "    .dataframe tbody tr th:only-of-type {\n",
3608 |        "        vertical-align: middle;\n",
3609 |        "    }\n",
3610 |        "\n",
3611 |        "    .dataframe tbody tr th {\n",
3612 |        "        vertical-align: top;\n",
3613 |        "    }\n",
3614 |        "\n",
3615 |        "    .dataframe thead th {\n",
3616 |        "        text-align: right;\n",
3617 |        "    }\n",
3618 |        "</style>\n",
3619 |        "<table border=\"1\" class=\"dataframe\">\n",
3620 |        "  <thead>\n",
3621 |        "    <tr style=\"text-align: right;\">\n",
3622 |        "      <th></th>\n",
3623 |        "      <th>NewCustomerID</th>\n",
3624 |        "    </tr>\n",
3625 |        "  </thead>\n",
3626 |        "  <tbody>\n",
3627 |        "    <tr>\n",
3628 |        "      <th>0</th>\n",
3629 |        "      <td>12386</td>\n",
3630 |        "    </tr>\n",
3631 |        "    <tr>\n",
3632 |        "      <th>1</th>\n",
3633 |        "      <td>12427</td>\n",
3634 |        "    </tr>\n",
3635 |        "    <tr>\n",
3636 |        "      <th>2</th>\n",
3637 |        "      <td>12441</td>\n",
3638 |        "    </tr>\n",
3639 |        "    <tr>\n",
3640 |        "      <th>3</th>\n",
3641 |        "      <td>12538</td>\n",
3642 |        "    </tr>\n",
3643 |        "    <tr>\n",
3644 |        "      <th>4</th>\n",
3645 |        "      <td>12686</td>\n",
3646 |        "    </tr>\n",
3647 |        "  </tbody>\n",
3648 |        "</table>\n",
3649 |        "</div>"
3650 |       ],
3651 |       "text/plain": [
3652 |        "   NewCustomerID\n",
3653 |        "0          12386\n",
3654 |        "1          12427\n",
3655 |        "2          12441\n",
3656 |        "3          12538\n",
3657 |        "4          12686"
3658 |       ]
3659 |      },
3660 |      "execution_count": 63,
3661 |      "metadata": {},
3662 |      "output_type": "execute_result"
3663 |     }
3664 |    ],
3665 |    "source": [
3666 |     "new_df.head()"
3667 |    ]
3668 |   },
3669 |   {
3670 |    "cell_type": "code",
3671 |    "execution_count": 64,
3672 |    "metadata": {},
3673 |    "outputs": [],
3674 |    "source": [
3675 |     "new_df.to_csv(\"new_customers.csv\", index=False)"
3676 |    ]
3677 |   },
3678 |   {
3679 |    "cell_type": "markdown",
3680 |    "execution_count": null,
3681 |    "metadata": {},
3682 |    "source": [
3683 |     "\n",
3684 |     "\n",
3685 |     "# Conclusion\n",
3686 |     "\n",
3687 |     "    After this notebook, my next aim is to prepare a kernel on a data set that has not been cleaned.\n",
3688 |     "\n",
3689 |     "    If you have any suggestions, please share them with me. I will be happy to receive comments and criticism!\n",
3690 |     "\n",
3691 |     "    Thank you for your suggestions and votes ;)\n",
3692 |     "\n"
3693 |    ]
3694 |   },
3695 |   {
3696 |    "cell_type": "code",
3697 |    "execution_count": null,
3698 |    "metadata": {},
3699 |    "outputs": [],
3700 |    "source": []
3701 |   }
3702 |  ],
3703 |  "metadata": {
3704 |   "kernelspec": {
3705 |    "display_name": "Python 3",
3706 |    "language": "python",
3707 |    "name": "python3"
3708 |   },
3709 |   "language_info": {
3710 |    "codemirror_mode": {
3711 |     "name": "ipython",
3712 |     "version": 3
3713 |    },
3714 |    "file_extension": ".py",
3715 |    "mimetype": "text/x-python",
3716 |    "name": "python",
3717 |    "nbconvert_exporter": "python",
3718 |    "pygments_lexer": "ipython3",
3719 |    "version": "3.7.6"
3720 |   }
3721 |  },
3722 |  "nbformat": 4,
3723 |  "nbformat_minor": 4
3724 | }
3725 | 


--------------------------------------------------------------------------------