├── README.md └── business-problem-with-customer-segmentation.ipynb /README.md: -------------------------------------------------------------------------------- 1 | # Customer-Segmentation-with-RFM-Analysis 2 | 3 | 4 | ## Context 5 | A real online retail transaction data set covering two years. 6 | 7 | 8 | ## Content 9 | This Online Retail II data set contains all the transactions occurring for a UK-based and registered, non-store online retailer between 01/12/2009 and 09/12/2011. The company mainly sells unique all-occasion gift-ware. Many customers of the company are wholesalers. 10 | 11 | 12 | ## Column Descriptors 13 | InvoiceNo: Invoice number. Nominal. A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'C', it indicates a cancellation. 14 | 15 | StockCode: Product (item) code. Nominal. A 5-digit integral number uniquely assigned to each distinct product. 16 | 17 | Description: Product (item) name. Nominal. 18 | 19 | Quantity: The quantities of each product (item) per transaction. Numeric. 20 | 21 | InvoiceDate: Invoice date and time. Numeric. The day and time when a transaction was generated. 22 | 23 | UnitPrice: Unit price. Numeric. Product price per unit in sterling (£). 24 | 25 | CustomerID: Customer number. Nominal. A 5-digit integral number uniquely assigned to each customer. 26 | 27 | Country: Country name. Nominal. The name of the country where a customer resides. 28 | 29 | 30 | ## Acknowledgements 31 | References for the data set are available here: 32 | https://archive.ics.uci.edu/ml/datasets/Online+Retail+II 33 | 34 | The data set and an example kernel are available on Kaggle: 35 | https://www.kaggle.com/mathchi/business-problem-with-customer-segmentation 36 | 37 | 38 | ## Relevant Papers: 39 | Chen, D., Sain, S.L., and Guo, K. 
(2012), Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining, Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197-208. 40 | 41 | Chen, D., Guo, K. and Ubakanma, G. (2015), Predicting customer profitability over time based on RFM time series, International Journal of Business Forecasting and Marketing Intelligence, Vol. 2, No. 1, pp. 1-18. 42 | 43 | Chen, D., Guo, K., and Li, Bo (2019), Predicting Customer Profitability Dynamically over Time: An Experimental Comparative Study, 24th Iberoamerican Congress on Pattern Recognition (CIARP 2019), Havana, Cuba, 28-31 Oct, 2019. 44 | 45 | Laha Ale, Ning Zhang, Huici Wu, Dajiang Chen, and Tao Han, Online Proactive Caching in Mobile Edge Computing Using Bidirectional Deep Recurrent Neural Network, IEEE Internet of Things Journal, Vol. 6, Issue 3, pp. 5520-5530, 2019. 46 | 47 | Rina Singh, Jeffrey A. Graves, Douglas A. Talbert, William Eberle, Prefix and Suffix Sequential Pattern Mining, Industrial Conference on Data Mining 2018: Advances in Data Mining. Applications and Theoretical Aspects, pp. 309-324. 2018. 
48 | 49 | 50 | ## Inspiration 51 | Data set characteristics: Multivariate, Sequential, Time-Series, Text 52 | -------------------------------------------------------------------------------- /business-problem-with-customer-segmentation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "execution_count": null, 6 | "metadata": {}, 7 | "source": [ 8 | "# Business Problem with Customer Segmentation\n", 9 | "\n", 10 | "\n", 11 | "An e-commerce company wants to segment its customers and determine marketing strategies according to these segments.\n", 12 | "\n", 13 | "For this purpose, we will define customer behavior and form groups by clustering customers on that behavior.\n", 14 | "\n", 15 | "In other words, we will place customers who behave similarly in the same group and try to develop sales and marketing techniques specific to each group.\n", 16 | "\n", 17 | "\n", 18 | "\n", 19 | "### Data Set Story:\n", 20 | "\n", 21 | "https://archive.ics.uci.edu/ml/datasets/Online+Retail+II\n", 22 | "\n", 23 | "This Online Retail II data set contains all the transactions occurring for a UK-based and registered, non-store online retailer between 01/12/2009 and 09/12/2011.\n", 24 | "\n", 25 | "The company mainly sells unique all-occasion gift-ware. \n", 26 | "\n", 27 | "Many customers of the company are wholesalers.\n", 28 | "\n", 29 | "\n", 30 | "\n", 31 | "\n", 32 | "### Attribute Information:\n", 33 | "\n", 34 | "- InvoiceNo: Invoice number. Nominal. A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'C', it indicates a cancellation.\n", 35 | "- StockCode: Product (item) code. Nominal. A 5-digit integral number uniquely assigned to each distinct product.\n", 36 | "- Description: Product (item) name. Nominal.\n", 37 | "- Quantity: The quantities of each product (item) per transaction. 
Numeric.\n", 38 | "- InvoiceDate: Invoice date and time. Numeric. The day and time when a transaction was generated.\n", 39 | "- UnitPrice: Unit price. Numeric. Product price per unit in sterling (£).\n", 40 | "- CustomerID: Customer number. Nominal. A 5-digit integral number uniquely assigned to each customer.\n", 41 | "- Country: Country name. Nominal. The name of the country where a customer resides.\n", 42 | "\n" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "execution_count": null, 48 | "metadata": {}, 49 | "source": [ 50 | "# Questions about the data set\n", 51 | "\n", 52 | "\n", 53 | "All questions concern the 2009-2010 data.\n", 54 | "\n", 55 | "1. What is the number of unique products?\n", 56 | "2. Which product do you have?\n", 57 | "3. Which product is the most ordered?\n", 58 | "4. How do we rank this output?\n", 59 | "5. How many invoices have been issued?\n", 60 | "6. How much money has been earned per invoice?\n", 61 | "7. Which are the most expensive products?\n", 62 | "8. How many orders came from which country?\n", 63 | "9. Which country gained how much?\n", 64 | "10. Which product is the most returned?\n", 65 | "11. What should we do for customer segmentation with RFM?\n", 66 | "12. Scoring for RFM.\n", 67 | "13. Finally, create an Excel file named New Customer." 
68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "execution_count": null, 73 | "metadata": {}, 74 | "source": [ 75 | "# Data Understanding " 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 1, 81 | "metadata": {}, 82 | "outputs": [], 83 | "source": [ 84 | "import pandas as pd\n", 85 | "import numpy as np\n", 86 | "import seaborn as sns\n", 87 | "\n", 88 | "# to display all columns and rows:\n", 89 | "pd.set_option('display.max_columns', None); pd.set_option('display.max_rows', None);\n" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "execution_count": null, 95 | "metadata": {}, 96 | "source": [ 97 | "Set the number of digits shown after the decimal point. The option below displays floats with 0 decimal places, so values such as 'Price' appear rounded to whole numbers in the outputs that follow." 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 2, 103 | "metadata": {}, 104 | "outputs": [], 105 | "source": [ 106 | "pd.set_option('display.float_format', lambda x: '%.0f' % x)\n", 107 | "import matplotlib.pyplot as plt" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 3, 113 | "metadata": {}, 114 | "outputs": [], 115 | "source": [ 116 | "df_2009_2010 = pd.read_excel(\"../input/online-retail-ii-data-set-from-ml-repository/online_retail_II.xlsx\", sheet_name = \"Year 2009-2010\")" 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": 4, 122 | "metadata": {}, 123 | "outputs": [], 124 | "source": [ 125 | "df = df_2009_2010.copy()" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "execution_count": null, 131 | "metadata": {}, 132 | "source": [ 133 | "Take a first look at the data using the basic pandas inspection functions." 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "execution_count": null, 139 | "metadata": {}, 140 | "source": [ 141 | "## 1. What is the number of unique products?" 
142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 5, 147 | "metadata": {}, 148 | "outputs": [ 149 | { 150 | "data": { 151 | "text/plain": [ 152 | "4681" 153 | ] 154 | }, 155 | "execution_count": 5, 156 | "metadata": {}, 157 | "output_type": "execute_result" 158 | } 159 | ], 160 | "source": [ 161 | "df[\"Description\"].nunique()" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "execution_count": null, 167 | "metadata": {}, 168 | "source": [ 169 | "## 2. Which product do you have?" 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": 6, 175 | "metadata": {}, 176 | "outputs": [ 177 | { 178 | "data": { 179 | "text/plain": [ 180 | "WHITE HANGING HEART T-LIGHT HOLDER 3549\n", 181 | "REGENCY CAKESTAND 3 TIER 2212\n", 182 | "STRAWBERRY CERAMIC TRINKET BOX 1843\n", 183 | "PACK OF 72 RETRO SPOT CAKE CASES 1466\n", 184 | "ASSORTED COLOUR BIRD ORNAMENT 1457\n", 185 | "Name: Description, dtype: int64" 186 | ] 187 | }, 188 | "execution_count": 6, 189 | "metadata": {}, 190 | "output_type": "execute_result" 191 | } 192 | ], 193 | "source": [ 194 | "df[\"Description\"].value_counts().head()" 195 | ] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "execution_count": null, 200 | "metadata": {}, 201 | "source": [ 202 | "## 3. Which product is the most ordered?" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 7, 208 | "metadata": {}, 209 | "outputs": [ 210 | { 211 | "data": { 212 | "text/html": [ 213 | "
\n", 214 | "\n", 227 | "\n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | "
\n", 261 | "
" 262 | ], 263 | "text/plain": [ 264 | " Quantity\n", 265 | "Description \n", 266 | "21494 -720\n", 267 | "22467 -2\n", 268 | "22719 2\n", 269 | " DOORMAT UNION JACK GUNS AND ROSES 179\n", 270 | " 3 STRIPEY MICE FELTCRAFT 690" 271 | ] 272 | }, 273 | "execution_count": 7, 274 | "metadata": {}, 275 | "output_type": "execute_result" 276 | } 277 | ], 278 | "source": [ 279 | "df.groupby(\"Description\").agg({\"Quantity\":\"sum\"}).head()" 280 | ] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "execution_count": null, 285 | "metadata": {}, 286 | "source": [ 287 | "## 4. How do we rank this output?" 288 | ] 289 | }, 290 | { 291 | "cell_type": "code", 292 | "execution_count": 8, 293 | "metadata": {}, 294 | "outputs": [ 295 | { 296 | "data": { 297 | "text/html": [ 298 | "
\n", 299 | "\n", 312 | "\n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | "
\n", 346 | "
" 347 | ], 348 | "text/plain": [ 349 | " Quantity\n", 350 | "Description \n", 351 | "WHITE HANGING HEART T-LIGHT HOLDER 57733\n", 352 | "WORLD WAR 2 GLIDERS ASSTD DESIGNS 54698\n", 353 | "BROCADE RING PURSE 47647\n", 354 | "PACK OF 72 RETRO SPOT CAKE CASES 46106\n", 355 | "ASSORTED COLOUR BIRD ORNAMENT 44925" 356 | ] 357 | }, 358 | "execution_count": 8, 359 | "metadata": {}, 360 | "output_type": "execute_result" 361 | } 362 | ], 363 | "source": [ 364 | "df.groupby(\"Description\").agg({\"Quantity\":\"sum\"}).sort_values(\"Quantity\", ascending = False).head()" 365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "execution_count": null, 370 | "metadata": {}, 371 | "source": [ 372 | "## 5. How many invoices have been issued?" 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": 9, 378 | "metadata": {}, 379 | "outputs": [ 380 | { 381 | "data": { 382 | "text/plain": [ 383 | "28816" 384 | ] 385 | }, 386 | "execution_count": 9, 387 | "metadata": {}, 388 | "output_type": "execute_result" 389 | } 390 | ], 391 | "source": [ 392 | "df[\"Invoice\"].nunique()" 393 | ] 394 | }, 395 | { 396 | "cell_type": "markdown", 397 | "execution_count": null, 398 | "metadata": {}, 399 | "source": [ 400 | "## 6. How much money has been earned per invoice?" 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": 10, 406 | "metadata": {}, 407 | "outputs": [], 408 | "source": [ 409 | "# it is necessary to create a new variable by multiplying two variables\n", 410 | "\n", 411 | "df[\"TotalPrice\"] = df[\"Quantity\"]*df[\"Price\"]" 412 | ] 413 | }, 414 | { 415 | "cell_type": "code", 416 | "execution_count": 11, 417 | "metadata": {}, 418 | "outputs": [ 419 | { 420 | "data": { 421 | "text/html": [ 422 | "
\n", 423 | "\n", 436 | "\n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | " \n", 456 | " \n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | " \n", 506 | " \n", 507 | " \n", 508 | " \n", 509 | " \n", 510 | " \n", 511 | " \n", 512 | " \n", 513 | "
\n", 514 | "
" 515 | ], 516 | "text/plain": [ 517 | " Invoice StockCode Description Quantity \\\n", 518 | "0 489434 85048 15CM CHRISTMAS GLASS BALL 20 LIGHTS 12 \n", 519 | "1 489434 79323P PINK CHERRY LIGHTS 12 \n", 520 | "2 489434 79323W WHITE CHERRY LIGHTS 12 \n", 521 | "3 489434 22041 RECORD FRAME 7\" SINGLE SIZE 48 \n", 522 | "4 489434 21232 STRAWBERRY CERAMIC TRINKET BOX 24 \n", 523 | "\n", 524 | " InvoiceDate Price Customer ID Country TotalPrice \n", 525 | "0 2009-12-01 07:45:00 7 13085 United Kingdom 83 \n", 526 | "1 2009-12-01 07:45:00 7 13085 United Kingdom 81 \n", 527 | "2 2009-12-01 07:45:00 7 13085 United Kingdom 81 \n", 528 | "3 2009-12-01 07:45:00 2 13085 United Kingdom 101 \n", 529 | "4 2009-12-01 07:45:00 1 13085 United Kingdom 30 " 530 | ] 531 | }, 532 | "execution_count": 11, 533 | "metadata": {}, 534 | "output_type": "execute_result" 535 | } 536 | ], 537 | "source": [ 538 | "df.head()" 539 | ] 540 | }, 541 | { 542 | "cell_type": "code", 543 | "execution_count": 12, 544 | "metadata": {}, 545 | "outputs": [ 546 | { 547 | "data": { 548 | "text/html": [ 549 | "
\n", 550 | "\n", 563 | "\n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | " \n", 573 | " \n", 574 | " \n", 575 | " \n", 576 | " \n", 577 | " \n", 578 | " \n", 579 | " \n", 580 | " \n", 581 | " \n", 582 | " \n", 583 | " \n", 584 | " \n", 585 | " \n", 586 | " \n", 587 | " \n", 588 | " \n", 589 | " \n", 590 | " \n", 591 | " \n", 592 | " \n", 593 | " \n", 594 | " \n", 595 | " \n", 596 | "
\n", 597 | "
" 598 | ], 599 | "text/plain": [ 600 | " TotalPrice\n", 601 | "Invoice \n", 602 | "489434 505\n", 603 | "489435 146\n", 604 | "489436 630\n", 605 | "489437 311\n", 606 | "489438 2286" 607 | ] 608 | }, 609 | "execution_count": 12, 610 | "metadata": {}, 611 | "output_type": "execute_result" 612 | } 613 | ], 614 | "source": [ 615 | "df.groupby(\"Invoice\").agg({\"TotalPrice\":\"sum\"}).head()" 616 | ] 617 | }, 618 | { 619 | "cell_type": "markdown", 620 | "execution_count": null, 621 | "metadata": {}, 622 | "source": [ 623 | "## 7. Which are the most expensive products?" 624 | ] 625 | }, 626 | { 627 | "cell_type": "code", 628 | "execution_count": 13, 629 | "metadata": {}, 630 | "outputs": [ 631 | { 632 | "data": { 633 | "text/html": [ 634 | "
\n", 635 | "\n", 648 | "\n", 649 | " \n", 650 | " \n", 651 | " \n", 652 | " \n", 653 | " \n", 654 | " \n", 655 | " \n", 656 | " \n", 657 | " \n", 658 | " \n", 659 | " \n", 660 | " \n", 661 | " \n", 662 | " \n", 663 | " \n", 664 | " \n", 665 | " \n", 666 | " \n", 667 | " \n", 668 | " \n", 669 | " \n", 670 | " \n", 671 | " \n", 672 | " \n", 673 | " \n", 674 | " \n", 675 | " \n", 676 | " \n", 677 | " \n", 678 | " \n", 679 | " \n", 680 | " \n", 681 | " \n", 682 | " \n", 683 | " \n", 684 | " \n", 685 | " \n", 686 | " \n", 687 | " \n", 688 | " \n", 689 | " \n", 690 | " \n", 691 | " \n", 692 | " \n", 693 | " \n", 694 | " \n", 695 | " \n", 696 | " \n", 697 | " \n", 698 | " \n", 699 | " \n", 700 | " \n", 701 | " \n", 702 | " \n", 703 | " \n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | " \n", 718 | " \n", 719 | " \n", 720 | " \n", 721 | " \n", 722 | " \n", 723 | " \n", 724 | " \n", 725 | "
\n", 726 | "
" 727 | ], 728 | "text/plain": [ 729 | " Invoice StockCode Description Quantity InvoiceDate \\\n", 730 | "241824 C512770 M Manual -1 2010-06-17 16:52:00 \n", 731 | "241827 512771 M Manual 1 2010-06-17 16:53:00 \n", 732 | "320581 C520667 BANK CHARGES Bank Charges -1 2010-08-27 13:42:00 \n", 733 | "517953 C537630 AMAZONFEE AMAZON FEE -1 2010-12-07 15:04:00 \n", 734 | "519294 C537651 AMAZONFEE AMAZON FEE -1 2010-12-07 15:49:00 \n", 735 | "\n", 736 | " Price Customer ID Country TotalPrice \n", 737 | "241824 25111 17399 United Kingdom -25111 \n", 738 | "241827 25111 nan United Kingdom 25111 \n", 739 | "320581 18911 nan United Kingdom -18911 \n", 740 | "517953 13541 nan United Kingdom -13541 \n", 741 | "519294 13541 nan United Kingdom -13541 " 742 | ] 743 | }, 744 | "execution_count": 13, 745 | "metadata": {}, 746 | "output_type": "execute_result" 747 | } 748 | ], 749 | "source": [ 750 | "df.sort_values(\"Price\", ascending = False).head()" 751 | ] 752 | }, 753 | { 754 | "cell_type": "markdown", 755 | "execution_count": null, 756 | "metadata": {}, 757 | "source": [ 758 | "## 8. How many orders came from which country?" 
759 | ] 760 | }, 761 | { 762 | "cell_type": "code", 763 | "execution_count": 14, 764 | "metadata": {}, 765 | "outputs": [ 766 | { 767 | "data": { 768 | "text/plain": [ 769 | "United Kingdom 485852\n", 770 | "EIRE 9670\n", 771 | "Germany 8129\n", 772 | "France 5772\n", 773 | "Netherlands 2769\n", 774 | "Spain 1278\n", 775 | "Switzerland 1187\n", 776 | "Portugal 1101\n", 777 | "Belgium 1054\n", 778 | "Channel Islands 906\n", 779 | "Sweden 902\n", 780 | "Italy 731\n", 781 | "Australia 654\n", 782 | "Cyprus 554\n", 783 | "Austria 537\n", 784 | "Greece 517\n", 785 | "United Arab Emirates 432\n", 786 | "Denmark 428\n", 787 | "Norway 369\n", 788 | "Finland 354\n", 789 | "Unspecified 310\n", 790 | "USA 244\n", 791 | "Japan 224\n", 792 | "Poland 194\n", 793 | "Malta 172\n", 794 | "Lithuania 154\n", 795 | "Singapore 117\n", 796 | "RSA 111\n", 797 | "Bahrain 107\n", 798 | "Canada 77\n", 799 | "Hong Kong 76\n", 800 | "Thailand 76\n", 801 | "Israel 74\n", 802 | "Iceland 71\n", 803 | "Korea 63\n", 804 | "Brazil 62\n", 805 | "West Indies 54\n", 806 | "Bermuda 34\n", 807 | "Nigeria 32\n", 808 | "Lebanon 13\n", 809 | "Name: Country, dtype: int64" 810 | ] 811 | }, 812 | "execution_count": 14, 813 | "metadata": {}, 814 | "output_type": "execute_result" 815 | } 816 | ], 817 | "source": [ 818 | "df[\"Country\"].value_counts()" 819 | ] 820 | }, 821 | { 822 | "cell_type": "markdown", 823 | "execution_count": null, 824 | "metadata": {}, 825 | "source": [ 826 | "## 9. Which country gained how much?" 827 | ] 828 | }, 829 | { 830 | "cell_type": "code", 831 | "execution_count": 15, 832 | "metadata": {}, 833 | "outputs": [ 834 | { 835 | "data": { 836 | "text/html": [ 837 | "
\n", 838 | "\n", 851 | "\n", 852 | " \n", 853 | " \n", 854 | " \n", 855 | " \n", 856 | " \n", 857 | " \n", 858 | " \n", 859 | " \n", 860 | " \n", 861 | " \n", 862 | " \n", 863 | " \n", 864 | " \n", 865 | " \n", 866 | " \n", 867 | " \n", 868 | " \n", 869 | " \n", 870 | " \n", 871 | " \n", 872 | " \n", 873 | " \n", 874 | " \n", 875 | " \n", 876 | " \n", 877 | " \n", 878 | " \n", 879 | " \n", 880 | " \n", 881 | " \n", 882 | " \n", 883 | " \n", 884 | "
\n", 885 | "
" 886 | ], 887 | "text/plain": [ 888 | " TotalPrice\n", 889 | "Country \n", 890 | "United Kingdom 8194778\n", 891 | "EIRE 352243\n", 892 | "Netherlands 263863\n", 893 | "Germany 196290\n", 894 | "France 130770" 895 | ] 896 | }, 897 | "execution_count": 15, 898 | "metadata": {}, 899 | "output_type": "execute_result" 900 | } 901 | ], 902 | "source": [ 903 | "df.groupby(\"Country\").agg({\"TotalPrice\":\"sum\"}).sort_values(\"TotalPrice\", ascending = False).head()" 904 | ] 905 | }, 906 | { 907 | "cell_type": "markdown", 908 | "execution_count": null, 909 | "metadata": {}, 910 | "source": [ 911 | "## 10. Which product is the most returned?" 912 | ] 913 | }, 914 | { 915 | "cell_type": "code", 916 | "execution_count": 16, 917 | "metadata": {}, 918 | "outputs": [ 919 | { 920 | "data": { 921 | "text/html": [ 922 | "
\n", 923 | "\n", 936 | "\n", 937 | " \n", 938 | " \n", 939 | " \n", 940 | " \n", 941 | " \n", 942 | " \n", 943 | " \n", 944 | " \n", 945 | " \n", 946 | " \n", 947 | " \n", 948 | " \n", 949 | " \n", 950 | " \n", 951 | " \n", 952 | " \n", 953 | " \n", 954 | " \n", 955 | " \n", 956 | " \n", 957 | " \n", 958 | " \n", 959 | " \n", 960 | " \n", 961 | " \n", 962 | " \n", 963 | " \n", 964 | " \n", 965 | " \n", 966 | " \n", 967 | " \n", 968 | " \n", 969 | " \n", 970 | " \n", 971 | " \n", 972 | " \n", 973 | " \n", 974 | " \n", 975 | " \n", 976 | " \n", 977 | " \n", 978 | " \n", 979 | " \n", 980 | " \n", 981 | " \n", 982 | " \n", 983 | " \n", 984 | " \n", 985 | " \n", 986 | " \n", 987 | " \n", 988 | " \n", 989 | " \n", 990 | " \n", 991 | " \n", 992 | " \n", 993 | " \n", 994 | " \n", 995 | " \n", 996 | " \n", 997 | " \n", 998 | " \n", 999 | " \n", 1000 | " \n", 1001 | " \n", 1002 | " \n", 1003 | " \n", 1004 | " \n", 1005 | " \n", 1006 | " \n", 1007 | " \n", 1008 | " \n", 1009 | " \n", 1010 | " \n", 1011 | " \n", 1012 | " \n", 1013 | "
\n", 1014 | "
" 1015 | ], 1016 | "text/plain": [ 1017 | " Invoice StockCode Description Quantity \\\n", 1018 | "507225 C536757 84347 ROTATING SILVER ANGELS T-LIGHT HLDR -9360 \n", 1019 | "359669 C524235 21088 SET/6 FRUIT SALAD PAPER CUPS -7128 \n", 1020 | "359670 C524235 21096 SET/6 FRUIT SALAD PAPER PLATES -7008 \n", 1021 | "359630 C524235 16047 POP ART PEN CASE & PENS -5184 \n", 1022 | "359636 C524235 37340 MULTICOLOUR SPRING FLOWER MUG -4992 \n", 1023 | "\n", 1024 | " InvoiceDate Price Customer ID Country TotalPrice \n", 1025 | "507225 2010-12-02 14:23:00 0 15838 United Kingdom -281 \n", 1026 | "359669 2010-09-28 11:02:00 0 14277 France -570 \n", 1027 | "359670 2010-09-28 11:02:00 0 14277 France -911 \n", 1028 | "359630 2010-09-28 11:02:00 0 14277 France -415 \n", 1029 | "359636 2010-09-28 11:02:00 0 14277 France -499 " 1030 | ] 1031 | }, 1032 | "execution_count": 16, 1033 | "metadata": {}, 1034 | "output_type": "execute_result" 1035 | } 1036 | ], 1037 | "source": [ 1038 | "df[df['Invoice'].str.startswith(\"C\", na=False)].sort_values(\"Quantity\", ascending = True).head()" 1039 | ] 1040 | }, 1041 | { 1042 | "cell_type": "markdown", 1043 | "execution_count": null, 1044 | "metadata": {}, 1045 | "source": [ 1046 | "# Data Preparation" 1047 | ] 1048 | }, 1049 | { 1050 | "cell_type": "code", 1051 | "execution_count": 17, 1052 | "metadata": {}, 1053 | "outputs": [ 1054 | { 1055 | "data": { 1056 | "text/plain": [ 1057 | "Invoice 0\n", 1058 | "StockCode 0\n", 1059 | "Description 2928\n", 1060 | "Quantity 0\n", 1061 | "InvoiceDate 0\n", 1062 | "Price 0\n", 1063 | "Customer ID 107927\n", 1064 | "Country 0\n", 1065 | "TotalPrice 0\n", 1066 | "dtype: int64" 1067 | ] 1068 | }, 1069 | "execution_count": 17, 1070 | "metadata": {}, 1071 | "output_type": "execute_result" 1072 | } 1073 | ], 1074 | "source": [ 1075 | "df.isnull().sum()" 1076 | ] 1077 | }, 1078 | { 1079 | "cell_type": "code", 1080 | "execution_count": 18, 1081 | "metadata": {}, 1082 | "outputs": [], 1083 | "source": [ 1084 | 
"df.dropna(inplace = True)" 1085 | ] 1086 | }, 1087 | { 1088 | "cell_type": "code", 1089 | "execution_count": 19, 1090 | "metadata": {}, 1091 | "outputs": [ 1092 | { 1093 | "data": { 1094 | "text/plain": [ 1095 | "(417534, 9)" 1096 | ] 1097 | }, 1098 | "execution_count": 19, 1099 | "metadata": {}, 1100 | "output_type": "execute_result" 1101 | } 1102 | ], 1103 | "source": [ 1104 | "df.shape" 1105 | ] 1106 | }, 1107 | { 1108 | "cell_type": "code", 1109 | "execution_count": 20, 1110 | "metadata": {}, 1111 | "outputs": [ 1112 | { 1113 | "data": { 1114 | "text/html": [ 1115 | "
\n", 1116 | "\n", 1129 | "\n", 1130 | " \n", 1131 | " \n", 1132 | " \n", 1133 | " \n", 1134 | " \n", 1135 | " \n", 1136 | " \n", 1137 | " \n", 1138 | " \n", 1139 | " \n", 1140 | " \n", 1141 | " \n", 1142 | " \n", 1143 | " \n", 1144 | " \n", 1145 | " \n", 1146 | " \n", 1147 | " \n", 1148 | " \n", 1149 | " \n", 1150 | " \n", 1151 | " \n", 1152 | " \n", 1153 | " \n", 1154 | " \n", 1155 | " \n", 1156 | " \n", 1157 | " \n", 1158 | " \n", 1159 | " \n", 1160 | " \n", 1161 | " \n", 1162 | " \n", 1163 | " \n", 1164 | " \n", 1165 | " \n", 1166 | " \n", 1167 | " \n", 1168 | " \n", 1169 | " \n", 1170 | " \n", 1171 | " \n", 1172 | " \n", 1173 | " \n", 1174 | " \n", 1175 | " \n", 1176 | " \n", 1177 | " \n", 1178 | " \n", 1179 | " \n", 1180 | " \n", 1181 | " \n", 1182 | " \n", 1183 | " \n", 1184 | " \n", 1185 | " \n", 1186 | " \n", 1187 | " \n", 1188 | " \n", 1189 | " \n", 1190 | " \n", 1191 | " \n", 1192 | " \n", 1193 | " \n", 1194 | " \n", 1195 | " \n", 1196 | " \n", 1197 | " \n", 1198 | " \n", 1199 | " \n", 1200 | " \n", 1201 | " \n", 1202 | " \n", 1203 | " \n", 1204 | " \n", 1205 | " \n", 1206 | " \n", 1207 | " \n", 1208 | " \n", 1209 | " \n", 1210 | " \n", 1211 | " \n", 1212 | " \n", 1213 | " \n", 1214 | " \n", 1215 | " \n", 1216 | " \n", 1217 | " \n", 1218 | " \n", 1219 | "
\n", 1220 | "
" 1221 | ], 1222 | "text/plain": [ 1223 | " count mean std min 1% 5% 10% 25% 50% 75% \\\n", 1224 | "Quantity 417534 13 101 -9360 -2 1 1 2 4 12 \n", 1225 | "Price 417534 4 71 0 0 0 1 1 2 4 \n", 1226 | "Customer ID 417534 15361 1681 12346 12435 12725 13042 13983 15311 16799 \n", 1227 | "TotalPrice 417534 20 100 -25111 -11 1 2 4 11 19 \n", 1228 | "\n", 1229 | " 90% 95% 99% max \n", 1230 | "Quantity 24 36 144 19152 \n", 1231 | "Price 7 8 15 25111 \n", 1232 | "Customer ID 17706 17913 18196 18287 \n", 1233 | "TotalPrice 35 65 196 15818 " 1234 | ] 1235 | }, 1236 | "execution_count": 20, 1237 | "metadata": {}, 1238 | "output_type": "execute_result" 1239 | } 1240 | ], 1241 | "source": [ 1242 | "df.describe([0.01,0.05,0.10,0.25,0.50,0.75,0.90,0.95, 0.99]).T" 1243 | ] 1244 | }, 1245 | { 1246 | "cell_type": "code", 1247 | "execution_count": 21, 1248 | "metadata": {}, 1249 | "outputs": [ 1250 | { 1251 | "name": "stdout", 1252 | "output_type": "stream", 1253 | "text": [ 1254 | "Quantity yes\n", 1255 | "1063\n", 1256 | "Price yes\n", 1257 | "953\n", 1258 | "TotalPrice yes\n", 1259 | "1150\n" 1260 | ] 1261 | } 1262 | ], 1263 | "source": [ 1264 | "for feature in [\"Quantity\",\"Price\",\"TotalPrice\"]:\n", 1265 | "\n", 1266 | " Q1 = df[feature].quantile(0.01)\n", 1267 | " Q3 = df[feature].quantile(0.99)\n", 1268 | " IQR = Q3-Q1\n", 1269 | " upper = Q3 + 1.5*IQR\n", 1270 | " lower = Q1 - 1.5*IQR\n", 1271 | "\n", 1272 | " if df[(df[feature] > upper) | (df[feature] < lower)].any(axis=None):\n", 1273 | " print(feature,\"yes\")\n", 1274 | " print(df[(df[feature] > upper) | (df[feature] < lower)].shape[0])\n", 1275 | " else:\n", 1276 | " print(feature, \"no\")" 1277 | ] 1278 | }, 1279 | { 1280 | "cell_type": "markdown", 1281 | "execution_count": null, 1282 | "metadata": {}, 1283 | "source": [ 1284 | "# Customer Segmentation with RFM Scores\n", 1285 | "\n", 1286 | "Consists of initials of Recency, Frequency, Monetary expressions.\n", 1287 | "\n", 1288 | "It is a technique that helps 
determine marketing and sales strategies based on customers' buying habits.\n", 1289 | "\n", 1290 | "- Recency: Time since the customer's last purchase.\n", 1291 | "\n", 1292 | " -- In other words, it is the “time since the last contact of the customer”.\n", 1293 | "\n", 1294 | " -- Today's date - Last purchase date\n", 1295 | "\n", 1296 | " -- For example, if we run this analysis today, recency is today's date minus the date of the customer's last purchase.\n", 1297 | "\n", 1298 | " -- The result might be, for example, 20 or 100 days. The customer with a recency of 20 is the hotter one: they have been in contact with us more recently.\n", 1299 | "\n", 1300 | "- Frequency: Total number of purchases.\n", 1301 | "\n", 1302 | "- Monetary (Monetary Value): Total spending by the customer.\n" 1303 | ] 1304 | }, 1305 | { 1306 | "cell_type": "code", 1307 | "execution_count": 22, 1308 | "metadata": {}, 1309 | "outputs": [ 1310 | { 1311 | "data": { 1312 | "text/html": [ 1313 | "
" 1406 | ], 1407 | "text/plain": [ 1408 | " Invoice StockCode Description Quantity \\\n", 1409 | "0 489434 85048 15CM CHRISTMAS GLASS BALL 20 LIGHTS 12 \n", 1410 | "1 489434 79323P PINK CHERRY LIGHTS 12 \n", 1411 | "2 489434 79323W WHITE CHERRY LIGHTS 12 \n", 1412 | "3 489434 22041 RECORD FRAME 7\" SINGLE SIZE 48 \n", 1413 | "4 489434 21232 STRAWBERRY CERAMIC TRINKET BOX 24 \n", 1414 | "\n", 1415 | " InvoiceDate Price Customer ID Country TotalPrice \n", 1416 | "0 2009-12-01 07:45:00 7 13085 United Kingdom 83 \n", 1417 | "1 2009-12-01 07:45:00 7 13085 United Kingdom 81 \n", 1418 | "2 2009-12-01 07:45:00 7 13085 United Kingdom 81 \n", 1419 | "3 2009-12-01 07:45:00 2 13085 United Kingdom 101 \n", 1420 | "4 2009-12-01 07:45:00 1 13085 United Kingdom 30 " 1421 | ] 1422 | }, 1423 | "execution_count": 22, 1424 | "metadata": {}, 1425 | "output_type": "execute_result" 1426 | } 1427 | ], 1428 | "source": [ 1429 | "df.head()" 1430 | ] 1431 | }, 1432 | { 1433 | "cell_type": "code", 1434 | "execution_count": 23, 1435 | "metadata": {}, 1436 | "outputs": [ 1437 | { 1438 | "name": "stdout", 1439 | "output_type": "stream", 1440 | "text": [ 1441 | "<class 'pandas.core.frame.DataFrame'>\n", 1442 | "Int64Index: 417534 entries, 0 to 525460\n", 1443 | "Data columns (total 9 columns):\n", 1444 | " # Column Non-Null Count Dtype \n", 1445 | "--- ------ -------------- ----- \n", 1446 | " 0 Invoice 417534 non-null object \n", 1447 | " 1 StockCode 417534 non-null object \n", 1448 | " 2 Description 417534 non-null object \n", 1449 | " 3 Quantity 417534 non-null int64 \n", 1450 | " 4 InvoiceDate 417534 non-null datetime64[ns]\n", 1451 | " 5 Price 417534 non-null float64 \n", 1452 | " 6 Customer ID 417534 non-null float64 \n", 1453 | " 7 Country 417534 non-null object \n", 1454 | " 8 TotalPrice 417534 non-null float64 \n", 1455 | "dtypes: datetime64[ns](1), float64(3), int64(1), object(4)\n", 1456 | "memory usage: 31.9+ MB\n" 1457 | ] 1458 | } 1459 | ], 1460 | "source": [ 1461 | "df.info()" 1462 | ] 1463 | }, 1464 | { 1465 | 
"cell_type": "code", 1466 | "execution_count": 24, 1467 | "metadata": {}, 1468 | "outputs": [ 1469 | { 1470 | "data": { 1471 | "text/plain": [ 1472 | "Timestamp('2009-12-01 07:45:00')" 1473 | ] 1474 | }, 1475 | "execution_count": 24, 1476 | "metadata": {}, 1477 | "output_type": "execute_result" 1478 | } 1479 | ], 1480 | "source": [ 1481 | "df[\"InvoiceDate\"].min()" 1482 | ] 1483 | }, 1484 | { 1485 | "cell_type": "code", 1486 | "execution_count": 25, 1487 | "metadata": {}, 1488 | "outputs": [ 1489 | { 1490 | "data": { 1491 | "text/plain": [ 1492 | "Timestamp('2010-12-09 20:01:00')" 1493 | ] 1494 | }, 1495 | "execution_count": 25, 1496 | "metadata": {}, 1497 | "output_type": "execute_result" 1498 | } 1499 | ], 1500 | "source": [ 1501 | "df[\"InvoiceDate\"].max()" 1502 | ] 1503 | }, 1504 | { 1505 | "cell_type": "markdown", 1506 | "execution_count": null, 1507 | "metadata": {}, 1508 | "source": [ 1509 | "What should count as \"today\"? If we used the actual current date, there would be a very large gap between it and the dates in the data set.\n", 1510 | "\n", 1511 | "For this reason, let us define our own \"today\" based on the date range of this data set.\n", 1512 | "\n", 1513 | "We can set this day to the latest invoice date in the data set.\n", 1514 | "\n", 1515 | "We can then segment customers relative to the day of the last recorded transaction."
1516 | ] 1517 | }, 1518 | { 1519 | "cell_type": "code", 1520 | "execution_count": 26, 1521 | "metadata": {}, 1522 | "outputs": [], 1523 | "source": [ 1524 | "import datetime as dt\n", 1525 | "\n", 1526 | "today_date = dt.datetime(2010,12,9)  # midnight on the last invoice day; invoices placed later that day (the maximum is 20:01) get a Recency of -1" 1527 | ] 1528 | }, 1529 | { 1530 | "cell_type": "code", 1531 | "execution_count": 27, 1532 | "metadata": {}, 1533 | "outputs": [ 1534 | { 1535 | "data": { 1536 | "text/plain": [ 1537 | "datetime.datetime(2010, 12, 9, 0, 0)" 1538 | ] 1539 | }, 1540 | "execution_count": 27, 1541 | "metadata": {}, 1542 | "output_type": "execute_result" 1543 | } 1544 | ], 1545 | "source": [ 1546 | "today_date" 1547 | ] 1548 | }, 1549 | { 1550 | "cell_type": "markdown", 1551 | "execution_count": null, 1552 | "metadata": {}, 1553 | "source": [ 1554 | "## 11. Show the last purchase date of each customer." 1555 | ] 1556 | }, 1557 | { 1558 | "cell_type": "code", 1559 | "execution_count": 28, 1560 | "metadata": {}, 1561 | "outputs": [ 1562 | { 1563 | "data": { 1564 | "text/html": [ 1565 | "
" 1614 | ], 1615 | "text/plain": [ 1616 | " InvoiceDate\n", 1617 | "Customer ID \n", 1618 | "12346 2010-10-04 16:33:00\n", 1619 | "12347 2010-12-07 14:57:00\n", 1620 | "12348 2010-09-27 14:59:00\n", 1621 | "12349 2010-10-28 08:23:00\n", 1622 | "12351 2010-11-29 15:23:00" 1623 | ] 1624 | }, 1625 | "execution_count": 28, 1626 | "metadata": {}, 1627 | "output_type": "execute_result" 1628 | } 1629 | ], 1630 | "source": [ 1631 | "df.groupby(\"Customer ID\").agg({\"InvoiceDate\":\"max\"}).head()" 1632 | ] 1633 | }, 1634 | { 1635 | "cell_type": "markdown", 1636 | "execution_count": null, 1637 | "metadata": {}, 1638 | "source": [ 1639 | "Now we have the last purchase date of each customer. Let's convert \"Customer ID\" from float to integer." 1640 | ] 1641 | }, 1642 | { 1643 | "cell_type": "code", 1644 | "execution_count": 29, 1645 | "metadata": {}, 1646 | "outputs": [], 1647 | "source": [ 1648 | "df[\"Customer ID\"] = df[\"Customer ID\"].astype(int)" 1649 | ] 1650 | }, 1651 | { 1652 | "cell_type": "markdown", 1653 | "execution_count": null, 1654 | "metadata": {}, 1655 | "source": [ 1656 | "## 12. What should we do for customer segmentation with RFM?" 1657 | ] 1658 | }, 1659 | { 1660 | "cell_type": "markdown", 1661 | "execution_count": null, 1662 | "metadata": {}, 1663 | "source": [ 1664 | "For each customer, we need to subtract the customer's last purchase date from today's date.\n", 1665 | "\n", 1666 | "This gives us one recency value per customer." 1667 | ] 1668 | }, 1669 | { 1670 | "cell_type": "code", 1671 | "execution_count": 30, 1672 | "metadata": {}, 1673 | "outputs": [ 1674 | { 1675 | "data": { 1676 | "text/html": [ 1677 | "
" 1726 | ], 1727 | "text/plain": [ 1728 | " InvoiceDate\n", 1729 | "Customer ID \n", 1730 | "12346 65 days 07:27:00\n", 1731 | "12347 1 days 09:03:00\n", 1732 | "12348 72 days 09:01:00\n", 1733 | "12349 41 days 15:37:00\n", 1734 | "12351 9 days 08:37:00" 1735 | ] 1736 | }, 1737 | "execution_count": 30, 1738 | "metadata": {}, 1739 | "output_type": "execute_result" 1740 | } 1741 | ], 1742 | "source": [ 1743 | "(today_date - df.groupby(\"Customer ID\").agg({\"InvoiceDate\":\"max\"})).head()" 1744 | ] 1745 | }, 1746 | { 1747 | "cell_type": "code", 1748 | "execution_count": 31, 1749 | "metadata": {}, 1750 | "outputs": [], 1751 | "source": [ 1752 | "temp_df = (today_date - df.groupby(\"Customer ID\").agg({\"InvoiceDate\":\"max\"}))" 1753 | ] 1754 | }, 1755 | { 1756 | "cell_type": "code", 1757 | "execution_count": 32, 1758 | "metadata": {}, 1759 | "outputs": [], 1760 | "source": [ 1761 | "temp_df.rename(columns={\"InvoiceDate\": \"Recency\"}, inplace = True)" 1762 | ] 1763 | }, 1764 | { 1765 | "cell_type": "code", 1766 | "execution_count": 33, 1767 | "metadata": {}, 1768 | "outputs": [ 1769 | { 1770 | "data": { 1771 | "text/html": [ 1772 | "
" 1821 | ], 1822 | "text/plain": [ 1823 | " Recency\n", 1824 | "Customer ID \n", 1825 | "12346 65 days 07:27:00\n", 1826 | "12347 1 days 09:03:00\n", 1827 | "12348 72 days 09:01:00\n", 1828 | "12349 41 days 15:37:00\n", 1829 | "12351 9 days 08:37:00" 1830 | ] 1831 | }, 1832 | "execution_count": 33, 1833 | "metadata": {}, 1834 | "output_type": "execute_result" 1835 | } 1836 | ], 1837 | "source": [ 1838 | "temp_df.head()" 1839 | ] 1840 | }, 1841 | { 1842 | "cell_type": "code", 1843 | "execution_count": 34, 1844 | "metadata": {}, 1845 | "outputs": [], 1846 | "source": [ 1847 | "recency_df = temp_df[\"Recency\"].apply(lambda x: x.days)" 1848 | ] 1849 | }, 1850 | { 1851 | "cell_type": "code", 1852 | "execution_count": 35, 1853 | "metadata": {}, 1854 | "outputs": [ 1855 | { 1856 | "data": { 1857 | "text/plain": [ 1858 | "Customer ID\n", 1859 | "12346 65\n", 1860 | "12347 1\n", 1861 | "12348 72\n", 1862 | "12349 41\n", 1863 | "12351 9\n", 1864 | "Name: Recency, dtype: int64" 1865 | ] 1866 | }, 1867 | "execution_count": 35, 1868 | "metadata": {}, 1869 | "output_type": "execute_result" 1870 | } 1871 | ], 1872 | "source": [ 1873 | "recency_df.head()" 1874 | ] 1875 | }, 1876 | { 1877 | "cell_type": "code", 1878 | "execution_count": 36, 1879 | "metadata": {}, 1880 | "outputs": [], 1881 | "source": [ 1882 | "#df.groupby(\"Customer ID\").agg({\"InvoiceDate\": lambda x: (today_date - x.max()).days}).head()" 1883 | ] 1884 | }, 1885 | { 1886 | "cell_type": "markdown", 1887 | "execution_count": null, 1888 | "metadata": {}, 1889 | "source": [ 1890 | "# Frequency" 1891 | ] 1892 | }, 1893 | { 1894 | "cell_type": "code", 1895 | "execution_count": 37, 1896 | "metadata": {}, 1897 | "outputs": [], 1898 | "source": [ 1899 | "temp_df = df.groupby([\"Customer ID\",\"Invoice\"]).agg({\"Invoice\":\"count\"})" 1900 | ] 1901 | }, 1902 | { 1903 | "cell_type": "code", 1904 | "execution_count": 38, 1905 | "metadata": {}, 1906 | "outputs": [ 1907 | { 1908 | "data": { 1909 | "text/html": [ 1910 | "
" 1962 | ], 1963 | "text/plain": [ 1964 | " Invoice\n", 1965 | "Customer ID Invoice \n", 1966 | "12346 491725 1\n", 1967 | " 491742 1\n", 1968 | " 491744 1\n", 1969 | " 492718 1\n", 1970 | " 492722 1" 1971 | ] 1972 | }, 1973 | "execution_count": 38, 1974 | "metadata": {}, 1975 | "output_type": "execute_result" 1976 | } 1977 | ], 1978 | "source": [ 1979 | "temp_df.head()" 1980 | ] 1981 | }, 1982 | { 1983 | "cell_type": "code", 1984 | "execution_count": 39, 1985 | "metadata": {}, 1986 | "outputs": [ 1987 | { 1988 | "data": { 1989 | "text/html": [ 1990 | "
" 2039 | ], 2040 | "text/plain": [ 2041 | " Invoice\n", 2042 | "Customer ID \n", 2043 | "12346 15\n", 2044 | "12347 2\n", 2045 | "12348 1\n", 2046 | "12349 4\n", 2047 | "12351 1" 2048 | ] 2049 | }, 2050 | "execution_count": 39, 2051 | "metadata": {}, 2052 | "output_type": "execute_result" 2053 | } 2054 | ], 2055 | "source": [ 2056 | "temp_df.groupby(\"Customer ID\").agg({\"Invoice\":\"count\"}).head()" 2057 | ] 2058 | }, 2059 | { 2060 | "cell_type": "code", 2061 | "execution_count": 40, 2062 | "metadata": {}, 2063 | "outputs": [ 2064 | { 2065 | "data": { 2066 | "text/html": [ 2067 | "
" 2116 | ], 2117 | "text/plain": [ 2118 | " Frequency\n", 2119 | "Customer ID \n", 2120 | "12346 46\n", 2121 | "12347 71\n", 2122 | "12348 20\n", 2123 | "12349 107\n", 2124 | "12351 21" 2125 | ] 2126 | }, 2127 | "execution_count": 40, 2128 | "metadata": {}, 2129 | "output_type": "execute_result" 2130 | } 2131 | ], 2132 | "source": [ 2133 | "freq_df = temp_df.groupby(\"Customer ID\").agg({\"Invoice\":\"sum\"})\n", 2134 | "freq_df.rename(columns={\"Invoice\": \"Frequency\"}, inplace = True)\n", 2135 | "freq_df.head()" 2136 | ] 2137 | }, 2138 | { 2139 | "cell_type": "markdown", 2140 | "execution_count": null, 2141 | "metadata": {}, 2142 | "source": [ 2143 | "# Monetary" 2144 | ] 2145 | }, 2146 | { 2147 | "cell_type": "code", 2148 | "execution_count": 41, 2149 | "metadata": {}, 2150 | "outputs": [], 2151 | "source": [ 2152 | "monetary_df = df.groupby(\"Customer ID\").agg({\"TotalPrice\":\"sum\"})" 2153 | ] 2154 | }, 2155 | { 2156 | "cell_type": "code", 2157 | "execution_count": 42, 2158 | "metadata": {}, 2159 | "outputs": [ 2160 | { 2161 | "data": { 2162 | "text/html": [ 2163 | "
" 2212 | ], 2213 | "text/plain": [ 2214 | " TotalPrice\n", 2215 | "Customer ID \n", 2216 | "12346 -65\n", 2217 | "12347 1323\n", 2218 | "12348 222\n", 2219 | "12349 2647\n", 2220 | "12351 301" 2221 | ] 2222 | }, 2223 | "execution_count": 42, 2224 | "metadata": {}, 2225 | "output_type": "execute_result" 2226 | } 2227 | ], 2228 | "source": [ 2229 | "monetary_df.head()" 2230 | ] 2231 | }, 2232 | { 2233 | "cell_type": "code", 2234 | "execution_count": 43, 2235 | "metadata": {}, 2236 | "outputs": [], 2237 | "source": [ 2238 | "# lets change names\n", 2239 | "\n", 2240 | "monetary_df.rename(columns={\"TotalPrice\": \"Monetary\"}, inplace = True)" 2241 | ] 2242 | }, 2243 | { 2244 | "cell_type": "code", 2245 | "execution_count": 44, 2246 | "metadata": {}, 2247 | "outputs": [ 2248 | { 2249 | "name": "stdout", 2250 | "output_type": "stream", 2251 | "text": [ 2252 | "(4383,) (4383, 1) (4383, 1)\n" 2253 | ] 2254 | } 2255 | ], 2256 | "source": [ 2257 | "print(recency_df.shape,freq_df.shape,monetary_df.shape)" 2258 | ] 2259 | }, 2260 | { 2261 | "cell_type": "code", 2262 | "execution_count": 45, 2263 | "metadata": {}, 2264 | "outputs": [], 2265 | "source": [ 2266 | "rfm = pd.concat([recency_df, freq_df, monetary_df], axis=1)" 2267 | ] 2268 | }, 2269 | { 2270 | "cell_type": "code", 2271 | "execution_count": 46, 2272 | "metadata": {}, 2273 | "outputs": [ 2274 | { 2275 | "data": { 2276 | "text/html": [ 2277 | "
" 2340 | ], 2341 | "text/plain": [ 2342 | " Recency Frequency Monetary\n", 2343 | "Customer ID \n", 2344 | "12346 65 46 -65\n", 2345 | "12347 1 71 1323\n", 2346 | "12348 72 20 222\n", 2347 | "12349 41 107 2647\n", 2348 | "12351 9 21 301" 2349 | ] 2350 | }, 2351 | "execution_count": 46, 2352 | "metadata": {}, 2353 | "output_type": "execute_result" 2354 | } 2355 | ], 2356 | "source": [ 2357 | "rfm.head()" 2358 | ] 2359 | }, 2360 | { 2361 | "cell_type": "markdown", 2362 | "execution_count": null, 2363 | "metadata": {}, 2364 | "source": [ 2365 | "## Now we need to score each customer on how recently they purchased (Recency), how often they purchased (Frequency) and how much they spent (Monetary)." 2366 | ] 2367 | }, 2368 | { 2369 | "cell_type": "markdown", 2370 | "execution_count": null, 2371 | "metadata": {}, 2372 | "source": [ 2373 | "## 13. Scoring for RFM\n", 2374 | "\n", 2375 | "- Let's start with Recency, where the most recent customers get a score of 5. We use the pandas 'qcut' function to split each metric into five quantile-based groups. Frequency is ranked first so that tied values do not produce duplicate bin edges." 2376 | ] 2377 | }, 2378 | { 2379 | "cell_type": "code", 2380 | "execution_count": 47, 2381 | "metadata": {}, 2382 | "outputs": [], 2383 | "source": [ 2384 | "rfm[\"RecencyScore\"] = pd.qcut(rfm['Recency'], 5, labels = [5, 4, 3, 2, 1]) " 2385 | ] 2386 | }, 2387 | { 2388 | "cell_type": "code", 2389 | "execution_count": 48, 2390 | "metadata": {}, 2391 | "outputs": [], 2392 | "source": [ 2393 | "rfm[\"FrequencyScore\"] = pd.qcut(rfm['Frequency'].rank(method = \"first\"), 5, labels = [1, 2, 3, 4, 5])" 2394 | ] 2395 | }, 2396 | { 2397 | "cell_type": "code", 2398 | "execution_count": 49, 2399 | "metadata": {}, 2400 | "outputs": [], 2401 | "source": [ 2402 | "rfm[\"MonetaryScore\"] = pd.qcut(rfm['Monetary'], 5, labels = [1, 2, 3, 4, 5])" 2403 | ] 2404 | }, 2405 | { 2406 | "cell_type": "code", 2407 | "execution_count": 50, 2408 | "metadata": {}, 2409 | "outputs": [ 2410 | { 2411 | "data": { 2412 | "text/html": [ 2413 | "
" 2497 | ], 2498 | "text/plain": [ 2499 | " Recency Frequency Monetary RecencyScore FrequencyScore \\\n", 2500 | "Customer ID \n", 2501 | "12346 65 46 -65 3 3 \n", 2502 | "12347 1 71 1323 5 4 \n", 2503 | "12348 72 20 222 2 2 \n", 2504 | "12349 41 107 2647 3 4 \n", 2505 | "12351 9 21 301 5 2 \n", 2506 | "\n", 2507 | " MonetaryScore \n", 2508 | "Customer ID \n", 2509 | "12346 1 \n", 2510 | "12347 4 \n", 2511 | "12348 1 \n", 2512 | "12349 5 \n", 2513 | "12351 2 " 2514 | ] 2515 | }, 2516 | "execution_count": 50, 2517 | "metadata": {}, 2518 | "output_type": "execute_result" 2519 | } 2520 | ], 2521 | "source": [ 2522 | "rfm.head()" 2523 | ] 2524 | }, 2525 | { 2526 | "cell_type": "markdown", 2527 | "execution_count": null, 2528 | "metadata": {}, 2529 | "source": [ 2530 | "Let's concatenate the three scores into a single RFM string, so that scores of 5, 4 and 4 become \"544\"." 2531 | ] 2532 | }, 2533 | { 2534 | "cell_type": "code", 2535 | "execution_count": 51, 2536 | "metadata": {}, 2537 | "outputs": [ 2538 | { 2539 | "data": { 2540 | "text/plain": [ 2541 | "Customer ID\n", 2542 | "12346 331\n", 2543 | "12347 544\n", 2544 | "12348 221\n", 2545 | "12349 345\n", 2546 | "12351 522\n", 2547 | "dtype: object" 2548 | ] 2549 | }, 2550 | "execution_count": 51, 2551 | "metadata": {}, 2552 | "output_type": "execute_result" 2553 | } 2554 | ], 2555 | "source": [ 2556 | "(rfm['RecencyScore'].astype(str) + \n", 2557 | " rfm['FrequencyScore'].astype(str) + \n", 2558 | " rfm['MonetaryScore'].astype(str)).head()" 2559 | ] 2560 | }, 2561 | { 2562 | "cell_type": "code", 2563 | "execution_count": 52, 2564 | "metadata": {}, 2565 | "outputs": [], 2566 | "source": [ 2567 | "rfm[\"RFM_SCORE\"] = rfm['RecencyScore'].astype(str) + rfm['FrequencyScore'].astype(str) + rfm['MonetaryScore'].astype(str)" 2568 | ] 2569 | }, 2570 | { 2571 | "cell_type": "code", 2572 | "execution_count": 53, 2573 | "metadata": {}, 2574 | "outputs": [ 2575 | { 2576 | "data": { 2577 | "text/html": [ 2578 | "
" 2669 | ], 2670 | "text/plain": [ 2671 | " Recency Frequency Monetary RecencyScore FrequencyScore \\\n", 2672 | "Customer ID \n", 2673 | "12346 65 46 -65 3 3 \n", 2674 | "12347 1 71 1323 5 4 \n", 2675 | "12348 72 20 222 2 2 \n", 2676 | "12349 41 107 2647 3 4 \n", 2677 | "12351 9 21 301 5 2 \n", 2678 | "\n", 2679 | " MonetaryScore RFM_SCORE \n", 2680 | "Customer ID \n", 2681 | "12346 1 331 \n", 2682 | "12347 4 544 \n", 2683 | "12348 1 221 \n", 2684 | "12349 5 345 \n", 2685 | "12351 2 522 " 2686 | ] 2687 | }, 2688 | "execution_count": 53, 2689 | "metadata": {}, 2690 | "output_type": "execute_result" 2691 | } 2692 | ], 2693 | "source": [ 2694 | "rfm.head()" 2695 | ] 2696 | }, 2697 | { 2698 | "cell_type": "code", 2699 | "execution_count": 54, 2700 | "metadata": {}, 2701 | "outputs": [ 2702 | { 2703 | "data": { 2704 | "text/html": [ 2705 | "
" 2770 | ], 2771 | "text/plain": [ 2772 | " count mean std min 25% 50% 75% max\n", 2773 | "Recency 4383 89 98 -1 15 50 136 372\n", 2774 | "Frequency 4383 95 205 1 18 44 103 5710\n", 2775 | "Monetary 4383 1905 8519 -25111 285 656 1646 341777" 2776 | ] 2777 | }, 2778 | "execution_count": 54, 2779 | "metadata": {}, 2780 | "output_type": "execute_result" 2781 | } 2782 | ], 2783 | "source": [ 2784 | "rfm.describe().T" 2785 | ] 2786 | }, 2787 | { 2788 | "cell_type": "markdown", 2789 | "execution_count": null, 2790 | "metadata": {}, 2791 | "source": [ 2792 | "Customers who score 5 on all three metrics, i.e. an RFM score of 555, are the champions." 2793 | ] 2794 | }, 2795 | { 2796 | "cell_type": "code", 2797 | "execution_count": 55, 2798 | "metadata": {}, 2799 | "outputs": [ 2800 | { 2801 | "data": { 2802 | "text/html": [ 2803 | "
" 2894 | ], 2895 | "text/plain": [ 2896 | " Recency Frequency Monetary RecencyScore FrequencyScore \\\n", 2897 | "Customer ID \n", 2898 | "12415 9 212 19544 5 5 \n", 2899 | "12431 7 173 4303 5 5 \n", 2900 | "12433 0 287 7053 5 5 \n", 2901 | "12471 6 767 19208 5 5 \n", 2902 | "12472 3 658 10727 5 5 \n", 2903 | "\n", 2904 | " MonetaryScore RFM_SCORE \n", 2905 | "Customer ID \n", 2906 | "12415 5 555 \n", 2907 | "12431 5 555 \n", 2908 | "12433 5 555 \n", 2909 | "12471 5 555 \n", 2910 | "12472 5 555 " 2911 | ] 2912 | }, 2913 | "execution_count": 55, 2914 | "metadata": {}, 2915 | "output_type": "execute_result" 2916 | } 2917 | ], 2918 | "source": [ 2919 | "rfm[rfm[\"RFM_SCORE\"] == \"555\"].head()" 2920 | ] 2921 | }, 2922 | { 2923 | "cell_type": "markdown", 2924 | "execution_count": null, 2925 | "metadata": {}, 2926 | "source": [ 2927 | "Customers who score 1 on all three metrics, i.e. an RFM score of 111, are the lowest-value customers." 2928 | ] 2929 | }, 2930 | { 2931 | "cell_type": "code", 2932 | "execution_count": 56, 2933 | "metadata": {}, 2934 | "outputs": [ 2935 | { 2936 | "data": { 2937 | "text/html": [ 2938 | "
" 3029 | ], 3030 | "text/plain": [ 3031 | " Recency Frequency Monetary RecencyScore FrequencyScore \\\n", 3032 | "Customer ID \n", 3033 | "12362 372 1 130 1 1 \n", 3034 | "12382 316 1 -18 1 1 \n", 3035 | "12404 316 1 63 1 1 \n", 3036 | "12416 290 11 203 1 1 \n", 3037 | "12466 316 1 57 1 1 \n", 3038 | "\n", 3039 | " MonetaryScore RFM_SCORE \n", 3040 | "Customer ID \n", 3041 | "12362 1 111 \n", 3042 | "12382 1 111 \n", 3043 | "12404 1 111 \n", 3044 | "12416 1 111 \n", 3045 | "12466 1 111 " 3046 | ] 3047 | }, 3048 | "execution_count": 56, 3049 | "metadata": {}, 3050 | "output_type": "execute_result" 3051 | } 3052 | ], 3053 | "source": [ 3054 | "rfm[rfm[\"RFM_SCORE\"] == \"111\"].head()" 3055 | ] 3056 | }, 3057 | { 3058 | "cell_type": "markdown", 3059 | "execution_count": null, 3060 | "metadata": {}, 3061 | "source": [ 3062 | "Let's segment customers with regular expressions. We set the Monetary score aside and match patterns on the Recency (R) and Frequency (F) scores only.\n", 3063 | "\n", 3064 | "Example: if R is 1-2 and F is 1-2, label the customer 'Hibernating'." 3065 | ] 3066 | }, 3067 | { 3068 | "cell_type": "code", 3069 | "execution_count": 57, 3070 | "metadata": {}, 3071 | "outputs": [], 3072 | "source": [ 3073 | "seg_map = {\n", 3074 | " r'[1-2][1-2]': 'Hibernating',\n", 3075 | " r'[1-2][3-4]': 'At Risk',\n", 3076 | " r'[1-2]5': 'Can\\'t Loose',\n", 3077 | " r'3[1-2]': 'About to Sleep',\n", 3078 | " r'33': 'Need Attention',\n", 3079 | " r'[3-4][4-5]': 'Loyal Customers',\n", 3080 | " r'41': 'Promising',\n", 3081 | " r'51': 'New Customers',\n", 3082 | " r'[4-5][2-3]': 'Potential Loyalists',\n", 3083 | " r'5[4-5]': 'Champions'\n", 3084 | "}" 3085 | ] 3086 | }, 3087 | { 3088 | "cell_type": "code", 3089 | "execution_count": 58, 3090 | "metadata": {}, 3091 | "outputs": [ 3092 | { 3093 | "data": { 3094 | "text/html": [ 3095 | "
" 3193 | ], 3194 | "text/plain": [ 3195 | " Recency Frequency Monetary RecencyScore FrequencyScore \\\n", 3196 | "Customer ID \n", 3197 | "12346 65 46 -65 3 3 \n", 3198 | "12347 1 71 1323 5 4 \n", 3199 | "12348 72 20 222 2 2 \n", 3200 | "12349 41 107 2647 3 4 \n", 3201 | "12351 9 21 301 5 2 \n", 3202 | "\n", 3203 | " MonetaryScore RFM_SCORE Segment \n", 3204 | "Customer ID \n", 3205 | "12346 1 331 Need Attention \n", 3206 | "12347 4 544 Champions \n", 3207 | "12348 1 221 Hibernating \n", 3208 | "12349 5 345 Loyal Customers \n", 3209 | "12351 2 522 Potential Loyalists " 3210 | ] 3211 | }, 3212 | "execution_count": 58, 3213 | "metadata": {}, 3214 | "output_type": "execute_result" 3215 | } 3216 | ], 3217 | "source": [ 3218 | "rfm['Segment'] = rfm['RecencyScore'].astype(str) + rfm['FrequencyScore'].astype(str)\n", 3219 | "rfm['Segment'] = rfm['Segment'].replace(seg_map, regex=True)\n", 3220 | "rfm.head()" 3221 | ] 3222 | }, 3223 | { 3224 | "cell_type": "code", 3225 | "execution_count": 59, 3226 | "metadata": {}, 3227 | "outputs": [ 3228 | { 3229 | "data": { 3230 | "text/html": [ 3231 | "
" 3370 | ], 3371 | "text/plain": [ 3372 | " Recency Frequency Monetary \n", 3373 | " mean count mean count mean count\n", 3374 | "Segment \n", 3375 | "About to Sleep 51 346 15 346 383 346\n", 3376 | "At Risk 161 620 59 620 1062 620\n", 3377 | "Can't Loose 121 94 228 94 2876 94\n", 3378 | "Champions 5 667 272 667 6534 667\n", 3379 | "Hibernating 209 1024 13 1024 276 1024\n", 3380 | "Loyal Customers 35 768 170 768 2533 768\n", 3381 | "Need Attention 50 167 46 167 857 167\n", 3382 | "New Customers 6 65 7 65 441 65\n", 3383 | "Potential Loyalists 16 534 37 534 910 534\n", 3384 | "Promising 23 98 8 98 436 98" 3385 | ] 3386 | }, 3387 | "execution_count": 59, 3388 | "metadata": {}, 3389 | "output_type": "execute_result" 3390 | } 3391 | ], 3392 | "source": [ 3393 | "rfm[[\"Segment\", \"Recency\",\"Frequency\",\"Monetary\"]].groupby(\"Segment\").agg([\"mean\",\"count\"])" 3394 | ] 3395 | }, 3396 | { 3397 | "cell_type": "markdown", 3398 | "execution_count": null, 3399 | "metadata": {}, 3400 | "source": [ 3401 | "## To interpret the table, take the Champions segment as an example.\n", 3402 | "\n", 3403 | "- Recency: the 667 customers in this segment have a mean Recency of 5, so they purchased very recently,\n", 3404 | "- Frequency: they have a mean Frequency of 272,\n", 3405 | "- Monetary: they have spent an average of 6534 each." 3406 | ] 3407 | }, 3408 | { 3409 | "cell_type": "markdown", 3410 | "execution_count": null, 3411 | "metadata": {}, 3412 | "source": [ 3413 | "Now let's select the 'Need Attention' segment.\n", 3414 | "As a possible strategy, you can export these customers' \"Customer ID\" values to Excel and send them to the sales department, which can then prepare a targeted campaign for this segment." 3415 | ] 3416 | }, 3417 | { 3418 | "cell_type": "code", 3419 | "execution_count": 60, 3420 | "metadata": {}, 3421 | "outputs": [ 3422 | { 3423 | "data": { 3424 | "text/html": [ 3425 | "
" 3523 | ], 3524 | "text/plain": [ 3525 | " Recency Frequency Monetary RecencyScore FrequencyScore \\\n", 3526 | "Customer ID \n", 3527 | "12346 65 46 -65 3 3 \n", 3528 | "12374 55 50 2246 3 3 \n", 3529 | "12379 56 41 768 3 3 \n", 3530 | "12389 36 49 1433 3 3 \n", 3531 | "12425 64 59 904 3 3 \n", 3532 | "\n", 3533 | " MonetaryScore RFM_SCORE Segment \n", 3534 | "Customer ID \n", 3535 | "12346 1 331 Need Attention \n", 3536 | "12374 5 335 Need Attention \n", 3537 | "12379 3 333 Need Attention \n", 3538 | "12389 4 334 Need Attention \n", 3539 | "12425 3 333 Need Attention " 3540 | ] 3541 | }, 3542 | "execution_count": 60, 3543 | "metadata": {}, 3544 | "output_type": "execute_result" 3545 | } 3546 | ], 3547 | "source": [ 3548 | "rfm[rfm[\"Segment\"] == \"Need Attention\"].head()" 3549 | ] 3550 | }, 3551 | { 3552 | "cell_type": "markdown", 3553 | "execution_count": null, 3554 | "metadata": {}, 3555 | "source": [ 3556 | "## 14. Finally, export the New Customers segment to a CSV file named new_customers.csv." 3557 | ] 3558 | }, 3559 | { 3560 | "cell_type": "code", 3561 | "execution_count": 61, 3562 | "metadata": {}, 3563 | "outputs": [ 3564 | { 3565 | "data": { 3566 | "text/plain": [ 3567 | "Int64Index([12386, 12427, 12441, 12538, 12686, 12738, 13010, 13011, 13029,\n", 3568 | " 13094, 13145, 13254, 13258, 13270, 13369, 13747, 13848, 14119,\n", 3569 | " 14213, 14306, 14491, 14576, 14589, 14865, 14987, 15018, 15181,\n", 3570 | " 15212, 15299, 15304, 15649, 15728, 15899, 15914, 15922, 15973,\n", 3571 | " 16194, 16473, 16545, 16552, 16711, 16752, 16988, 16995, 17026,\n", 3572 | " 17170, 17181, 17262, 17281, 17339, 17378, 17468, 17556, 17616,\n", 3573 | " 17674, 17723, 17857, 17870, 17924, 17925, 17951, 18084, 18113,\n", 3574 | " 18161, 18269],\n", 3575 | " dtype='int64', name='Customer ID')" 3576 | ] 3577 | }, 3578 | "execution_count": 61, 3579 | "metadata": {}, 3580 | "output_type": "execute_result" 3581 | } 3582 | ], 3583 | "source": [ 3584 | "rfm[rfm[\"Segment\"] == \"New Customers\"].index" 3585 | ] 
3586 | }, 3587 | { 3588 | "cell_type": "code", 3589 | "execution_count": 62, 3590 | "metadata": {}, 3591 | "outputs": [], 3592 | "source": [ 3593 | "new_df = pd.DataFrame()\n", 3594 | "new_df[\"NewCustomerID\"] = rfm[rfm[\"Segment\"] == \"New Customers\"].index" 3595 | ] 3596 | }, 3597 | { 3598 | "cell_type": "code", 3599 | "execution_count": 63, 3600 | "metadata": {}, 3601 | "outputs": [ 3602 | { 3603 | "data": { 3604 | "text/html": [ 3605 | "
" 3650 | ], 3651 | "text/plain": [ 3652 | " NewCustomerID\n", 3653 | "0 12386\n", 3654 | "1 12427\n", 3655 | "2 12441\n", 3656 | "3 12538\n", 3657 | "4 12686" 3658 | ] 3659 | }, 3660 | "execution_count": 63, 3661 | "metadata": {}, 3662 | "output_type": "execute_result" 3663 | } 3664 | ], 3665 | "source": [ 3666 | "new_df.head()" 3667 | ] 3668 | }, 3669 | { 3670 | "cell_type": "code", 3671 | "execution_count": 64, 3672 | "metadata": {}, 3673 | "outputs": [], 3674 | "source": [ 3675 | "new_df.to_csv(\"new_customers.csv\")" 3676 | ] 3677 | }, 3678 | { 3679 | "cell_type": "markdown", 3680 | "execution_count": null, 3681 | "metadata": {}, 3682 | "source": [ 3683 | "\n", 3684 | "\n", 3685 | "# Conclusion\n", 3686 | "\n", 3687 | " After this notebook, my aim is to prepare a kernel on a data set that is not clean.\n", 3688 | "\n", 3689 | " If you have any suggestions, please write to me. I will be happy to receive comments and criticism!\n", 3690 | "\n", 3691 | " Thank you for your suggestions and votes ;)\n", 3692 | "\n" 3693 | ] 3694 | }, 3695 | { 3696 | "cell_type": "code", 3697 | "execution_count": null, 3698 | "metadata": {}, 3699 | "outputs": [], 3700 | "source": [] 3701 | } 3702 | ], 3703 | "metadata": { 3704 | "kernelspec": { 3705 | "display_name": "Python 3", 3706 | "language": "python", 3707 | "name": "python3" 3708 | }, 3709 | "language_info": { 3710 | "codemirror_mode": { 3711 | "name": "ipython", 3712 | "version": 3 3713 | }, 3714 | "file_extension": ".py", 3715 | "mimetype": "text/x-python", 3716 | "name": "python", 3717 | "nbconvert_exporter": "python", 3718 | "pygments_lexer": "ipython3", 3719 | "version": "3.7.6" 3720 | } 3721 | }, 3722 | "nbformat": 4, 3723 | "nbformat_minor": 4 3724 | } 3725 | --------------------------------------------------------------------------------