├── LICENSE ├── README.md ├── Section 2 ├── 00_market_basket_analysis.ipynb ├── 01_prepare_data.ipynb ├── 02_apriori_algorithm.ipynb ├── 03_association_rules.ipynb ├── 04_visualize_results.ipynb ├── groceries.csv └── grocery_transactions.csv ├── Section 3 ├── 01_curse_of_dimensionality.ipynb ├── 02_pca_key_ideas.ipynb ├── 03_the_math_behind_pca.ipynb ├── 04_wholesale_data.ipynb └── wholesale_customers_data.csv └── Section 4 ├── 01_concepts.ipynb ├── 02_kmeans_implementation.ipynb ├── 03_kmeans_evaluation.ipynb ├── 04_wholesale_data.ipynb └── data └── wholesale_customers_data.csv /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Packt 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | # Hands-On Unsupervised Learning with Python [Video] 5 | This is the code repository for [Hands-On Unsupervised Learning with Python [Video]](https://www.packtpub.com/application-development/hands-unsupervised-learning-python-video?utm_source=github&utm_medium=repository&utm_campaign=9781788992855), published by [Packt](https://www.packtpub.com/?utm_source=github). It contains all the supporting project files necessary to work through the video course from start to finish. 6 | ## About the Video Course 7 | Use Unsupervised Learning tools like Market Basket Analysis, Principal Component Analysis, and Clustering algorithms to discover and extract hidden yet valuable structure in your customer data. Start by building your own recommendation engine using association rules that result from market basket analysis. Extract informative signals from noisy data using principal component analysis. Capitalize on the ability of cluster algorithms to identify natural groupings of your data to optimize, for instance, the targeting of your marketing efforts. 8 | 9 | After watching this course and experimenting with the provided code, you will have acquired the requisite skills to apply key principles of Unsupervised Learning using Python. 10 | 11 |

What You Will Learn

12 |
13 |
23 | 24 | ## Instructions and Navigation 25 | ### Assumed Knowledge 26 | To fully benefit from the coverage included in this course, you will need:
27 | Prior Python programming experience is a requirement, and experience with data analysis and machine learning analysis will be helpful. 28 | ### Technical Requirements 29 | This course has the following software requirements:
30 | Minimum Hardware Requirements
For successful completion of this course, students will need a computer system with at least the following:
Recommended Hardware Requirements
For an optimal experience with hands-on labs and other practical activities, we recommend the following configuration:
Software Requirements
31 | ## Related Products 32 | * [Hands-On Machine Learning with Python and Scikit-Learn [Video]](https://www.packtpub.com/big-data-and-business-intelligence/hands-machine-learning-python-and-scikit-learn-video?utm_source=github&utm_medium=repository&utm_campaign=9781788991056) 33 | 34 | * [Hands-on Machine Learning with TensorFlow [Video]](https://www.packtpub.com/big-data-and-business-intelligence/hands-machine-learning-tensorflow-video?utm_source=github&utm_medium=repository&utm_campaign=9781789136999) 35 | 36 | * [Hands-On Test Driven Development with Python [Video]](https://www.packtpub.com/application-development/hands-test-driven-development-python-video?utm_source=github&utm_medium=repository&utm_campaign=9781789138313) 37 | 38 | ### Download a free PDF 39 | 40 | If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.
41 |

https://packt.link/free-ebook/9781789348279
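
As a quick orientation for Section 2, the association-rule metrics used in the notebooks (support, confidence, lift) can be sketched on a handful of transactions. This is a minimal illustration, not the course code: the toy `transactions` list and the helper names `support` and `rule_metrics` are assumptions made up for this example.

```python
# Toy transactions: each set holds the items bought together (made-up data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence and lift for the rule {antecedent} => {consequent}."""
    s_a = support(antecedent, transactions)
    s_c = support(consequent, transactions)
    s_ac = support(set(antecedent) | set(consequent), transactions)
    return {
        "support": s_ac,                # joint transactions / all transactions
        "confidence": s_ac / s_a,       # joint support / antecedent support
        "lift": s_ac / (s_a * s_c),     # joint support / product of supports
    }


metrics = rule_metrics({"bread"}, {"milk"}, transactions)
print(metrics)
```

A lift below 1, as in this toy data, suggests the two items appear together slightly less often than independence would predict.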

-------------------------------------------------------------------------------- /Section 2/00_market_basket_analysis.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 3, 6 | "metadata": { 7 | "hide_input": false, 8 | "scrolled": true, 9 | "slideshow": { 10 | "slide_type": "skip" 11 | } 12 | }, 13 | "outputs": [], 14 | "source": [ 15 | "from jupyterthemes import jtplot\n", 16 | "jtplot.style(theme='onedork', context='talk', fscale=1.4, spines=False, gridlines='--', ticks=True, grid=False, figsize=(6, 4.5))\n", 17 | "from os.path import join\n", 18 | "import pandas as pd\n", 19 | "import numpy as np\n", 20 | "import seaborn as sns\n", 21 | "current_palette = sns.color_palette()\n", 22 | "%matplotlib inline\n", 23 | "import matplotlib.pyplot as plt\n", 24 | "from matplotlib_venn import venn2\n", 25 | "from matplotlib import rcParams\n", 26 | "from matplotlib.ticker import FuncFormatter\n", 27 | "from scipy.stats import fisher_exact\n", 28 | "from ipywidgets import interact, IntSlider, FloatSlider" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": { 34 | "hide_input": true, 35 | "slideshow": { 36 | "slide_type": "slide" 37 | } 38 | }, 39 | "source": [ 40 | "### Simulation of Association Rule Metrics" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 4, 46 | "metadata": { 47 | "hide_input": true, 48 | "scrolled": false, 49 | "slideshow": { 50 | "slide_type": "fragment" 51 | } 52 | }, 53 | "outputs": [ 54 | { 55 | "data": { 56 | "application/vnd.jupyter.widget-view+json": { 57 | "model_id": "39028cfe907840698f088856e4edfd7b", 58 | "version_major": 2, 59 | "version_minor": 0 60 | }, 61 | "text/html": [ 62 | "

Failed to display Jupyter Widget of type interactive.

\n", 63 | "

\n", 64 | " If you're reading this message in the Jupyter Notebook or JupyterLab Notebook, it may mean\n", 65 | " that the widgets JavaScript is still loading. If this message persists, it\n", 66 | " likely means that the widgets JavaScript library is either not installed or\n", 67 | " not enabled. See the Jupyter\n", 68 | " Widgets Documentation for setup instructions.\n", 69 | "

\n", 70 | "

\n", 71 | " If you're reading this message in another frontend (for example, a static\n", 72 | " rendering on GitHub or NBViewer),\n", 73 | " it may mean that your frontend doesn't currently support widgets.\n", 74 | "

\n" 75 | ], 76 | "text/plain": [ 77 | "interactive(children=(IntSlider(value=100, description='antecedent', max=1000, min=5, step=5), IntSlider(value=100, description='consequent', max=1000, min=5, step=5), FloatSlider(value=0.5, description='joint_percent', max=1.0, min=0.01), IntSlider(value=500, description='total', max=1000, min=10, step=10), Output()), _dom_classes=('widget-interact',))" 78 | ] 79 | }, 80 | "metadata": {}, 81 | "output_type": "display_data" 82 | } 83 | ], 84 | "source": [ 85 | "total_widget = IntSlider(min=10, max=1000, step=10, value=500)\n", 86 | "antecedent_widget = IntSlider(min=5, max=1000, step=5, value=100)\n", 87 | "consequent_widget = IntSlider(min=5, max=1000, step=5, value=100)\n", 88 | "joint_widget = FloatSlider(min=.01, max=1.0, value=.5)\n", 89 | "\n", 90 | "def plot_metrics(antecedent, consequent, joint_percent, total):\n", 91 | " \"\"\"Interactive Venn Diagram of joint transactions and plot of support, confidence, and lift \n", 92 | " Slider Inputs:\n", 93 | " - total: total transactions for all itemsets\n", 94 | " - antecedent, consequent: all transactions involving the respective itemset\n", 95 | " - joint_percent: percentage of (smaller of) antecedent/consequent involving both itemsets\n", 96 | "\n", 97 | " Venn Diagram Calculations: \n", 98 | " - joint = joint_percent * min(antecedent, consequent)\n", 99 | " - antecedent, consequent: original values - joint transactions\n", 100 | "\n", 101 | " Metric Calculations:\n", 102 | " - Support Antecedent: antecedent/total\n", 103 | " - Support Consequent: consequent/total\n", 104 | " - Support Joint Transactions: joint/total\n", 105 | " - Rule Confidence: Support Joint Transactions / Support Antecedent\n", 106 | " - Rule Lift: Support Joint Transactions / (Support Antecedent * Support Consequent)\n", 107 | " \"\"\"\n", 108 | "\n", 109 | " fig = plt.figure(figsize=(15, 8))\n", 110 | " ax1 = plt.subplot2grid((2, 2), (0, 0)) \n", 111 | " ax2 = plt.subplot2grid((2, 2), (0, 1))\n", 112 | " ax3 = 
plt.subplot2grid((2, 2), (1, 0))\n", 113 | " ax4 = plt.subplot2grid((2, 2), (1, 1))\n", 114 | " \n", 115 | " \n", 116 | " joint = int(joint_percent * min(antecedent, consequent))\n", 117 | " \n", 118 | " contingency_table = [[joint, antecedent - joint], [consequent - joint, max(total - antecedent - consequent + joint, 0)]]\n", 119 | " contingency_df = pd.DataFrame(contingency_table, columns=['Consequent', 'Not Consequent'], index=['Antecedent', 'Not Antecedent']).astype(int)\n", 120 | " sns.heatmap(contingency_df, ax=ax1, annot=True, cmap='Blues', square=True, vmin=0, vmax=total, fmt='.0f')\n", 121 | " ax1.set_title('Contingency Table')\n", 122 | " \n", 123 | " v = venn2(subsets=(antecedent - joint, consequent - joint, joint),\n", 124 | " set_labels=['Antecedent', 'Consequent'],\n", 125 | " set_colors=current_palette[:2],\n", 126 | " ax=ax2)\n", 127 | " ax2.set_title(\"{} Transactions\".format(total))\n", 128 | "\n", 129 | " support_antecedent = antecedent / total\n", 130 | " support_consequent = consequent / total\n", 131 | "\n", 132 | " support = pd.Series({'Antecedent': support_antecedent,\n", 133 | " 'Consequent': support_consequent})\n", 134 | " support.plot(kind='bar', ax=ax3,\n", 135 | " color=current_palette[:2], title='Support', ylim=(0, 1), rot=0)\n", 136 | " ax3.yaxis.set_major_formatter(\n", 137 | " FuncFormatter(lambda y, _: '{:.0%}'.format(y)))\n", 138 | "\n", 139 | " support_joint = joint / total\n", 140 | " confidence = support_joint / support_antecedent\n", 141 | " lift = support_joint / (support_antecedent * support_consequent)\n", 142 | "\n", 143 | " _, pvalue = fisher_exact(contingency_table, alternative='greater')\n", 144 | "\n", 145 | " metrics = pd.Series(\n", 146 | " {'Confidence': confidence, 'Lift': lift, 'p-Value': pvalue})\n", 147 | " metrics.plot(kind='bar', ax=ax4,\n", 148 | " color=current_palette[2:5], rot=0, ylim=(0, 2))\n", 149 | " ax3.yaxis.set_major_formatter(\n", 150 | " FuncFormatter(lambda y, _: '{:.0%}'.format(y)))\n", 151 | 
"\n", 152 | " for ax, series in {ax3: support, ax4: metrics}.items():\n", 153 | " rects = ax.patches\n", 154 | " labels = ['{:.0%}'.format(x) for x in series.tolist()]\n", 155 | " for rect, label in zip(rects, labels):\n", 156 | " height = min(rect.get_height() + .01, 2.05)\n", 157 | " ax.text(rect.get_x() + rect.get_width() / 2,\n", 158 | " height, label, ha='center', va='bottom')\n", 159 | "\n", 160 | " plt.suptitle('Association Rule Analysis {Antecedent => Consequent}')\n", 161 | " plt.tight_layout()\n", 162 | " plt.subplots_adjust(top=0.9)\n", 163 | " plt.show()\n", 164 | "\n", 165 | "interact(plot_metrics,\n", 166 | " antecedent=antecedent_widget,\n", 167 | " consequent=consequent_widget,\n", 168 | " joint_percent=joint_widget,\n", 169 | " total=total_widget);" 170 | ] 171 | } 172 | ], 173 | "metadata": { 174 | "celltoolbar": "Slideshow", 175 | "hide_input": true, 176 | "kernelspec": { 177 | "display_name": "Python 3", 178 | "language": "python", 179 | "name": "python3" 180 | }, 181 | "language_info": { 182 | "codemirror_mode": { 183 | "name": "ipython", 184 | "version": 3 185 | }, 186 | "file_extension": ".py", 187 | "mimetype": "text/x-python", 188 | "name": "python", 189 | "nbconvert_exporter": "python", 190 | "pygments_lexer": "ipython3", 191 | "version": "3.6.3" 192 | } 193 | }, 194 | "nbformat": 4, 195 | "nbformat_minor": 2 196 | } 197 | -------------------------------------------------------------------------------- /Section 2/02_apriori_algorithm.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "slideshow": { 7 | "slide_type": "skip" 8 | } 9 | }, 10 | "source": [ 11 | "### Imports" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 1, 17 | "metadata": { 18 | "slideshow": { 19 | "slide_type": "skip" 20 | } 21 | }, 22 | "outputs": [], 23 | "source": [ 24 | "import pandas as pd\n", 25 | "import numpy as np\n", 26 | "from 
itertools import combinations\n", 27 | "from time import time\n", 28 | "from jupyterthemes import jtplot\n", 29 | "jtplot.style(theme='onedork', context='talk', fscale=1.8, spines=False, gridlines='--', ticks=True, grid=False, figsize=(12, 8))" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": { 35 | "slideshow": { 36 | "slide_type": "slide" 37 | } 38 | }, 39 | "source": [ 40 | "### Load the data & build the product-transaction matrix" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 2, 46 | "metadata": { 47 | "hide_input": false, 48 | "slideshow": { 49 | "slide_type": "fragment" 50 | } 51 | }, 52 | "outputs": [], 53 | "source": [ 54 | "def get_transaction_data():\n", 55 | " \"\"\"Load groceries transaction data into DataFrame\"\"\"\n", 56 | " df = pd.read_csv('grocery_transactions.csv')\n", 57 | " df = df.stack().reset_index(-1, drop=True)\n", 58 | " df.index.names = ['tx_id']\n", 59 | " df = pd.get_dummies(df, prefix='', prefix_sep='')\n", 60 | " return df.groupby(level='tx_id').sum()" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": { 66 | "slideshow": { 67 | "slide_type": "slide" 68 | } 69 | }, 70 | "source": [ 71 | "### Create itemset candidates" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 11, 77 | "metadata": { 78 | "slideshow": { 79 | "slide_type": "fragment" 80 | } 81 | }, 82 | "outputs": [ 83 | { 84 | "name": "stdout", 85 | "output_type": "stream", 86 | "text": [ 87 | "0 Instant food products\n", 88 | "1 UHT-milk\n", 89 | "2 abrasive cleaner\n", 90 | "3 artif. 
sweetener\n", 91 | "4 baby cosmetics\n", 92 | "dtype: object\n", 93 | "(9834, 169)\n" 94 | ] 95 | } 96 | ], 97 | "source": [ 98 | "data = get_transaction_data()\n", 99 | "\n", 100 | "item_id = pd.Series(dict(enumerate(data.columns)))\n", 101 | "print(item_id.head())\n", 102 | "transactions = data.values\n", 103 | "print(transactions.shape)\n", 104 | "\n", 105 | "min_support = 0.01\n", 106 | "item_length = 1\n", 107 | "candidates = list(zip(item_id.index))\n", 108 | "candidates_tested = 0\n", 109 | "itemsets = pd.DataFrame(columns=['support', 'length'])" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 4, 115 | "metadata": { 116 | "slideshow": { 117 | "slide_type": "fragment" 118 | } 119 | }, 120 | "outputs": [ 121 | { 122 | "data": { 123 | "text/plain": [ 124 | "[(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)]" 125 | ] 126 | }, 127 | "execution_count": 4, 128 | "metadata": {}, 129 | "output_type": "execute_result" 130 | } 131 | ], 132 | "source": [ 133 | "candidates[:10]" 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": { 139 | "slideshow": { 140 | "slide_type": "slide" 141 | } 142 | }, 143 | "source": [ 144 | "### Candidate Generation" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 5, 150 | "metadata": { 151 | "slideshow": { 152 | "slide_type": "fragment" 153 | } 154 | }, 155 | "outputs": [ 156 | { 157 | "name": "stdout", 158 | "output_type": "stream", 159 | "text": [ 160 | "Length 1: 169 [(0,), (1,), (2,), (3,), (4,)]\n", 161 | "Length 2: 14,196 [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]\n", 162 | "Length 3: 790,244 [(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 1, 5), (0, 1, 6)]\n", 163 | "Length 4: 32,795,126 [(0, 1, 2, 3), (0, 1, 2, 4), (0, 1, 2, 5), (0, 1, 2, 6), (0, 1, 2, 7)]\n" 164 | ] 165 | } 166 | ], 167 | "source": [ 168 | "for i in range(1, 5):\n", 169 | " remaining_items = np.unique([item for t in candidates for item in t])\n", 170 | " new_candidates = 
list(combinations(remaining_items, r=i))\n", 171 | " print('Length {}: {:>10,.0f}'.format(i, len(new_candidates)), \n", 172 | " new_candidates[:5])" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": { 178 | "slideshow": { 179 | "slide_type": "slide" 180 | } 181 | }, 182 | "source": [ 183 | "### The apriori pruning based on support" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": 6, 189 | "metadata": { 190 | "slideshow": { 191 | "slide_type": "fragment" 192 | } 193 | }, 194 | "outputs": [], 195 | "source": [ 196 | "def prune_candidates(all_txn, candidates, candidate_size, min_support):\n", 197 | " \"\"\"Return DataFrame with itemsets of candidate_size with min_support\n", 198 | " all_txn: numpy array of transaction-product matrix\n", 199 | " candidates: list of tuples containing product id\n", 200 | " candidate_size: length of item set\n", 201 | " min_support: support threshold\n", 202 | " \"\"\"\n", 203 | " itemsets = {}\n", 204 | " for candidate in candidates:\n", 205 | " candidate_txn = all_txn[:, candidate].reshape(-1, candidate_size) \n", 206 | " relevant_txn = candidate_txn[(candidate_txn == 1).all(axis=1)]\n", 207 | " candidate_support = relevant_txn.shape[0] / all_txn.shape[0]\n", 208 | " if candidate_support >= min_support:\n", 209 | " itemsets[frozenset(candidate)] = candidate_support\n", 210 | " result = pd.Series(itemsets).to_frame('support')\n", 211 | " return result.assign(length=candidate_size) " 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": { 217 | "slideshow": { 218 | "slide_type": "slide" 219 | } 220 | }, 221 | "source": [ 222 | "### Running the apriori algorithm" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 7, 228 | "metadata": { 229 | "slideshow": { 230 | "slide_type": "fragment" 231 | } 232 | }, 233 | "outputs": [ 234 | { 235 | "name": "stdout", 236 | "output_type": "stream", 237 | "text": [ 238 | "Itemset Length 1\tCandidates: 
169\tNew Items: 88\n", 239 | "Itemset Length 2\tCandidates: 3,828\tNew Items: 213\n", 240 | "Itemset Length 3\tCandidates: 16,215\tNew Items: 32\n", 241 | "Itemset Length 4\tCandidates: 3,060\tNew Items: 0\n", 242 | "\n", 243 | "Potential Itemsets: 748,288,838,313,422,294,120,286,634,350,736,906,063,837,462,003,712 \n", 244 | "Tested Itemsets: 23,272\n" 245 | ] 246 | } 247 | ], 248 | "source": [ 249 | "while candidates:\n", 250 | " new_items = prune_candidates(\n", 251 | " transactions, candidates, item_length, min_support)\n", 252 | " itemsets = pd.concat([itemsets, new_items])\n", 253 | " candidates_tested += len(candidates)\n", 254 | " print('Itemset Length {}\\tCandidates: {:>7,.0f}\\tNew Items: {:>7,.0f}'\n", 255 | " .format(item_length, len(candidates), len(new_items)))\n", 256 | " item_length += 1\n", 257 | " remaining_items = np.unique([item for t in new_items.index for item in t])\n", 258 | " candidates = list(combinations(remaining_items, r=item_length))\n", 259 | "print('\\nPotential Itemsets: {:,.0f} \\nTested Itemsets: {:,.0f}'.format(\n", 260 | " 2**len(item_id) - 1, candidates_tested))" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 8, 266 | "metadata": { 267 | "slideshow": { 268 | "slide_type": "slide" 269 | } 270 | }, 271 | "outputs": [ 272 | { 273 | "name": "stdout", 274 | "output_type": "stream", 275 | "text": [ 276 | "\n", 277 | "Index: 333 entries, (1) to (162, 166, 167)\n", 278 | "Data columns (total 2 columns):\n", 279 | "support 333 non-null float64\n", 280 | "length 333 non-null object\n", 281 | "dtypes: float64(1), object(1)\n", 282 | "memory usage: 7.8+ KB\n" 283 | ] 284 | }, 285 | { 286 | "data": { 287 | "text/html": [ 288 | "
\n", 289 | "\n", 302 | "\n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | "
       support   length
(166)  0.255542  1
(103)  0.193512  1
(123)  0.183954  1
(139)  0.174395  1
(167)  0.139516  1
\n", 338 | "
" 339 | ], 340 | "text/plain": [ 341 | " support length\n", 342 | "(166) 0.255542 1\n", 343 | "(103) 0.193512 1\n", 344 | "(123) 0.183954 1\n", 345 | "(139) 0.174395 1\n", 346 | "(167) 0.139516 1" 347 | ] 348 | }, 349 | "execution_count": 8, 350 | "metadata": {}, 351 | "output_type": "execute_result" 352 | } 353 | ], 354 | "source": [ 355 | "itemsets.info()\n", 356 | "itemsets.sort_values('support', ascending=False).head()" 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": 9, 362 | "metadata": { 363 | "slideshow": { 364 | "slide_type": "slide" 365 | } 366 | }, 367 | "outputs": [ 368 | { 369 | "data": { 370 | "text/plain": [ 371 | "1 88\n", 372 | "2 213\n", 373 | "3 32\n", 374 | "Name: length, dtype: int64" 375 | ] 376 | }, 377 | "execution_count": 9, 378 | "metadata": {}, 379 | "output_type": "execute_result" 380 | } 381 | ], 382 | "source": [ 383 | "itemsets.length.value_counts().sort_index()" 384 | ] 385 | }, 386 | { 387 | "cell_type": "code", 388 | "execution_count": 10, 389 | "metadata": { 390 | "slideshow": { 391 | "slide_type": "fragment" 392 | } 393 | }, 394 | "outputs": [ 395 | { 396 | "data": { 397 | "text/plain": [ 398 | "count 333.000000\n", 399 | "mean 0.025071\n", 400 | "std 0.027325\n", 401 | "min 0.010067\n", 402 | "25% 0.011897\n", 403 | "50% 0.016270\n", 404 | "75% 0.026032\n", 405 | "max 0.255542\n", 406 | "Name: support, dtype: float64" 407 | ] 408 | }, 409 | "execution_count": 10, 410 | "metadata": {}, 411 | "output_type": "execute_result" 412 | } 413 | ], 414 | "source": [ 415 | "itemsets.support.describe()" 416 | ] 417 | } 418 | ], 419 | "metadata": { 420 | "celltoolbar": "Slideshow", 421 | "hide_input": false, 422 | "kernelspec": { 423 | "display_name": "Python 3", 424 | "language": "python", 425 | "name": "python3" 426 | }, 427 | "language_info": { 428 | "codemirror_mode": { 429 | "name": "ipython", 430 | "version": 3 431 | }, 432 | "file_extension": ".py", 433 | "mimetype": "text/x-python", 434 | "name": "python", 
435 | "nbconvert_exporter": "python", 436 | "pygments_lexer": "ipython3", 437 | "version": "3.6.3" 438 | } 439 | }, 440 | "nbformat": 4, 441 | "nbformat_minor": 2 442 | } 443 | -------------------------------------------------------------------------------- /Section 2/03_association_rules.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "slideshow": { 7 | "slide_type": "skip" 8 | } 9 | }, 10 | "source": [ 11 | "### Imports" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 1, 17 | "metadata": { 18 | "slideshow": { 19 | "slide_type": "skip" 20 | } 21 | }, 22 | "outputs": [], 23 | "source": [ 24 | "import pandas as pd\n", 25 | "import numpy as np\n", 26 | "from itertools import combinations\n", 27 | "from time import time\n", 28 | "from scipy.stats import fisher_exact\n", 29 | "from jupyterthemes import jtplot\n", 30 | "jtplot.style(theme='onedork', context='talk', fscale=1.8, spines=False, gridlines='--', ticks=True, grid=False, figsize=(12, 8))\n", 31 | "import warnings\n", 32 | "warnings.filterwarnings('ignore')" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": { 38 | "slideshow": { 39 | "slide_type": "skip" 40 | } 41 | }, 42 | "source": [ 43 | "### Load the data & build the product-transaction matrix" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 2, 49 | "metadata": { 50 | "hide_input": false, 51 | "slideshow": { 52 | "slide_type": "skip" 53 | } 54 | }, 55 | "outputs": [], 56 | "source": [ 57 | "def get_transaction_data():\n", 58 | " \"\"\"Load groceries transaction data into DataFrame\"\"\"\n", 59 | " df = pd.read_csv('grocery_transactions.csv')\n", 60 | " df = df.stack().reset_index(-1, drop=True)\n", 61 | " df.index.names = ['tx_id']\n", 62 | " return pd.get_dummies(df, prefix='', prefix_sep='').groupby(level='tx_id').sum()" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | 
"execution_count": 3, 68 | "metadata": { 69 | "run_control": { 70 | "marked": true 71 | }, 72 | "slideshow": { 73 | "slide_type": "fragment" 74 | } 75 | }, 76 | "outputs": [], 77 | "source": [ 78 | "data = get_transaction_data()\n", 79 | "\n", 80 | "item_id = pd.Series(dict(enumerate(data.columns)))\n", 81 | "transactions = data.values\n", 82 | "n_txn = transactions.shape[0]\n", 83 | "min_support = 0.01\n", 84 | "\n", 85 | "item_length = 1\n", 86 | "candidates = list(zip(item_id.index))\n", 87 | "itemsets = pd.DataFrame(columns=['support', 'length'])\n", 88 | "\n", 89 | "new_rules = []\n", 90 | "rule_data = ['itemset', 'antecedent', 'consequent',\n", 91 | " 'support_rule', 'support_antecedent', 'support_consequent',\n", 92 | " 'confidence', 'lift', 'pvalue']\n", 93 | "rules = pd.DataFrame(columns=rule_data)" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 4, 99 | "metadata": { 100 | "slideshow": { 101 | "slide_type": "skip" 102 | } 103 | }, 104 | "outputs": [], 105 | "source": [ 106 | "def prune_candidates(all_txn, candidates, candidate_size, min_support):\n", 107 | " \"\"\"Return DataFrame with itemsets of candidate_size with min_support\n", 108 | " all_txn: numpy array of transaction-product matrix\n", 109 | " candidates: list of tuples containing product id\n", 110 | " candidate_size: length of item set\n", 111 | " min_support: support threshold\n", 112 | " \"\"\"\n", 113 | " itemsets = {}\n", 114 | " for candidate in candidates:\n", 115 | " candidate_txn = all_txn[:, candidate].reshape(-1, candidate_size)\n", 116 | " relevant_txn = candidate_txn[(candidate_txn == 1).all(axis=1)]\n", 117 | " support = relevant_txn.shape[0] / all_txn.shape[0]\n", 118 | " if support >= min_support:\n", 119 | " itemsets[frozenset(candidate)] = support\n", 120 | " return pd.Series(itemsets).to_frame('support').assign(length=candidate_size)" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 5, 126 | "metadata": { 127 | "slideshow": { 
128 | "slide_type": "slide" 129 | } 130 | }, 131 | "outputs": [], 132 | "source": [ 133 | "def find_association_rules(itemsets, n_txn, n_items, min_confidence=0, min_lift=0, min_pvalue=0):\n", 134 | " \"\"\"Find rules {antecedent} => {consequent} of n_items items with min_confidence, min_lift and min_pvalue\n", 135 | " itemsets: DataFrame containing all itemsets and their support; n_txn: number of transactions\n", 136 | " min_confidence, min_lift, min_pvalue: confidence & lift & pvalue thresholds\n", 137 | " \"\"\"\n", 138 | " support = itemsets.loc[:, 'support'].to_dict()\n", 139 | " new_rules = []\n", 140 | " for itemset in itemsets.loc[itemsets.length == n_items].index:\n", 141 | " for n_antecedents in range(1, n_items):\n", 142 | " antecedents = [frozenset(a)\n", 143 | " for a in combinations(itemset, r=n_antecedents)]\n", 144 | " for antecedent in antecedents:\n", 145 | " consequent = itemset.difference(antecedent)\n", 146 | " sAC = support[itemset]\n", 147 | " sA, sC = support[antecedent], support[consequent]\n", 148 | " confidence = sAC / sA\n", 149 | " lift = sAC / (sA * sC)\n", 150 | " contingency_table = np.rint(n_txn * np.array([[sAC, sA - sAC],\n", 151 | " [sC - sAC, 1 - sA - sC + sAC]])).astype(int)\n", 152 | " _, p_value = fisher_exact(contingency_table,\n", 153 | " alternative='greater')\n", 154 | "\n", 155 | " if (confidence >= min_confidence) and (lift >= min_lift) and (p_value >= min_pvalue):\n", 156 | " new_rule = [itemset, antecedent, consequent,\n", 157 | " support[itemset], support[antecedent], support[consequent],\n", 158 | " confidence, lift, p_value]\n", 159 | " new_rules.append(new_rule)\n", 160 | " return new_rules" 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": 6, 166 | "metadata": { 167 | "slideshow": { 168 | "slide_type": "slide" 169 | } 170 | }, 171 | "outputs": [ 172 | { 173 | "name": "stdout", 174 | "output_type": "stream", 175 | "text": [ 176 | "Itemset Length 1\tCandidates: 169\tNew Items: 88\tNew Rules: 0\n", 
177 | "Itemset Length 2\tCandidates: 3,828\tNew Items: 
213\tNew Rules: 426\n", 178 | "Itemset Length 3\tCandidates: 16,215\tNew Items: 32\tNew Rules: 192\n", 179 | "Itemset Length 4\tCandidates: 3,060\tNew Items: 0\tNew Rules: 0\n" 180 | ] 181 | } 182 | ], 183 | "source": [ 184 | "while candidates:\n", 185 | " new_items = prune_candidates(transactions, candidates, item_length, min_support)\n", 186 | " itemsets = pd.concat([itemsets, new_items])\n", 187 | "\n", 188 | " if item_length > 1:\n", 189 | " new_rules = find_association_rules(itemsets, n_txn, item_length)\n", 190 | " rules = pd.concat([rules, pd.DataFrame(new_rules, columns=rules.columns)], ignore_index=True)\n", 191 | " \n", 192 | " print('Itemset Length {}\\tCandidates: {:>7,.0f}\\tNew Items: {:>7,.0f}\\tNew Rules: {:>7,.0f}'.format(\n", 193 | " item_length, len(candidates), len(new_items), len(new_rules)))\n", 194 | " \n", 195 | " item_length += 1\n", 196 | " remaining_items = np.unique([item for t in new_items.index for item in t])\n", 197 | " candidates = list(combinations(remaining_items, r=item_length))\n", 198 | "\n", 199 | "rules = rules.apply(pd.to_numeric, errors='ignore')" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": 7, 205 | "metadata": { 206 | "slideshow": { 207 | "slide_type": "slide" 208 | } 209 | }, 210 | "outputs": [ 211 | { 212 | "name": "stdout", 213 | "output_type": "stream", 214 | "text": [ 215 | "\n", 216 | "RangeIndex: 618 entries, 0 to 617\n", 217 | "Data columns (total 9 columns):\n", 218 | "itemset 618 non-null object\n", 219 | "antecedent 618 non-null object\n", 220 | "consequent 618 non-null object\n", 221 | "support_rule 618 non-null float64\n", 222 | "support_antecedent 618 non-null float64\n", 223 | "support_consequent 618 non-null float64\n", 224 | "confidence 618 non-null float64\n", 225 | "lift 618 non-null float64\n", 226 | "pvalue 618 non-null float64\n", 227 | "dtypes: float64(6), object(3)\n", 228 | "memory usage: 43.5+ KB\n" 229 | ] 230 | }, 231 | { 232 | "data": { 233 | "text/html": [ 234 | "
\n", 235 | "\n", 248 | "\n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | "
   itemset   antecedent  consequent  support_rule  support_antecedent  support_consequent  confidence  lift      pvalue
0  (9, 103)  (9)         (103)       0.019727      0.052471            0.193512            0.375969    1.942869  2.229585e-23
1  (9, 103)  (103)       (9)         0.019727      0.193512            0.052471            0.101944    1.942869  2.229585e-23
2  (9, 123)  (9)         (123)       0.013626      0.052471            0.183954            0.259690    1.411714  7.615064e-06
3  (9, 123)  (123)       (9)         0.013626      0.183954            0.052471            0.074074    1.411714  7.655712e-06
4  (9, 124)  (9)         (124)       0.017389      0.052471            0.109010            0.331395    3.040058  7.774255e-45
\n", 326 | "
" 327 | ], 328 | "text/plain": [ 329 | " itemset antecedent consequent support_rule support_antecedent \\\n", 330 | "0 (9, 103) (9) (103) 0.019727 0.052471 \n", 331 | "1 (9, 103) (103) (9) 0.019727 0.193512 \n", 332 | "2 (9, 123) (9) (123) 0.013626 0.052471 \n", 333 | "3 (9, 123) (123) (9) 0.013626 0.183954 \n", 334 | "4 (9, 124) (9) (124) 0.017389 0.052471 \n", 335 | "\n", 336 | " support_consequent confidence lift pvalue \n", 337 | "0 0.193512 0.375969 1.942869 2.229585e-23 \n", 338 | "1 0.052471 0.101944 1.942869 2.229585e-23 \n", 339 | "2 0.183954 0.259690 1.411714 7.615064e-06 \n", 340 | "3 0.052471 0.074074 1.411714 7.655712e-06 \n", 341 | "4 0.109010 0.331395 3.040058 7.774255e-45 " 342 | ] 343 | }, 344 | "execution_count": 7, 345 | "metadata": {}, 346 | "output_type": "execute_result" 347 | } 348 | ], 349 | "source": [ 350 | "rules.info()\n", 351 | "rules.head()" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": 8, 357 | "metadata": { 358 | "slideshow": { 359 | "slide_type": "slide" 360 | } 361 | }, 362 | "outputs": [], 363 | "source": [ 364 | "with pd.HDFStore('rules.h5') as store:\n", 365 | " store.put('rules', rules)" 366 | ] 367 | } 368 | ], 369 | "metadata": { 370 | "celltoolbar": "Slideshow", 371 | "hide_input": false, 372 | "kernelspec": { 373 | "display_name": "Python 3", 374 | "language": "python", 375 | "name": "python3" 376 | }, 377 | "language_info": { 378 | "codemirror_mode": { 379 | "name": "ipython", 380 | "version": 3 381 | }, 382 | "file_extension": ".py", 383 | "mimetype": "text/x-python", 384 | "name": "python", 385 | "nbconvert_exporter": "python", 386 | "pygments_lexer": "ipython3", 387 | "version": "3.6.3" 388 | } 389 | }, 390 | "nbformat": 4, 391 | "nbformat_minor": 2 392 | } 393 | -------------------------------------------------------------------------------- /Section 3/01_curse_of_dimensionality.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 
| { 4 | "cell_type": "code", 5 | "execution_count": 3, 6 | "metadata": { 7 | "ExecuteTime": { 8 | "end_time": "2018-01-07T00:02:27.245559Z", 9 | "start_time": "2018-01-07T00:02:27.152493Z" 10 | }, 11 | "slideshow": { 12 | "slide_type": "skip" 13 | } 14 | }, 15 | "outputs": [], 16 | "source": [ 17 | "import pandas as pd\n", 18 | "import numpy as np\n", 19 | "from numpy import clip, full, fill_diagonal\n", 20 | "from numpy.linalg import inv, norm, lstsq\n", 21 | "from numpy.random import uniform, multivariate_normal, rand, randn, seed\n", 22 | "from itertools import repeat\n", 23 | "import matplotlib.pyplot as plt\n", 24 | "from matplotlib.patches import Patch\n", 25 | "from mpl_toolkits.mplot3d import Axes3D\n", 26 | "from matplotlib import gridspec\n", 27 | "from matplotlib.colors import to_rgba\n", 28 | "import seaborn as sns\n", 29 | "from scipy.spatial.distance import pdist, squareform\n", 30 | "from jupyterthemes import jtplot\n", 31 | "from sklearn.decomposition import PCA\n", 32 | "from sklearn.preprocessing import StandardScaler\n", 33 | "from sklearn.datasets import make_swiss_roll\n", 34 | "jtplot.style(theme='onedork', context='talk', fscale=1.8, spines=False, \n", 35 | " gridlines='--', ticks=True, grid=False, figsize=(7, 5))\n", 36 | "%matplotlib notebook\n", 37 | "pd.options.display.float_format = '{:,.2f}'.format\n", 38 | "seed(42)" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": { 44 | "slideshow": { 45 | "slide_type": "slide" 46 | } 47 | }, 48 | "source": [ 49 | "### Simulate pairwise distances of points in $\\mathbb{R}^n$ (while $n$ increases)" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 4, 55 | "metadata": { 56 | "ExecuteTime": { 57 | "end_time": "2018-01-06T22:04:30.214799Z", 58 | "start_time": "2018-01-06T22:04:30.022585Z" 59 | }, 60 | "hide_input": true, 61 | "slideshow": { 62 | "slide_type": "skip" 63 | } 64 | }, 65 | "outputs": [], 66 | "source": [ 67 | "def get_distance_metrics(points):\n", 68 
| "    \"\"\"Calculate mean of pairwise distances and \n", 69 | "    mean of min pairwise distances\"\"\"\n", 70 | "    pairwise_dist = squareform(pdist(points))\n", 71 | "    fill_diagonal(pairwise_dist, np.nan)  # ignore self-distances in mean and min\n", 72 | "    avg_distance = np.mean(np.nanmean(pairwise_dist, axis=1))\n", 73 | "    avg_min_distance = np.mean(np.nanmin(pairwise_dist, axis=1))\n", 74 | "    return avg_distance, avg_min_distance\n", 75 | "\n", 76 | "\n", 77 | "def simulate_distances(m, n, mean, var, corr):\n", 78 | "    \"\"\"Draw m random vectors of dimension n \n", 79 | "    from uniform and multivariate normal distributions\n", 80 | "    and return pairwise distance metrics\"\"\"\n", 81 | "    uni_dist = get_distance_metrics(uniform(size=(m, n)))\n", 82 | "\n", 83 | "    cov = full(shape=(n, n), fill_value=var * corr)\n", 84 | "    fill_diagonal(cov, var)\n", 85 | "    normal_points = multivariate_normal(\n", 86 | "        full(shape=(n,), fill_value=mean), cov, m)\n", 87 | "    normal_points = clip(normal_points, a_min=0, a_max=1)\n", 88 | "    norm_dist = get_distance_metrics(normal_points)\n", 89 | "    return uni_dist, norm_dist" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": 6, 95 | "metadata": { 96 | "ExecuteTime": { 97 | "end_time": "2018-01-06T22:07:09.967153Z", 98 | "start_time": "2018-01-06T22:04:33.731110Z" 99 | }, 100 | "slideshow": { 101 | "slide_type": "slide" 102 | } 103 | }, 104 | "outputs": [], 105 | "source": [ 106 | "# sampling params\n", 107 | "n_points = 1000\n", 108 | "min_dim, max_dim, step = 1, 2502, 100\n", 109 | "dimensions = range(min_dim, max_dim, step)\n", 110 | "\n", 111 | "# normal distribution params\n", 112 | "mean = 0.5 \n", 113 | "var = (mean/3)**2 # std = mean/3, so ~99.7% of each coordinate falls in [0, 1]\n", 114 | "corr = 0.25\n", 115 | "\n", 116 | "# run simulation\n", 117 | "avg_dist = []\n", 118 | "for dim in dimensions:\n", 119 | "    uni_dist, norm_dist = simulate_distances(\n", 120 | "        n_points, dim, mean, var, corr)\n", 121 | "    avg_dist.append([*uni_dist, *norm_dist])\n", 122 | "    \n", 123 |
"col_names = ['Avg. Uniform', 'Min. Uniform',\n", 124 | " 'Avg. Normal', 'Min. Normal']\n", 125 | "distances = pd.DataFrame(data=avg_dist, \n", 126 | " columns=col_names, index=dimensions)" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": 5, 132 | "metadata": { 133 | "ExecuteTime": { 134 | "end_time": "2018-01-06T21:51:09.292659Z", 135 | "start_time": "2018-01-06T21:51:09.261636Z" 136 | }, 137 | "slideshow": { 138 | "slide_type": "slide" 139 | } 140 | }, 141 | "outputs": [], 142 | "source": [ 143 | "def simulate_distances(m, n, mean, var, corr):\n", 144 | " \"\"\"Draw m random vectors of dimension n \n", 145 | " from uniform and normal distributions\n", 146 | " and return pairwise distance metrics\"\"\"\n", 147 | " uni_dist = get_distance_metrics(uniform(size=(m, n)))\n", 148 | " cov = full(shape=(n, n), fill_value=var * corr)\n", 149 | " fill_diagonal(cov, var)\n", 150 | " normal_points = multivariate_normal(\n", 151 | " full(shape=(n,), fill_value=mean), cov, m)\n", 152 | " normal_points = clip(normal_points, a_min=0, a_max=1)\n", 153 | " norm_dist = get_distance_metrics(normal_points)\n", 154 | " return uni_dist, norm_dist" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": null, 160 | "metadata": { 161 | "slideshow": { 162 | "slide_type": "slide" 163 | } 164 | }, 165 | "outputs": [], 166 | "source": [ 167 | "def get_distance_metrics(points):\n", 168 | " \"\"\"Calculate mean of pairwise distances and \n", 169 | " mean of min pairwise distances\"\"\"\n", 170 | " pairwise_dist = squareform(pdist(points))\n", 171 | " fill_diagonal(pairwise_dist, np.nanmean(pairwise_dist, axis=1))\n", 172 | " avg_distance = np.mean(np.nanmean(pairwise_dist, axis=1))\n", 173 | " fill_diagonal(pairwise_dist, np.nanmax(pairwise_dist, axis=1))\n", 174 | " avg_min_distance = np.mean(np.nanmin(pairwise_dist, axis=1))\n", 175 | " return avg_distance, avg_min_distance" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | 
"execution_count": 3, 181 | "metadata": { 182 | "hide_input": true, 183 | "slideshow": { 184 | "slide_type": "skip" 185 | } 186 | }, 187 | "outputs": [], 188 | "source": [ 189 | "def get_distance_metrics(points):\n", 190 | " \"\"\"Calculate mean of pairwise distances and \n", 191 | " mean of min pairwise distances\"\"\"\n", 192 | " pairwise_dist = squareform(pdist(points))\n", 193 | " fill_diagonal(pairwise_dist, np.nanmax(pairwise_dist))\n", 194 | " avg_distance = np.mean(np.nanmean(pairwise_dist, axis=1))\n", 195 | " avg_min_distance = np.mean(np.nanmin(pairwise_dist, axis=1))\n", 196 | " return avg_distance, avg_min_distance\n", 197 | "\n", 198 | "def simulate_distances(m, n, mean, var, corr):\n", 199 | " \"\"\"Draw m random vectors of dimension n \n", 200 | " from uniform and multivariate normal distributions\n", 201 | " and return pairwise distance metrics\"\"\"\n", 202 | " uni_dist = get_distance_metrics(uniform(size=(m, n)))\n", 203 | " cov = full(shape=(n, n), fill_value=var * corr)\n", 204 | " fill_diagonal(cov, var)\n", 205 | " normal_points = multivariate_normal(\n", 206 | " full(shape=(n,), fill_value=mean), cov, m)\n", 207 | " normal_points = clip(normal_points, a_min=0, a_max=1)\n", 208 | " norm_dist = get_distance_metrics(normal_points)\n", 209 | " return uni_dist, norm_dist" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": 6, 215 | "metadata": { 216 | "ExecuteTime": { 217 | "end_time": "2018-01-06T22:07:09.967153Z", 218 | "start_time": "2018-01-06T22:04:33.731110Z" 219 | }, 220 | "slideshow": { 221 | "slide_type": "skip" 222 | } 223 | }, 224 | "outputs": [], 225 | "source": [ 226 | "# sampling params\n", 227 | "n_points = 1000\n", 228 | "min_dim, max_dim, step = 1, 2502, 100\n", 229 | "dimensions = range(min_dim, max_dim, step)\n", 230 | "\n", 231 | "# normal distribution params\n", 232 | "mean = 0.5 \n", 233 | "var = (mean/3)**2 # 99% of sample in [0, 1]\n", 234 | "corr = 0.25\n", 235 | "\n", 236 | "# run simulation\n", 
237 | "avg_dist = []\n", 238 | "for dim in dimensions:\n", 239 | " uni_dist, norm_dist = simulate_distances(\n", 240 | " n_points, dim, mean, var, corr)\n", 241 | " avg_dist.append([*uni_dist, *norm_dist])\n", 242 | " \n", 243 | "col_names = ['Avg. Uniform', 'Min. Uniform',\n", 244 | " 'Avg. Normal', 'Min. Normal']\n", 245 | "distances = pd.DataFrame(data=avg_dist, \n", 246 | " columns=col_names, index=dimensions)" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": 4, 252 | "metadata": { 253 | "ExecuteTime": { 254 | "end_time": "2018-01-06T22:07:52.384435Z", 255 | "start_time": "2018-01-06T22:07:51.971141Z" 256 | }, 257 | "hide_input": true, 258 | "slideshow": { 259 | "slide_type": "slide" 260 | } 261 | }, 262 | "outputs": [ 263 | { 264 | "data": { 265 | "image/png": "\n", 266 | "text/plain": [ 267 | "" 268 | ] 269 | }, 270 | "metadata": {}, 271 | "output_type": "display_data" 272 | } 273 | ], 274 | "source": [ 275 | "title = 'Distance of {:,.0f} Data Points in a Unit Hypercube'.format(n_points)\n", 276 | "fig, axes = plt.subplots(2, 1)\n", 277 | "distances[[ 'Avg. Uniform', 'Avg. Normal']].plot.bar(figsize=(14, 8), title='Average ' + title, ax=axes[0])\n", 278 | "distances[[ 'Min. Uniform', 'Min. 
Normal']].plot.bar(figsize=(14, 8), title='Minimum ' + title, ax=axes[1])\n", 279 | "\n", 280 | "for ax in axes:\n", 281 | " ax.grid(axis='y', lw=1, ls='--')\n", 282 | " for p in ax.patches:\n", 283 | " ax.annotate('{:.1f}'.format(p.get_height()), (p.get_x() + .01, p.get_height() + .25))\n", 284 | "plt.tight_layout();" 285 | ] 286 | } 287 | ], 288 | "metadata": { 289 | "celltoolbar": "Slideshow", 290 | "hide_input": false, 291 | "kernelspec": { 292 | "display_name": "Python 3", 293 | "language": "python", 294 | "name": "python3" 295 | }, 296 | "language_info": { 297 | "codemirror_mode": { 298 | "name": "ipython", 299 | "version": 3 300 | }, 301 | "file_extension": ".py", 302 | "mimetype": "text/x-python", 303 | "name": "python", 304 | "nbconvert_exporter": "python", 305 | "pygments_lexer": "ipython3", 306 | "version": "3.6.3" 307 | } 308 | }, 309 | "nbformat": 4, 310 | "nbformat_minor": 1 311 | } 312 | -------------------------------------------------------------------------------- /Section 3/wholesale_customers_data.csv: -------------------------------------------------------------------------------- 1 | Channel,Region,Fresh,Milk,Grocery,Frozen,Detergents_Paper,Delicatessen 2 | 2,3,12669,9656,7561,214,2674,1338 3 | 2,3,7057,9810,9568,1762,3293,1776 4 | 2,3,6353,8808,7684,2405,3516,7844 5 | 1,3,13265,1196,4221,6404,507,1788 6 | 2,3,22615,5410,7198,3915,1777,5185 7 | 2,3,9413,8259,5126,666,1795,1451 8 | 2,3,12126,3199,6975,480,3140,545 9 | 2,3,7579,4956,9426,1669,3321,2566 10 | 1,3,5963,3648,6192,425,1716,750 11 | 2,3,6006,11093,18881,1159,7425,2098 12 | 2,3,3366,5403,12974,4400,5977,1744 13 | 2,3,13146,1124,4523,1420,549,497 14 | 2,3,31714,12319,11757,287,3881,2931 15 | 2,3,21217,6208,14982,3095,6707,602 16 | 2,3,24653,9465,12091,294,5058,2168 17 | 1,3,10253,1114,3821,397,964,412 18 | 2,3,1020,8816,12121,134,4508,1080 19 | 1,3,5876,6157,2933,839,370,4478 20 | 2,3,18601,6327,10099,2205,2767,3181 21 | 1,3,7780,2495,9464,669,2518,501 22 | 
2,3,17546,4519,4602,1066,2259,2124 23 | 1,3,5567,871,2010,3383,375,569 24 | 1,3,31276,1917,4469,9408,2381,4334 25 | 2,3,26373,36423,22019,5154,4337,16523 26 | 2,3,22647,9776,13792,2915,4482,5778 27 | 2,3,16165,4230,7595,201,4003,57 28 | 1,3,9898,961,2861,3151,242,833 29 | 1,3,14276,803,3045,485,100,518 30 | 2,3,4113,20484,25957,1158,8604,5206 31 | 1,3,43088,2100,2609,1200,1107,823 32 | 1,3,18815,3610,11107,1148,2134,2963 33 | 1,3,2612,4339,3133,2088,820,985 34 | 1,3,21632,1318,2886,266,918,405 35 | 1,3,29729,4786,7326,6130,361,1083 36 | 1,3,1502,1979,2262,425,483,395 37 | 2,3,688,5491,11091,833,4239,436 38 | 1,3,29955,4362,5428,1729,862,4626 39 | 2,3,15168,10556,12477,1920,6506,714 40 | 2,3,4591,15729,16709,33,6956,433 41 | 1,3,56159,555,902,10002,212,2916 42 | 1,3,24025,4332,4757,9510,1145,5864 43 | 1,3,19176,3065,5956,2033,2575,2802 44 | 2,3,10850,7555,14961,188,6899,46 45 | 2,3,630,11095,23998,787,9529,72 46 | 2,3,9670,7027,10471,541,4618,65 47 | 2,3,5181,22044,21531,1740,7353,4985 48 | 2,3,3103,14069,21955,1668,6792,1452 49 | 2,3,44466,54259,55571,7782,24171,6465 50 | 2,3,11519,6152,10868,584,5121,1476 51 | 2,3,4967,21412,28921,1798,13583,1163 52 | 1,3,6269,1095,1980,3860,609,2162 53 | 1,3,3347,4051,6996,239,1538,301 54 | 2,3,40721,3916,5876,532,2587,1278 55 | 2,3,491,10473,11532,744,5611,224 56 | 1,3,27329,1449,1947,2436,204,1333 57 | 1,3,5264,3683,5005,1057,2024,1130 58 | 2,3,4098,29892,26866,2616,17740,1340 59 | 2,3,5417,9933,10487,38,7572,1282 60 | 1,3,13779,1970,1648,596,227,436 61 | 1,3,6137,5360,8040,129,3084,1603 62 | 2,3,8590,3045,7854,96,4095,225 63 | 2,3,35942,38369,59598,3254,26701,2017 64 | 2,3,7823,6245,6544,4154,4074,964 65 | 2,3,9396,11601,15775,2896,7677,1295 66 | 1,3,4760,1227,3250,3724,1247,1145 67 | 2,3,85,20959,45828,36,24231,1423 68 | 1,3,9,1534,7417,175,3468,27 69 | 2,3,19913,6759,13462,1256,5141,834 70 | 1,3,2446,7260,3993,5870,788,3095 71 | 1,3,8352,2820,1293,779,656,144 72 | 1,3,16705,2037,3202,10643,116,1365 73 | 
1,3,18291,1266,21042,5373,4173,14472 74 | 1,3,4420,5139,2661,8872,1321,181 75 | 2,3,19899,5332,8713,8132,764,648 76 | 2,3,8190,6343,9794,1285,1901,1780 77 | 1,3,20398,1137,3,4407,3,975 78 | 1,3,717,3587,6532,7530,529,894 79 | 2,3,12205,12697,28540,869,12034,1009 80 | 1,3,10766,1175,2067,2096,301,167 81 | 1,3,1640,3259,3655,868,1202,1653 82 | 1,3,7005,829,3009,430,610,529 83 | 2,3,219,9540,14403,283,7818,156 84 | 2,3,10362,9232,11009,737,3537,2342 85 | 1,3,20874,1563,1783,2320,550,772 86 | 2,3,11867,3327,4814,1178,3837,120 87 | 2,3,16117,46197,92780,1026,40827,2944 88 | 2,3,22925,73498,32114,987,20070,903 89 | 1,3,43265,5025,8117,6312,1579,14351 90 | 1,3,7864,542,4042,9735,165,46 91 | 1,3,24904,3836,5330,3443,454,3178 92 | 1,3,11405,596,1638,3347,69,360 93 | 1,3,12754,2762,2530,8693,627,1117 94 | 2,3,9198,27472,32034,3232,18906,5130 95 | 1,3,11314,3090,2062,35009,71,2698 96 | 2,3,5626,12220,11323,206,5038,244 97 | 1,3,3,2920,6252,440,223,709 98 | 2,3,23,2616,8118,145,3874,217 99 | 1,3,403,254,610,774,54,63 100 | 1,3,503,112,778,895,56,132 101 | 1,3,9658,2182,1909,5639,215,323 102 | 2,3,11594,7779,12144,3252,8035,3029 103 | 2,3,1420,10810,16267,1593,6766,1838 104 | 2,3,2932,6459,7677,2561,4573,1386 105 | 1,3,56082,3504,8906,18028,1480,2498 106 | 1,3,14100,2132,3445,1336,1491,548 107 | 1,3,15587,1014,3970,910,139,1378 108 | 2,3,1454,6337,10704,133,6830,1831 109 | 2,3,8797,10646,14886,2471,8969,1438 110 | 2,3,1531,8397,6981,247,2505,1236 111 | 2,3,1406,16729,28986,673,836,3 112 | 1,3,11818,1648,1694,2276,169,1647 113 | 2,3,12579,11114,17569,805,6457,1519 114 | 1,3,19046,2770,2469,8853,483,2708 115 | 1,3,14438,2295,1733,3220,585,1561 116 | 1,3,18044,1080,2000,2555,118,1266 117 | 1,3,11134,793,2988,2715,276,610 118 | 1,3,11173,2521,3355,1517,310,222 119 | 1,3,6990,3880,5380,1647,319,1160 120 | 1,3,20049,1891,2362,5343,411,933 121 | 1,3,8258,2344,2147,3896,266,635 122 | 1,3,17160,1200,3412,2417,174,1136 123 | 1,3,4020,3234,1498,2395,264,255 124 | 
1,3,12212,201,245,1991,25,860 125 | 2,3,11170,10769,8814,2194,1976,143 126 | 1,3,36050,1642,2961,4787,500,1621 127 | 1,3,76237,3473,7102,16538,778,918 128 | 1,3,19219,1840,1658,8195,349,483 129 | 2,3,21465,7243,10685,880,2386,2749 130 | 1,3,140,8847,3823,142,1062,3 131 | 1,3,42312,926,1510,1718,410,1819 132 | 1,3,7149,2428,699,6316,395,911 133 | 1,3,2101,589,314,346,70,310 134 | 1,3,14903,2032,2479,576,955,328 135 | 1,3,9434,1042,1235,436,256,396 136 | 1,3,7388,1882,2174,720,47,537 137 | 1,3,6300,1289,2591,1170,199,326 138 | 1,3,4625,8579,7030,4575,2447,1542 139 | 1,3,3087,8080,8282,661,721,36 140 | 1,3,13537,4257,5034,155,249,3271 141 | 1,3,5387,4979,3343,825,637,929 142 | 1,3,17623,4280,7305,2279,960,2616 143 | 1,3,30379,13252,5189,321,51,1450 144 | 1,3,37036,7152,8253,2995,20,3 145 | 1,3,10405,1596,1096,8425,399,318 146 | 1,3,18827,3677,1988,118,516,201 147 | 2,3,22039,8384,34792,42,12591,4430 148 | 1,3,7769,1936,2177,926,73,520 149 | 1,3,9203,3373,2707,1286,1082,526 150 | 1,3,5924,584,542,4052,283,434 151 | 1,3,31812,1433,1651,800,113,1440 152 | 1,3,16225,1825,1765,853,170,1067 153 | 1,3,1289,3328,2022,531,255,1774 154 | 1,3,18840,1371,3135,3001,352,184 155 | 1,3,3463,9250,2368,779,302,1627 156 | 1,3,622,55,137,75,7,8 157 | 2,3,1989,10690,19460,233,11577,2153 158 | 2,3,3830,5291,14855,317,6694,3182 159 | 1,3,17773,1366,2474,3378,811,418 160 | 2,3,2861,6570,9618,930,4004,1682 161 | 2,3,355,7704,14682,398,8077,303 162 | 2,3,1725,3651,12822,824,4424,2157 163 | 1,3,12434,540,283,1092,3,2233 164 | 1,3,15177,2024,3810,2665,232,610 165 | 2,3,5531,15726,26870,2367,13726,446 166 | 2,3,5224,7603,8584,2540,3674,238 167 | 2,3,15615,12653,19858,4425,7108,2379 168 | 2,3,4822,6721,9170,993,4973,3637 169 | 1,3,2926,3195,3268,405,1680,693 170 | 1,3,5809,735,803,1393,79,429 171 | 1,3,5414,717,2155,2399,69,750 172 | 2,3,260,8675,13430,1116,7015,323 173 | 2,3,200,25862,19816,651,8773,6250 174 | 1,3,955,5479,6536,333,2840,707 175 | 2,3,514,7677,19805,937,9836,716 176 | 
1,3,286,1208,5241,2515,153,1442 177 | 2,3,2343,7845,11874,52,4196,1697 178 | 1,3,45640,6958,6536,7368,1532,230 179 | 1,3,12759,7330,4533,1752,20,2631 180 | 1,3,11002,7075,4945,1152,120,395 181 | 1,3,3157,4888,2500,4477,273,2165 182 | 1,3,12356,6036,8887,402,1382,2794 183 | 1,3,112151,29627,18148,16745,4948,8550 184 | 1,3,694,8533,10518,443,6907,156 185 | 1,3,36847,43950,20170,36534,239,47943 186 | 1,3,327,918,4710,74,334,11 187 | 1,3,8170,6448,1139,2181,58,247 188 | 1,3,3009,521,854,3470,949,727 189 | 1,3,2438,8002,9819,6269,3459,3 190 | 2,3,8040,7639,11687,2758,6839,404 191 | 2,3,834,11577,11522,275,4027,1856 192 | 1,3,16936,6250,1981,7332,118,64 193 | 1,3,13624,295,1381,890,43,84 194 | 1,3,5509,1461,2251,547,187,409 195 | 2,3,180,3485,20292,959,5618,666 196 | 1,3,7107,1012,2974,806,355,1142 197 | 1,3,17023,5139,5230,7888,330,1755 198 | 1,1,30624,7209,4897,18711,763,2876 199 | 2,1,2427,7097,10391,1127,4314,1468 200 | 1,1,11686,2154,6824,3527,592,697 201 | 1,1,9670,2280,2112,520,402,347 202 | 2,1,3067,13240,23127,3941,9959,731 203 | 2,1,4484,14399,24708,3549,14235,1681 204 | 1,1,25203,11487,9490,5065,284,6854 205 | 1,1,583,685,2216,469,954,18 206 | 1,1,1956,891,5226,1383,5,1328 207 | 2,1,1107,11711,23596,955,9265,710 208 | 1,1,6373,780,950,878,288,285 209 | 2,1,2541,4737,6089,2946,5316,120 210 | 1,1,1537,3748,5838,1859,3381,806 211 | 2,1,5550,12729,16767,864,12420,797 212 | 1,1,18567,1895,1393,1801,244,2100 213 | 2,1,12119,28326,39694,4736,19410,2870 214 | 1,1,7291,1012,2062,1291,240,1775 215 | 1,1,3317,6602,6861,1329,3961,1215 216 | 2,1,2362,6551,11364,913,5957,791 217 | 1,1,2806,10765,15538,1374,5828,2388 218 | 2,1,2532,16599,36486,179,13308,674 219 | 1,1,18044,1475,2046,2532,130,1158 220 | 2,1,18,7504,15205,1285,4797,6372 221 | 1,1,4155,367,1390,2306,86,130 222 | 1,1,14755,899,1382,1765,56,749 223 | 1,1,5396,7503,10646,91,4167,239 224 | 1,1,5041,1115,2856,7496,256,375 225 | 2,1,2790,2527,5265,5612,788,1360 226 | 1,1,7274,659,1499,784,70,659 227 | 
1,1,12680,3243,4157,660,761,786 228 | 2,1,20782,5921,9212,1759,2568,1553 229 | 1,1,4042,2204,1563,2286,263,689 230 | 1,1,1869,577,572,950,4762,203 231 | 1,1,8656,2746,2501,6845,694,980 232 | 2,1,11072,5989,5615,8321,955,2137 233 | 1,1,2344,10678,3828,1439,1566,490 234 | 1,1,25962,1780,3838,638,284,834 235 | 1,1,964,4984,3316,937,409,7 236 | 1,1,15603,2703,3833,4260,325,2563 237 | 1,1,1838,6380,2824,1218,1216,295 238 | 1,1,8635,820,3047,2312,415,225 239 | 1,1,18692,3838,593,4634,28,1215 240 | 1,1,7363,475,585,1112,72,216 241 | 1,1,47493,2567,3779,5243,828,2253 242 | 1,1,22096,3575,7041,11422,343,2564 243 | 1,1,24929,1801,2475,2216,412,1047 244 | 1,1,18226,659,2914,3752,586,578 245 | 1,1,11210,3576,5119,561,1682,2398 246 | 1,1,6202,7775,10817,1183,3143,1970 247 | 2,1,3062,6154,13916,230,8933,2784 248 | 1,1,8885,2428,1777,1777,430,610 249 | 1,1,13569,346,489,2077,44,659 250 | 1,1,15671,5279,2406,559,562,572 251 | 1,1,8040,3795,2070,6340,918,291 252 | 1,1,3191,1993,1799,1730,234,710 253 | 2,1,6134,23133,33586,6746,18594,5121 254 | 1,1,6623,1860,4740,7683,205,1693 255 | 1,1,29526,7961,16966,432,363,1391 256 | 1,1,10379,17972,4748,4686,1547,3265 257 | 1,1,31614,489,1495,3242,111,615 258 | 1,1,11092,5008,5249,453,392,373 259 | 1,1,8475,1931,1883,5004,3593,987 260 | 1,1,56083,4563,2124,6422,730,3321 261 | 1,1,53205,4959,7336,3012,967,818 262 | 1,1,9193,4885,2157,327,780,548 263 | 1,1,7858,1110,1094,6818,49,287 264 | 1,1,23257,1372,1677,982,429,655 265 | 1,1,2153,1115,6684,4324,2894,411 266 | 2,1,1073,9679,15445,61,5980,1265 267 | 1,1,5909,23527,13699,10155,830,3636 268 | 2,1,572,9763,22182,2221,4882,2563 269 | 1,1,20893,1222,2576,3975,737,3628 270 | 2,1,11908,8053,19847,1069,6374,698 271 | 1,1,15218,258,1138,2516,333,204 272 | 1,1,4720,1032,975,5500,197,56 273 | 1,1,2083,5007,1563,1120,147,1550 274 | 1,1,514,8323,6869,529,93,1040 275 | 1,3,36817,3045,1493,4802,210,1824 276 | 1,3,894,1703,1841,744,759,1153 277 | 1,3,680,1610,223,862,96,379 278 | 
1,3,27901,3749,6964,4479,603,2503 279 | 1,3,9061,829,683,16919,621,139 280 | 1,3,11693,2317,2543,5845,274,1409 281 | 2,3,17360,6200,9694,1293,3620,1721 282 | 1,3,3366,2884,2431,977,167,1104 283 | 2,3,12238,7108,6235,1093,2328,2079 284 | 1,3,49063,3965,4252,5970,1041,1404 285 | 1,3,25767,3613,2013,10303,314,1384 286 | 1,3,68951,4411,12609,8692,751,2406 287 | 1,3,40254,640,3600,1042,436,18 288 | 1,3,7149,2247,1242,1619,1226,128 289 | 1,3,15354,2102,2828,8366,386,1027 290 | 1,3,16260,594,1296,848,445,258 291 | 1,3,42786,286,471,1388,32,22 292 | 1,3,2708,2160,2642,502,965,1522 293 | 1,3,6022,3354,3261,2507,212,686 294 | 1,3,2838,3086,4329,3838,825,1060 295 | 2,2,3996,11103,12469,902,5952,741 296 | 1,2,21273,2013,6550,909,811,1854 297 | 2,2,7588,1897,5234,417,2208,254 298 | 1,2,19087,1304,3643,3045,710,898 299 | 2,2,8090,3199,6986,1455,3712,531 300 | 2,2,6758,4560,9965,934,4538,1037 301 | 1,2,444,879,2060,264,290,259 302 | 2,2,16448,6243,6360,824,2662,2005 303 | 2,2,5283,13316,20399,1809,8752,172 304 | 2,2,2886,5302,9785,364,6236,555 305 | 2,2,2599,3688,13829,492,10069,59 306 | 2,2,161,7460,24773,617,11783,2410 307 | 2,2,243,12939,8852,799,3909,211 308 | 2,2,6468,12867,21570,1840,7558,1543 309 | 1,2,17327,2374,2842,1149,351,925 310 | 1,2,6987,1020,3007,416,257,656 311 | 2,2,918,20655,13567,1465,6846,806 312 | 1,2,7034,1492,2405,12569,299,1117 313 | 1,2,29635,2335,8280,3046,371,117 314 | 2,2,2137,3737,19172,1274,17120,142 315 | 1,2,9784,925,2405,4447,183,297 316 | 1,2,10617,1795,7647,1483,857,1233 317 | 2,2,1479,14982,11924,662,3891,3508 318 | 1,2,7127,1375,2201,2679,83,1059 319 | 1,2,1182,3088,6114,978,821,1637 320 | 1,2,11800,2713,3558,2121,706,51 321 | 2,2,9759,25071,17645,1128,12408,1625 322 | 1,2,1774,3696,2280,514,275,834 323 | 1,2,9155,1897,5167,2714,228,1113 324 | 1,2,15881,713,3315,3703,1470,229 325 | 1,2,13360,944,11593,915,1679,573 326 | 1,2,25977,3587,2464,2369,140,1092 327 | 1,2,32717,16784,13626,60869,1272,5609 328 | 1,2,4414,1610,1431,3498,387,834 329 | 
1,2,542,899,1664,414,88,522 330 | 1,2,16933,2209,3389,7849,210,1534 331 | 1,2,5113,1486,4583,5127,492,739 332 | 1,2,9790,1786,5109,3570,182,1043 333 | 2,2,11223,14881,26839,1234,9606,1102 334 | 1,2,22321,3216,1447,2208,178,2602 335 | 2,2,8565,4980,67298,131,38102,1215 336 | 2,2,16823,928,2743,11559,332,3486 337 | 2,2,27082,6817,10790,1365,4111,2139 338 | 1,2,13970,1511,1330,650,146,778 339 | 1,2,9351,1347,2611,8170,442,868 340 | 1,2,3,333,7021,15601,15,550 341 | 1,2,2617,1188,5332,9584,573,1942 342 | 2,3,381,4025,9670,388,7271,1371 343 | 2,3,2320,5763,11238,767,5162,2158 344 | 1,3,255,5758,5923,349,4595,1328 345 | 2,3,1689,6964,26316,1456,15469,37 346 | 1,3,3043,1172,1763,2234,217,379 347 | 1,3,1198,2602,8335,402,3843,303 348 | 2,3,2771,6939,15541,2693,6600,1115 349 | 2,3,27380,7184,12311,2809,4621,1022 350 | 1,3,3428,2380,2028,1341,1184,665 351 | 2,3,5981,14641,20521,2005,12218,445 352 | 1,3,3521,1099,1997,1796,173,995 353 | 2,3,1210,10044,22294,1741,12638,3137 354 | 1,3,608,1106,1533,830,90,195 355 | 2,3,117,6264,21203,228,8682,1111 356 | 1,3,14039,7393,2548,6386,1333,2341 357 | 1,3,190,727,2012,245,184,127 358 | 1,3,22686,134,218,3157,9,548 359 | 2,3,37,1275,22272,137,6747,110 360 | 1,3,759,18664,1660,6114,536,4100 361 | 1,3,796,5878,2109,340,232,776 362 | 1,3,19746,2872,2006,2601,468,503 363 | 1,3,4734,607,864,1206,159,405 364 | 1,3,2121,1601,2453,560,179,712 365 | 1,3,4627,997,4438,191,1335,314 366 | 1,3,2615,873,1524,1103,514,468 367 | 2,3,4692,6128,8025,1619,4515,3105 368 | 1,3,9561,2217,1664,1173,222,447 369 | 1,3,3477,894,534,1457,252,342 370 | 1,3,22335,1196,2406,2046,101,558 371 | 1,3,6211,337,683,1089,41,296 372 | 2,3,39679,3944,4955,1364,523,2235 373 | 1,3,20105,1887,1939,8164,716,790 374 | 1,3,3884,3801,1641,876,397,4829 375 | 2,3,15076,6257,7398,1504,1916,3113 376 | 1,3,6338,2256,1668,1492,311,686 377 | 1,3,5841,1450,1162,597,476,70 378 | 2,3,3136,8630,13586,5641,4666,1426 379 | 1,3,38793,3154,2648,1034,96,1242 380 | 1,3,3225,3294,1902,282,68,1114 
381 | 2,3,4048,5164,10391,130,813,179 382 | 1,3,28257,944,2146,3881,600,270 383 | 1,3,17770,4591,1617,9927,246,532 384 | 1,3,34454,7435,8469,2540,1711,2893 385 | 1,3,1821,1364,3450,4006,397,361 386 | 1,3,10683,21858,15400,3635,282,5120 387 | 1,3,11635,922,1614,2583,192,1068 388 | 1,3,1206,3620,2857,1945,353,967 389 | 1,3,20918,1916,1573,1960,231,961 390 | 1,3,9785,848,1172,1677,200,406 391 | 1,3,9385,1530,1422,3019,227,684 392 | 1,3,3352,1181,1328,5502,311,1000 393 | 1,3,2647,2761,2313,907,95,1827 394 | 1,3,518,4180,3600,659,122,654 395 | 1,3,23632,6730,3842,8620,385,819 396 | 1,3,12377,865,3204,1398,149,452 397 | 1,3,9602,1316,1263,2921,841,290 398 | 2,3,4515,11991,9345,2644,3378,2213 399 | 1,3,11535,1666,1428,6838,64,743 400 | 1,3,11442,1032,582,5390,74,247 401 | 1,3,9612,577,935,1601,469,375 402 | 1,3,4446,906,1238,3576,153,1014 403 | 1,3,27167,2801,2128,13223,92,1902 404 | 1,3,26539,4753,5091,220,10,340 405 | 1,3,25606,11006,4604,127,632,288 406 | 1,3,18073,4613,3444,4324,914,715 407 | 1,3,6884,1046,1167,2069,593,378 408 | 1,3,25066,5010,5026,9806,1092,960 409 | 2,3,7362,12844,18683,2854,7883,553 410 | 2,3,8257,3880,6407,1646,2730,344 411 | 1,3,8708,3634,6100,2349,2123,5137 412 | 1,3,6633,2096,4563,1389,1860,1892 413 | 1,3,2126,3289,3281,1535,235,4365 414 | 1,3,97,3605,12400,98,2970,62 415 | 1,3,4983,4859,6633,17866,912,2435 416 | 1,3,5969,1990,3417,5679,1135,290 417 | 2,3,7842,6046,8552,1691,3540,1874 418 | 2,3,4389,10940,10908,848,6728,993 419 | 1,3,5065,5499,11055,364,3485,1063 420 | 2,3,660,8494,18622,133,6740,776 421 | 1,3,8861,3783,2223,633,1580,1521 422 | 1,3,4456,5266,13227,25,6818,1393 423 | 2,3,17063,4847,9053,1031,3415,1784 424 | 1,3,26400,1377,4172,830,948,1218 425 | 2,3,17565,3686,4657,1059,1803,668 426 | 2,3,16980,2884,12232,874,3213,249 427 | 1,3,11243,2408,2593,15348,108,1886 428 | 1,3,13134,9347,14316,3141,5079,1894 429 | 1,3,31012,16687,5429,15082,439,1163 430 | 1,3,3047,5970,4910,2198,850,317 431 | 1,3,8607,1750,3580,47,84,2501 432 | 
1,3,3097,4230,16483,575,241,2080 433 | 1,3,8533,5506,5160,13486,1377,1498 434 | 1,3,21117,1162,4754,269,1328,395 435 | 1,3,1982,3218,1493,1541,356,1449 436 | 1,3,16731,3922,7994,688,2371,838 437 | 1,3,29703,12051,16027,13135,182,2204 438 | 1,3,39228,1431,764,4510,93,2346 439 | 2,3,14531,15488,30243,437,14841,1867 440 | 1,3,10290,1981,2232,1038,168,2125 441 | 1,3,2787,1698,2510,65,477,52 442 | -------------------------------------------------------------------------------- /Section 4/data/wholesale_customers_data.csv: -------------------------------------------------------------------------------- 1 | "Channel","Region","Fresh","Milk","Grocery","Frozen","Detergents_Paper","Delicatessen" 2 | 2,3,12669,9656,7561,214,2674,1338 3 | 2,3,7057,9810,9568,1762,3293,1776 4 | 2,3,6353,8808,7684,2405,3516,7844 5 | 1,3,13265,1196,4221,6404,507,1788 6 | 2,3,22615,5410,7198,3915,1777,5185 7 | 2,3,9413,8259,5126,666,1795,1451 8 | 2,3,12126,3199,6975,480,3140,545 9 | 2,3,7579,4956,9426,1669,3321,2566 10 | 1,3,5963,3648,6192,425,1716,750 11 | 2,3,6006,11093,18881,1159,7425,2098 12 | 2,3,3366,5403,12974,4400,5977,1744 13 | 2,3,13146,1124,4523,1420,549,497 14 | 2,3,31714,12319,11757,287,3881,2931 15 | 2,3,21217,6208,14982,3095,6707,602 16 | 2,3,24653,9465,12091,294,5058,2168 17 | 1,3,10253,1114,3821,397,964,412 18 | 2,3,1020,8816,12121,134,4508,1080 19 | 1,3,5876,6157,2933,839,370,4478 20 | 2,3,18601,6327,10099,2205,2767,3181 21 | 1,3,7780,2495,9464,669,2518,501 22 | 2,3,17546,4519,4602,1066,2259,2124 23 | 1,3,5567,871,2010,3383,375,569 24 | 1,3,31276,1917,4469,9408,2381,4334 25 | 2,3,26373,36423,22019,5154,4337,16523 26 | 2,3,22647,9776,13792,2915,4482,5778 27 | 2,3,16165,4230,7595,201,4003,57 28 | 1,3,9898,961,2861,3151,242,833 29 | 1,3,14276,803,3045,485,100,518 30 | 2,3,4113,20484,25957,1158,8604,5206 31 | 1,3,43088,2100,2609,1200,1107,823 32 | 1,3,18815,3610,11107,1148,2134,2963 33 | 1,3,2612,4339,3133,2088,820,985 34 | 1,3,21632,1318,2886,266,918,405 35 | 
1,3,29729,4786,7326,6130,361,1083 36 | 1,3,1502,1979,2262,425,483,395 37 | 2,3,688,5491,11091,833,4239,436 38 | 1,3,29955,4362,5428,1729,862,4626 39 | 2,3,15168,10556,12477,1920,6506,714 40 | 2,3,4591,15729,16709,33,6956,433 41 | 1,3,56159,555,902,10002,212,2916 42 | 1,3,24025,4332,4757,9510,1145,5864 43 | 1,3,19176,3065,5956,2033,2575,2802 44 | 2,3,10850,7555,14961,188,6899,46 45 | 2,3,630,11095,23998,787,9529,72 46 | 2,3,9670,7027,10471,541,4618,65 47 | 2,3,5181,22044,21531,1740,7353,4985 48 | 2,3,3103,14069,21955,1668,6792,1452 49 | 2,3,44466,54259,55571,7782,24171,6465 50 | 2,3,11519,6152,10868,584,5121,1476 51 | 2,3,4967,21412,28921,1798,13583,1163 52 | 1,3,6269,1095,1980,3860,609,2162 53 | 1,3,3347,4051,6996,239,1538,301 54 | 2,3,40721,3916,5876,532,2587,1278 55 | 2,3,491,10473,11532,744,5611,224 56 | 1,3,27329,1449,1947,2436,204,1333 57 | 1,3,5264,3683,5005,1057,2024,1130 58 | 2,3,4098,29892,26866,2616,17740,1340 59 | 2,3,5417,9933,10487,38,7572,1282 60 | 1,3,13779,1970,1648,596,227,436 61 | 1,3,6137,5360,8040,129,3084,1603 62 | 2,3,8590,3045,7854,96,4095,225 63 | 2,3,35942,38369,59598,3254,26701,2017 64 | 2,3,7823,6245,6544,4154,4074,964 65 | 2,3,9396,11601,15775,2896,7677,1295 66 | 1,3,4760,1227,3250,3724,1247,1145 67 | 2,3,85,20959,45828,36,24231,1423 68 | 1,3,9,1534,7417,175,3468,27 69 | 2,3,19913,6759,13462,1256,5141,834 70 | 1,3,2446,7260,3993,5870,788,3095 71 | 1,3,8352,2820,1293,779,656,144 72 | 1,3,16705,2037,3202,10643,116,1365 73 | 1,3,18291,1266,21042,5373,4173,14472 74 | 1,3,4420,5139,2661,8872,1321,181 75 | 2,3,19899,5332,8713,8132,764,648 76 | 2,3,8190,6343,9794,1285,1901,1780 77 | 1,3,20398,1137,3,4407,3,975 78 | 1,3,717,3587,6532,7530,529,894 79 | 2,3,12205,12697,28540,869,12034,1009 80 | 1,3,10766,1175,2067,2096,301,167 81 | 1,3,1640,3259,3655,868,1202,1653 82 | 1,3,7005,829,3009,430,610,529 83 | 2,3,219,9540,14403,283,7818,156 84 | 2,3,10362,9232,11009,737,3537,2342 85 | 1,3,20874,1563,1783,2320,550,772 86 | 
2,3,11867,3327,4814,1178,3837,120 87 | 2,3,16117,46197,92780,1026,40827,2944 88 | 2,3,22925,73498,32114,987,20070,903 89 | 1,3,43265,5025,8117,6312,1579,14351 90 | 1,3,7864,542,4042,9735,165,46 91 | 1,3,24904,3836,5330,3443,454,3178 92 | 1,3,11405,596,1638,3347,69,360 93 | 1,3,12754,2762,2530,8693,627,1117 94 | 2,3,9198,27472,32034,3232,18906,5130 95 | 1,3,11314,3090,2062,35009,71,2698 96 | 2,3,5626,12220,11323,206,5038,244 97 | 1,3,3,2920,6252,440,223,709 98 | 2,3,23,2616,8118,145,3874,217 99 | 1,3,403,254,610,774,54,63 100 | 1,3,503,112,778,895,56,132 101 | 1,3,9658,2182,1909,5639,215,323 102 | 2,3,11594,7779,12144,3252,8035,3029 103 | 2,3,1420,10810,16267,1593,6766,1838 104 | 2,3,2932,6459,7677,2561,4573,1386 105 | 1,3,56082,3504,8906,18028,1480,2498 106 | 1,3,14100,2132,3445,1336,1491,548 107 | 1,3,15587,1014,3970,910,139,1378 108 | 2,3,1454,6337,10704,133,6830,1831 109 | 2,3,8797,10646,14886,2471,8969,1438 110 | 2,3,1531,8397,6981,247,2505,1236 111 | 2,3,1406,16729,28986,673,836,3 112 | 1,3,11818,1648,1694,2276,169,1647 113 | 2,3,12579,11114,17569,805,6457,1519 114 | 1,3,19046,2770,2469,8853,483,2708 115 | 1,3,14438,2295,1733,3220,585,1561 116 | 1,3,18044,1080,2000,2555,118,1266 117 | 1,3,11134,793,2988,2715,276,610 118 | 1,3,11173,2521,3355,1517,310,222 119 | 1,3,6990,3880,5380,1647,319,1160 120 | 1,3,20049,1891,2362,5343,411,933 121 | 1,3,8258,2344,2147,3896,266,635 122 | 1,3,17160,1200,3412,2417,174,1136 123 | 1,3,4020,3234,1498,2395,264,255 124 | 1,3,12212,201,245,1991,25,860 125 | 2,3,11170,10769,8814,2194,1976,143 126 | 1,3,36050,1642,2961,4787,500,1621 127 | 1,3,76237,3473,7102,16538,778,918 128 | 1,3,19219,1840,1658,8195,349,483 129 | 2,3,21465,7243,10685,880,2386,2749 130 | 1,3,140,8847,3823,142,1062,3 131 | 1,3,42312,926,1510,1718,410,1819 132 | 1,3,7149,2428,699,6316,395,911 133 | 1,3,2101,589,314,346,70,310 134 | 1,3,14903,2032,2479,576,955,328 135 | 1,3,9434,1042,1235,436,256,396 136 | 1,3,7388,1882,2174,720,47,537 137 | 
1,3,6300,1289,2591,1170,199,326 138 | 1,3,4625,8579,7030,4575,2447,1542 139 | 1,3,3087,8080,8282,661,721,36 140 | 1,3,13537,4257,5034,155,249,3271 141 | 1,3,5387,4979,3343,825,637,929 142 | 1,3,17623,4280,7305,2279,960,2616 143 | 1,3,30379,13252,5189,321,51,1450 144 | 1,3,37036,7152,8253,2995,20,3 145 | 1,3,10405,1596,1096,8425,399,318 146 | 1,3,18827,3677,1988,118,516,201 147 | 2,3,22039,8384,34792,42,12591,4430 148 | 1,3,7769,1936,2177,926,73,520 149 | 1,3,9203,3373,2707,1286,1082,526 150 | 1,3,5924,584,542,4052,283,434 151 | 1,3,31812,1433,1651,800,113,1440 152 | 1,3,16225,1825,1765,853,170,1067 153 | 1,3,1289,3328,2022,531,255,1774 154 | 1,3,18840,1371,3135,3001,352,184 155 | 1,3,3463,9250,2368,779,302,1627 156 | 1,3,622,55,137,75,7,8 157 | 2,3,1989,10690,19460,233,11577,2153 158 | 2,3,3830,5291,14855,317,6694,3182 159 | 1,3,17773,1366,2474,3378,811,418 160 | 2,3,2861,6570,9618,930,4004,1682 161 | 2,3,355,7704,14682,398,8077,303 162 | 2,3,1725,3651,12822,824,4424,2157 163 | 1,3,12434,540,283,1092,3,2233 164 | 1,3,15177,2024,3810,2665,232,610 165 | 2,3,5531,15726,26870,2367,13726,446 166 | 2,3,5224,7603,8584,2540,3674,238 167 | 2,3,15615,12653,19858,4425,7108,2379 168 | 2,3,4822,6721,9170,993,4973,3637 169 | 1,3,2926,3195,3268,405,1680,693 170 | 1,3,5809,735,803,1393,79,429 171 | 1,3,5414,717,2155,2399,69,750 172 | 2,3,260,8675,13430,1116,7015,323 173 | 2,3,200,25862,19816,651,8773,6250 174 | 1,3,955,5479,6536,333,2840,707 175 | 2,3,514,7677,19805,937,9836,716 176 | 1,3,286,1208,5241,2515,153,1442 177 | 2,3,2343,7845,11874,52,4196,1697 178 | 1,3,45640,6958,6536,7368,1532,230 179 | 1,3,12759,7330,4533,1752,20,2631 180 | 1,3,11002,7075,4945,1152,120,395 181 | 1,3,3157,4888,2500,4477,273,2165 182 | 1,3,12356,6036,8887,402,1382,2794 183 | 1,3,112151,29627,18148,16745,4948,8550 184 | 1,3,694,8533,10518,443,6907,156 185 | 1,3,36847,43950,20170,36534,239,47943 186 | 1,3,327,918,4710,74,334,11 187 | 1,3,8170,6448,1139,2181,58,247 188 | 1,3,3009,521,854,3470,949,727 189 
| 1,3,2438,8002,9819,6269,3459,3 190 | 2,3,8040,7639,11687,2758,6839,404 191 | 2,3,834,11577,11522,275,4027,1856 192 | 1,3,16936,6250,1981,7332,118,64 193 | 1,3,13624,295,1381,890,43,84 194 | 1,3,5509,1461,2251,547,187,409 195 | 2,3,180,3485,20292,959,5618,666 196 | 1,3,7107,1012,2974,806,355,1142 197 | 1,3,17023,5139,5230,7888,330,1755 198 | 1,1,30624,7209,4897,18711,763,2876 199 | 2,1,2427,7097,10391,1127,4314,1468 200 | 1,1,11686,2154,6824,3527,592,697 201 | 1,1,9670,2280,2112,520,402,347 202 | 2,1,3067,13240,23127,3941,9959,731 203 | 2,1,4484,14399,24708,3549,14235,1681 204 | 1,1,25203,11487,9490,5065,284,6854 205 | 1,1,583,685,2216,469,954,18 206 | 1,1,1956,891,5226,1383,5,1328 207 | 2,1,1107,11711,23596,955,9265,710 208 | 1,1,6373,780,950,878,288,285 209 | 2,1,2541,4737,6089,2946,5316,120 210 | 1,1,1537,3748,5838,1859,3381,806 211 | 2,1,5550,12729,16767,864,12420,797 212 | 1,1,18567,1895,1393,1801,244,2100 213 | 2,1,12119,28326,39694,4736,19410,2870 214 | 1,1,7291,1012,2062,1291,240,1775 215 | 1,1,3317,6602,6861,1329,3961,1215 216 | 2,1,2362,6551,11364,913,5957,791 217 | 1,1,2806,10765,15538,1374,5828,2388 218 | 2,1,2532,16599,36486,179,13308,674 219 | 1,1,18044,1475,2046,2532,130,1158 220 | 2,1,18,7504,15205,1285,4797,6372 221 | 1,1,4155,367,1390,2306,86,130 222 | 1,1,14755,899,1382,1765,56,749 223 | 1,1,5396,7503,10646,91,4167,239 224 | 1,1,5041,1115,2856,7496,256,375 225 | 2,1,2790,2527,5265,5612,788,1360 226 | 1,1,7274,659,1499,784,70,659 227 | 1,1,12680,3243,4157,660,761,786 228 | 2,1,20782,5921,9212,1759,2568,1553 229 | 1,1,4042,2204,1563,2286,263,689 230 | 1,1,1869,577,572,950,4762,203 231 | 1,1,8656,2746,2501,6845,694,980 232 | 2,1,11072,5989,5615,8321,955,2137 233 | 1,1,2344,10678,3828,1439,1566,490 234 | 1,1,25962,1780,3838,638,284,834 235 | 1,1,964,4984,3316,937,409,7 236 | 1,1,15603,2703,3833,4260,325,2563 237 | 1,1,1838,6380,2824,1218,1216,295 238 | 1,1,8635,820,3047,2312,415,225 239 | 1,1,18692,3838,593,4634,28,1215 240 | 
1,1,7363,475,585,1112,72,216 241 | 1,1,47493,2567,3779,5243,828,2253 242 | 1,1,22096,3575,7041,11422,343,2564 243 | 1,1,24929,1801,2475,2216,412,1047 244 | 1,1,18226,659,2914,3752,586,578 245 | 1,1,11210,3576,5119,561,1682,2398 246 | 1,1,6202,7775,10817,1183,3143,1970 247 | 2,1,3062,6154,13916,230,8933,2784 248 | 1,1,8885,2428,1777,1777,430,610 249 | 1,1,13569,346,489,2077,44,659 250 | 1,1,15671,5279,2406,559,562,572 251 | 1,1,8040,3795,2070,6340,918,291 252 | 1,1,3191,1993,1799,1730,234,710 253 | 2,1,6134,23133,33586,6746,18594,5121 254 | 1,1,6623,1860,4740,7683,205,1693 255 | 1,1,29526,7961,16966,432,363,1391 256 | 1,1,10379,17972,4748,4686,1547,3265 257 | 1,1,31614,489,1495,3242,111,615 258 | 1,1,11092,5008,5249,453,392,373 259 | 1,1,8475,1931,1883,5004,3593,987 260 | 1,1,56083,4563,2124,6422,730,3321 261 | 1,1,53205,4959,7336,3012,967,818 262 | 1,1,9193,4885,2157,327,780,548 263 | 1,1,7858,1110,1094,6818,49,287 264 | 1,1,23257,1372,1677,982,429,655 265 | 1,1,2153,1115,6684,4324,2894,411 266 | 2,1,1073,9679,15445,61,5980,1265 267 | 1,1,5909,23527,13699,10155,830,3636 268 | 2,1,572,9763,22182,2221,4882,2563 269 | 1,1,20893,1222,2576,3975,737,3628 270 | 2,1,11908,8053,19847,1069,6374,698 271 | 1,1,15218,258,1138,2516,333,204 272 | 1,1,4720,1032,975,5500,197,56 273 | 1,1,2083,5007,1563,1120,147,1550 274 | 1,1,514,8323,6869,529,93,1040 275 | 1,3,36817,3045,1493,4802,210,1824 276 | 1,3,894,1703,1841,744,759,1153 277 | 1,3,680,1610,223,862,96,379 278 | 1,3,27901,3749,6964,4479,603,2503 279 | 1,3,9061,829,683,16919,621,139 280 | 1,3,11693,2317,2543,5845,274,1409 281 | 2,3,17360,6200,9694,1293,3620,1721 282 | 1,3,3366,2884,2431,977,167,1104 283 | 2,3,12238,7108,6235,1093,2328,2079 284 | 1,3,49063,3965,4252,5970,1041,1404 285 | 1,3,25767,3613,2013,10303,314,1384 286 | 1,3,68951,4411,12609,8692,751,2406 287 | 1,3,40254,640,3600,1042,436,18 288 | 1,3,7149,2247,1242,1619,1226,128 289 | 1,3,15354,2102,2828,8366,386,1027 290 | 1,3,16260,594,1296,848,445,258 291 | 
1,3,42786,286,471,1388,32,22 292 | 1,3,2708,2160,2642,502,965,1522 293 | 1,3,6022,3354,3261,2507,212,686 294 | 1,3,2838,3086,4329,3838,825,1060 295 | 2,2,3996,11103,12469,902,5952,741 296 | 1,2,21273,2013,6550,909,811,1854 297 | 2,2,7588,1897,5234,417,2208,254 298 | 1,2,19087,1304,3643,3045,710,898 299 | 2,2,8090,3199,6986,1455,3712,531 300 | 2,2,6758,4560,9965,934,4538,1037 301 | 1,2,444,879,2060,264,290,259 302 | 2,2,16448,6243,6360,824,2662,2005 303 | 2,2,5283,13316,20399,1809,8752,172 304 | 2,2,2886,5302,9785,364,6236,555 305 | 2,2,2599,3688,13829,492,10069,59 306 | 2,2,161,7460,24773,617,11783,2410 307 | 2,2,243,12939,8852,799,3909,211 308 | 2,2,6468,12867,21570,1840,7558,1543 309 | 1,2,17327,2374,2842,1149,351,925 310 | 1,2,6987,1020,3007,416,257,656 311 | 2,2,918,20655,13567,1465,6846,806 312 | 1,2,7034,1492,2405,12569,299,1117 313 | 1,2,29635,2335,8280,3046,371,117 314 | 2,2,2137,3737,19172,1274,17120,142 315 | 1,2,9784,925,2405,4447,183,297 316 | 1,2,10617,1795,7647,1483,857,1233 317 | 2,2,1479,14982,11924,662,3891,3508 318 | 1,2,7127,1375,2201,2679,83,1059 319 | 1,2,1182,3088,6114,978,821,1637 320 | 1,2,11800,2713,3558,2121,706,51 321 | 2,2,9759,25071,17645,1128,12408,1625 322 | 1,2,1774,3696,2280,514,275,834 323 | 1,2,9155,1897,5167,2714,228,1113 324 | 1,2,15881,713,3315,3703,1470,229 325 | 1,2,13360,944,11593,915,1679,573 326 | 1,2,25977,3587,2464,2369,140,1092 327 | 1,2,32717,16784,13626,60869,1272,5609 328 | 1,2,4414,1610,1431,3498,387,834 329 | 1,2,542,899,1664,414,88,522 330 | 1,2,16933,2209,3389,7849,210,1534 331 | 1,2,5113,1486,4583,5127,492,739 332 | 1,2,9790,1786,5109,3570,182,1043 333 | 2,2,11223,14881,26839,1234,9606,1102 334 | 1,2,22321,3216,1447,2208,178,2602 335 | 2,2,8565,4980,67298,131,38102,1215 336 | 2,2,16823,928,2743,11559,332,3486 337 | 2,2,27082,6817,10790,1365,4111,2139 338 | 1,2,13970,1511,1330,650,146,778 339 | 1,2,9351,1347,2611,8170,442,868 340 | 1,2,3,333,7021,15601,15,550 341 | 1,2,2617,1188,5332,9584,573,1942 342 | 
2,3,381,4025,9670,388,7271,1371 343 | 2,3,2320,5763,11238,767,5162,2158 344 | 1,3,255,5758,5923,349,4595,1328 345 | 2,3,1689,6964,26316,1456,15469,37 346 | 1,3,3043,1172,1763,2234,217,379 347 | 1,3,1198,2602,8335,402,3843,303 348 | 2,3,2771,6939,15541,2693,6600,1115 349 | 2,3,27380,7184,12311,2809,4621,1022 350 | 1,3,3428,2380,2028,1341,1184,665 351 | 2,3,5981,14641,20521,2005,12218,445 352 | 1,3,3521,1099,1997,1796,173,995 353 | 2,3,1210,10044,22294,1741,12638,3137 354 | 1,3,608,1106,1533,830,90,195 355 | 2,3,117,6264,21203,228,8682,1111 356 | 1,3,14039,7393,2548,6386,1333,2341 357 | 1,3,190,727,2012,245,184,127 358 | 1,3,22686,134,218,3157,9,548 359 | 2,3,37,1275,22272,137,6747,110 360 | 1,3,759,18664,1660,6114,536,4100 361 | 1,3,796,5878,2109,340,232,776 362 | 1,3,19746,2872,2006,2601,468,503 363 | 1,3,4734,607,864,1206,159,405 364 | 1,3,2121,1601,2453,560,179,712 365 | 1,3,4627,997,4438,191,1335,314 366 | 1,3,2615,873,1524,1103,514,468 367 | 2,3,4692,6128,8025,1619,4515,3105 368 | 1,3,9561,2217,1664,1173,222,447 369 | 1,3,3477,894,534,1457,252,342 370 | 1,3,22335,1196,2406,2046,101,558 371 | 1,3,6211,337,683,1089,41,296 372 | 2,3,39679,3944,4955,1364,523,2235 373 | 1,3,20105,1887,1939,8164,716,790 374 | 1,3,3884,3801,1641,876,397,4829 375 | 2,3,15076,6257,7398,1504,1916,3113 376 | 1,3,6338,2256,1668,1492,311,686 377 | 1,3,5841,1450,1162,597,476,70 378 | 2,3,3136,8630,13586,5641,4666,1426 379 | 1,3,38793,3154,2648,1034,96,1242 380 | 1,3,3225,3294,1902,282,68,1114 381 | 2,3,4048,5164,10391,130,813,179 382 | 1,3,28257,944,2146,3881,600,270 383 | 1,3,17770,4591,1617,9927,246,532 384 | 1,3,34454,7435,8469,2540,1711,2893 385 | 1,3,1821,1364,3450,4006,397,361 386 | 1,3,10683,21858,15400,3635,282,5120 387 | 1,3,11635,922,1614,2583,192,1068 388 | 1,3,1206,3620,2857,1945,353,967 389 | 1,3,20918,1916,1573,1960,231,961 390 | 1,3,9785,848,1172,1677,200,406 391 | 1,3,9385,1530,1422,3019,227,684 392 | 1,3,3352,1181,1328,5502,311,1000 393 | 1,3,2647,2761,2313,907,95,1827 394 | 
1,3,518,4180,3600,659,122,654 395 | 1,3,23632,6730,3842,8620,385,819 396 | 1,3,12377,865,3204,1398,149,452 397 | 1,3,9602,1316,1263,2921,841,290 398 | 2,3,4515,11991,9345,2644,3378,2213 399 | 1,3,11535,1666,1428,6838,64,743 400 | 1,3,11442,1032,582,5390,74,247 401 | 1,3,9612,577,935,1601,469,375 402 | 1,3,4446,906,1238,3576,153,1014 403 | 1,3,27167,2801,2128,13223,92,1902 404 | 1,3,26539,4753,5091,220,10,340 405 | 1,3,25606,11006,4604,127,632,288 406 | 1,3,18073,4613,3444,4324,914,715 407 | 1,3,6884,1046,1167,2069,593,378 408 | 1,3,25066,5010,5026,9806,1092,960 409 | 2,3,7362,12844,18683,2854,7883,553 410 | 2,3,8257,3880,6407,1646,2730,344 411 | 1,3,8708,3634,6100,2349,2123,5137 412 | 1,3,6633,2096,4563,1389,1860,1892 413 | 1,3,2126,3289,3281,1535,235,4365 414 | 1,3,97,3605,12400,98,2970,62 415 | 1,3,4983,4859,6633,17866,912,2435 416 | 1,3,5969,1990,3417,5679,1135,290 417 | 2,3,7842,6046,8552,1691,3540,1874 418 | 2,3,4389,10940,10908,848,6728,993 419 | 1,3,5065,5499,11055,364,3485,1063 420 | 2,3,660,8494,18622,133,6740,776 421 | 1,3,8861,3783,2223,633,1580,1521 422 | 1,3,4456,5266,13227,25,6818,1393 423 | 2,3,17063,4847,9053,1031,3415,1784 424 | 1,3,26400,1377,4172,830,948,1218 425 | 2,3,17565,3686,4657,1059,1803,668 426 | 2,3,16980,2884,12232,874,3213,249 427 | 1,3,11243,2408,2593,15348,108,1886 428 | 1,3,13134,9347,14316,3141,5079,1894 429 | 1,3,31012,16687,5429,15082,439,1163 430 | 1,3,3047,5970,4910,2198,850,317 431 | 1,3,8607,1750,3580,47,84,2501 432 | 1,3,3097,4230,16483,575,241,2080 433 | 1,3,8533,5506,5160,13486,1377,1498 434 | 1,3,21117,1162,4754,269,1328,395 435 | 1,3,1982,3218,1493,1541,356,1449 436 | 1,3,16731,3922,7994,688,2371,838 437 | 1,3,29703,12051,16027,13135,182,2204 438 | 1,3,39228,1431,764,4510,93,2346 439 | 2,3,14531,15488,30243,437,14841,1867 440 | 1,3,10290,1981,2232,1038,168,2125 441 | 1,3,2787,1698,2510,65,477,52 442 | --------------------------------------------------------------------------------