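├── README.md
├── bank_telemarketing_data_analysis.ipynb
├── images
    ├── Random Forest.png
    ├── jaccard系数.png
    ├── 余弦相似性.png
    ├── 曼哈顿距离.png
    └── 欧式距离.png
├── 关联分析
    └── README.md
├── 分类算法
    ├── README.md
    └── 用户流失预测分析与应用.ipynb
├── 回归分析
    ├── README.md
    └── 大型促销活动前的销售预测.ipynb
└── 聚类分析
    ├── README.md
    └── 客户特征的聚类与探索性分析.ipynb


/README.md:
--------------------------------------------------------------------------------
# Machine Learning
- Supervised learning: the training samples carry labels (**y values**); the model learns the patterns in the labeled samples and uses them to predict the labels of unseen samples.
  - Regression
  - Classification
- Unsupervised learning: the labels of the training samples are unknown; the goal is to reveal the intrinsic properties, structure, and information in the training samples, providing a basis for further data mining.
  - Clustering
## 1 Regression
Regression is a data analysis method that studies how independent variables x affect a dependent variable y. Its main applications are **prediction and control**, for example planning, setting KPIs, and setting targets. Predicted values can also be compared with actual data to gauge how far an event has progressed and to give directional guidance for future action.

Regression analysis can be used to analyze the relationship between independent and dependent variables (given x, find y), and also the direction of an independent variable's effect on the dependent variable (positive or negative).

**Commonly used regression algorithms include**:
- Linear regression
- Quadratic (second-order polynomial) regression
- Logarithmic regression
- Exponential regression
- Kernel SVM
- Ridge regression
- Lasso

Advantages:
- The data pattern and the results are easy to interpret; linear regression, for example, is expressed in the form y = ax + b.
- In business applications built on a function formula, values can be substituted directly into the formula, so it is easy to apply.

Disadvantages:
- It can only analyze relationships among a small number of variables; it cannot handle interactions among a very large number of variables, in particular how strongly their joint effects influence the dependent variable.

### 1.1 Watch for collinearity among regression variables
Indicators for detecting collinearity:
- Tolerance: in [0, 1]; the residual proportion (1 − R²) obtained by regressing each independent variable, treated as the dependent variable, on the other independent variables. The smaller the value, the more likely a collinearity problem.
- Variance inflation factor (VIF): the reciprocal of tolerance; the larger the value, the more pronounced the collinearity, with 10 commonly used as the threshold. VIF < 10: no multicollinearity; 10 ≤ VIF < 100: strong multicollinearity; VIF ≥ 100: severe multicollinearity.
- Eigenvalues: run a principal component analysis on the independent variables; if the eigenvalues of several dimensions equal 0, severe collinearity may be present.
- Correlation coefficient: R > 0.8 suggests that a strong correlation may exist.

**Five common ways to address collinearity**:
- Increase the sample size
- Ridge regression
- Stepwise regression
- Principal component regression
- Manual removal of variables

### 1.2 How the correlation coefficient, coefficient of determination, and regression coefficient relate
Suppose a regression equation y = 42.738x + 169.94 with R² = 0.5252. Running a correlation analysis on the same two variables also yields the correlation coefficient R = 0.72468551874050.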
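
As a rough check on these numbers: for a least-squares fit with a single predictor, the coefficient of determination equals the square of the correlation coefficient, and √0.5252 ≈ 0.7247, which agrees with the stated R up to rounding. The sketch below illustrates the same relationship on made-up data; the arrays `x` and `y` are hypothetical and are not the data behind the equation above.

```python
import numpy as np

# Hypothetical data for illustration only (not the data behind the equation above).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# Least-squares fit of y = slope * x + intercept; slope is the regression coefficient.
slope, intercept = np.polyfit(x, y, 1)

# Coefficient of determination: regression sum of squares over total sum of squares.
y_hat = slope * x + intercept
ss_reg = np.sum((y_hat - y.mean()) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = ss_reg / ss_tot

# Pearson correlation coefficient between x and y.
r = np.corrcoef(x, y)[0, 1]

print("regression coefficient:", slope)
print("coefficient of determination:", r_squared)
print("correlation coefficient:", r)
```

With one predictor, `r_squared` matches `r ** 2` up to floating-point error, and the sign of `slope` matches the sign of `r`. With several predictors only the first quantity generalizes, since R² then describes the joint effect of all of them.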
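
Here 42.738 is the **regression coefficient** of the independent variable x, 0.5252 is the **coefficient of determination** of the equation, and 0.724... is the **correlation coefficient** of the two variables.
- Coefficient of determination: the degree to which the independent variables explain the variance of the dependent variable, computed as the ratio of the regression sum of squares to the total sum of squares.
- Correlation coefficient: also referred to as the explanatory coefficient; a measure of how strongly or closely the variables are related. In essence it assesses linear correlation.

How the three relate:
- The coefficient of determination reflects the **joint influence of all independent variables in the model on the dependent variable**, not the influence of any single independent variable.
- Regression coefficient versus correlation coefficient: if the regression coefficient is greater than 0, the correlation coefficient lies in (0, 1] and the two variables are positively correlated; if the regression coefficient is less than 0, the correlation coefficient lies in [-1, 0) and the two variables are negatively correlated.

## 2 Classification algorithms
- A family of supervised learning algorithms that model or predict **discrete random variables**.
- Use cases include email filtering, financial fraud detection, and predicting employee turnover: tasks whose output is a category.
- Classification algorithms are generally suited to predicting a class (or the probability of a class) rather than a continuous value.

### 2.1 Applications of classification algorithms
- Prediction
- Extracting applicable rules
- Extracting variable features
- Handling missing values

### 2.2 (Basic) Decision Tree
A decision tree is a tree structure (binary or non-binary).

Each non-leaf node represents a test on a **feature attribute**, each branch represents the outcome of that attribute over some range of values, and each leaf node stores a class.

Making a decision with a decision tree means starting at the **root node**, testing the corresponding feature attribute of the item to be classified, and following the branch that matches its value until a leaf node is reached; the class stored in that **leaf node** is the decision result.

**Advantages:**
- Works with any type of data (categorical variables are more common)
- Intuitive; a decision tree can be visualized, which makes it easy to understand
- The predictions are simple and highly interpretable
- Suitable for small datasets

**Disadvantages:**
- Decision trees do not perform very well when the data contains continuous attributes
- Unstable: a small perturbation or change in the data can change the whole tree
- Errors grow relatively quickly as special attributes are added
- They easily grow complex tree structures on the training data, which leads to overfitting.

### 2.3 Random Forest
![image](https://github.com/teamowu/Machine-Learning/blob/master/images/Random%20Forest.png)

**Advantages:**
- Random forests are not prone to overfitting
- Good robustness to noise
- Can handle very high-dimensional data (many features) without feature selection
- Fast to train

## 3 Clustering
- A form of unsupervised machine learning (that is, **the data is unlabeled**)
- The algorithm looks for natural groups (clusters) of observations based on the internal structure of the data
- Use cases include customer segmentation, news clustering, and article recommendation.

**Metrics used to measure similarity**:
- Euclidean distance
  - Definition: the true distance between two points in m-dimensional space, or the natural length of a vector (the distance from the point to the origin)
  - Use:

![image](https://github.com/teamowu/Machine-Learning/blob/master/images/%E6%AC%A7%E5%BC%8F%E8%B7%9D%E7%A6%BB.png)

- Manhattan distance
  - Definition: the sum of the absolute axis-wise distances between two points in a standard coordinate system.
  - Use:

![image](https://github.com/teamowu/Machine-Learning/blob/master/images/%E6%9B%BC%E5%93%88%E9%A1%BF%E8%B7%9D%E7%A6%BB.png)

- Cosine similarity
  - Definition: evaluates the similarity of two vectors by computing the cosine of the angle between them.
  - Use: news classification
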
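![image](https://github.com/teamowu/Machine-Learning/blob/master/images/%E4%BD%99%E5%BC%A6%E7%9B%B8%E4%BC%BC%E6%80%A7.png)

- Jaccard coefficient
  - Definition: given two sets A and B, the Jaccard coefficient is the ratio of the size of their intersection to the size of their union.
  - Use: comparing the similarity and difference between finite sample sets. The larger the Jaccard coefficient, the more similar the samples.

![image](https://github.com/teamowu/Machine-Learning/blob/master/images/jaccard%E7%B3%BB%E6%95%B0.png)

### 3.1 Hierarchical Cluster Analysis (HCA)
Hierarchical clustering is a family of clustering algorithms built on the following idea:
- Start with each data point as its own cluster
- For each cluster, merge clusters according to the same criterion
- Repeat until only one cluster remains, which yields a hierarchy of clusters.

### 3.2 K-means Clustering Algorithm
- The clustering measure is based on the geometric distance between sample points (the distance in the coordinate plane)
- Clusters are groups gathered around cluster centers; they tend to be roughly spherical and of similar size
- For a given k, the algorithm starts from an initial grouping and then changes the grouping iteratively so that each revised grouping is better than the previous one

### 3.3 DBSCAN
- A density-based algorithm that groups dense regions of sample points into clusters
- DBSCAN does not assume that clusters are spherical, and its performance scales well
- Not every point has to be assigned to a cluster, which reduces the amount of anomalous data inside clusters.


--------------------------------------------------------------------------------
/bank_telemarketing_data_analysis.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "markdown",
5 |    "metadata": {},
6 |    "source": [
7 |     "# Python Data Analysis: Bank Telemarketing Data Analysis"
8 |    ]
9 |   },
10 |   {
11 |    "cell_type": "markdown",
12 |    "metadata": {},
13 |    "source": [
14 |     "## 1. Preface\n",
15 |     "### Project overview:\n",
16 |     "In everyday life, banks keep our assets safe and make life more convenient, and the records we leave with a bank let it predict some of our behavior. This project uses customers' past records to predict whether a customer will subscribe to a deposit product.\n"
17 |    ]
18 |   },
19 |   {
20 |    "cell_type": "markdown",
21 |    "metadata": {},
22 |    "source": [
23 |     "### Dataset analysis\n",
24 |     "The task for this dataset is a classification task. The dataset contains 25317 rows and 18 columns. The column named y is the label column and takes the values 0 and 1: 0 means the customer does not subscribe, 1 means the customer subscribes. All other columns are feature columns\n",
25 |     "### Field descriptions\n",
26 |     "ID: unique customer identifier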
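\n", 27 | "age: customer's age
\n", 28 | "job: customer's occupation
\n", 29 | "marital: marital status
\n", 30 | "education: education level
\n", 31 | "default: whether the customer has a record of default
\n", 32 | "balance: average yearly account balance
\n", 33 | "housing: whether the customer has a housing loan
\n", 34 | "loan: whether the customer has a personal loan
\n", 35 | "contact: communication channel used to contact the customer
\n", 36 | "day: day of the month of the last contact
\n", 37 | "month: month of the last contact
\n", 38 | "duration: duration of the last contact
\n", 39 | "campaign: number of contacts with this customer during the current campaign
\n", 40 | "pdays: time elapsed since the customer was last contacted in the previous campaign
\n", 41 | "previous: number of contacts with this customer before the current campaign
\n", 42 | "poutcome: outcome of the previous campaign
\n", 43 | "y: whether the customer subscribes to a term deposit; 0 means no, 1 means yes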
" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "## 二、提出问题\n", 51 | "* 哪个分类模型更适合预测客户是否订购定期存款业务?\n", 52 | "\n", 53 | "### 分析流程\n", 54 | "* 查看数据\n", 55 | "* 特征处理\n", 56 | "* 选择模型\n", 57 | "* 数据归一化" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "## 三、探索性数据分析\n", 65 | "### 导入必备的库" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 1, 71 | "metadata": {}, 72 | "outputs": [], 73 | "source": [ 74 | "#Basic library\n", 75 | "import pandas as pd\n", 76 | "import numpy as np\n", 77 | "import matplotlib.pyplot as plt\n", 78 | "import seaborn as sns\n", 79 | "\n", 80 | "#machine learning\n", 81 | "from sklearn.model_selection import train_test_split\n", 82 | "from sklearn.model_selection import cross_val_score\n", 83 | "from sklearn.metrics import roc_auc_score\n", 84 | "from sklearn.preprocessing import LabelEncoder, MinMaxScaler\n", 85 | "\n", 86 | "#Model\n", 87 | "from sklearn.neighbors import KNeighborsClassifier\n", 88 | "from sklearn.linear_model import LogisticRegression\n", 89 | "from sklearn.tree import DecisionTreeClassifier\n", 90 | "\n", 91 | "#igonore warnings\n", 92 | "import warnings\n", 93 | "warnings.filterwarnings('ignore')" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "### 查看数据" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 2, 106 | "metadata": { 107 | "scrolled": true 108 | }, 109 | "outputs": [ 110 | { 111 | "data": { 112 | "text/html": [ 113 | "
" 274 | ], 275 | "text/plain": [ 276 | " age job marital education default balance housing loan \\\n", 277 | "ID \n", 278 | "1 43 management married tertiary no 291 yes no \n", 279 | "2 42 technician divorced primary no 5076 yes no \n", 280 | "3 47 admin. married secondary no 104 yes yes \n", 281 | "4 28 management single secondary no -994 yes yes \n", 282 | "5 42 technician divorced secondary no 2974 yes no \n", 283 | "\n", 284 | " contact day month duration campaign pdays previous poutcome y \n", 285 | "ID \n", 286 | "1 unknown 9 may 150 2 -1 0 unknown 0 \n", 287 | "2 cellular 7 apr 99 1 251 2 other 0 \n", 288 | "3 cellular 14 jul 77 2 -1 0 unknown 0 \n", 289 | "4 cellular 18 jul 174 2 -1 0 unknown 0 \n", 290 | "5 unknown 21 may 187 5 -1 0 unknown 0 " 291 | ] 292 | }, 293 | "execution_count": 2, 294 | "metadata": {}, 295 | "output_type": "execute_result" 296 | } 297 | ], 298 | "source": [ 299 | "data_all = pd.read_csv('./dataFile/train_set.csv',index_col='ID')\n", 300 | "data_all.head()" 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": 3, 306 | "metadata": {}, 307 | "outputs": [ 308 | { 309 | "name": "stdout", 310 | "output_type": "stream", 311 | "text": [ 312 | "\n", 313 | "Int64Index: 25317 entries, 1 to 25317\n", 314 | "Data columns (total 17 columns):\n", 315 | "age 25317 non-null int64\n", 316 | "job 25317 non-null object\n", 317 | "marital 25317 non-null object\n", 318 | "education 25317 non-null object\n", 319 | "default 25317 non-null object\n", 320 | "balance 25317 non-null int64\n", 321 | "housing 25317 non-null object\n", 322 | "loan 25317 non-null object\n", 323 | "contact 25317 non-null object\n", 324 | "day 25317 non-null int64\n", 325 | "month 25317 non-null object\n", 326 | "duration 25317 non-null int64\n", 327 | "campaign 25317 non-null int64\n", 328 | "pdays 25317 non-null int64\n", 329 | "previous 25317 non-null int64\n", 330 | "poutcome 25317 non-null object\n", 331 | "y 25317 non-null int64\n", 332 | "dtypes: 
int64(8), object(9)\n", 333 | "memory usage: 3.5+ MB\n" 334 | ] 335 | } 336 | ], 337 | "source": [ 338 | "#查看数据的基本信息\n", 339 | "data_all.info()" 340 | ] 341 | }, 342 | { 343 | "cell_type": "code", 344 | "execution_count": 4, 345 | "metadata": {}, 346 | "outputs": [ 347 | { 348 | "data": { 349 | "text/plain": [ 350 | "(25317, 17)" 351 | ] 352 | }, 353 | "execution_count": 4, 354 | "metadata": {}, 355 | "output_type": "execute_result" 356 | } 357 | ], 358 | "source": [ 359 | "# 查看数据data_all的维度\n", 360 | "data_all.shape" 361 | ] 362 | }, 363 | { 364 | "cell_type": "code", 365 | "execution_count": 6, 366 | "metadata": { 367 | "scrolled": true 368 | }, 369 | "outputs": [ 370 | { 371 | "data": { 372 | "text/plain": [ 373 | "age False\n", 374 | "job False\n", 375 | "marital False\n", 376 | "education False\n", 377 | "default False\n", 378 | "balance False\n", 379 | "housing False\n", 380 | "loan False\n", 381 | "contact False\n", 382 | "day False\n", 383 | "month False\n", 384 | "duration False\n", 385 | "campaign False\n", 386 | "pdays False\n", 387 | "previous False\n", 388 | "poutcome False\n", 389 | "y False\n", 390 | "dtype: bool" 391 | ] 392 | }, 393 | "execution_count": 6, 394 | "metadata": {}, 395 | "output_type": "execute_result" 396 | } 397 | ], 398 | "source": [ 399 | "# 查看每列数据是否包含缺失值。\n", 400 | "data_all.isnull().any()" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "metadata": {}, 406 | "source": [ 407 | "显然,数据集中不包含任何缺失值。" 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "metadata": {}, 413 | "source": [ 414 | "### 特征处理" 415 | ] 416 | }, 417 | { 418 | "cell_type": "code", 419 | "execution_count": 7, 420 | "metadata": {}, 421 | "outputs": [ 422 | { 423 | "data": { 424 | "text/plain": [ 425 | "['job',\n", 426 | " 'marital',\n", 427 | " 'education',\n", 428 | " 'default',\n", 429 | " 'housing',\n", 430 | " 'loan',\n", 431 | " 'contact',\n", 432 | " 'month',\n", 433 | " 'poutcome']" 434 | ] 435 | }, 436 | "execution_count": 7, 437 | 
"metadata": {}, 438 | "output_type": "execute_result" 439 | } 440 | ], 441 | "source": [ 442 | "# 获得data_all中列的数据类型是object的列的列名。\n", 443 | "data_obj_col = data_all.select_dtypes('object').columns.to_list()\n", 444 | "data_obj_col" 445 | ] 446 | }, 447 | { 448 | "cell_type": "code", 449 | "execution_count": 9, 450 | "metadata": { 451 | "scrolled": true 452 | }, 453 | "outputs": [ 454 | { 455 | "data": { 456 | "text/plain": [ 457 | "(25317, 9)" 458 | ] 459 | }, 460 | "execution_count": 9, 461 | "metadata": {}, 462 | "output_type": "execute_result" 463 | } 464 | ], 465 | "source": [ 466 | "# 获得数据集中列的数据类型为object的所有数据,以及打印数据的维度\n", 467 | "data_obj=data_all[data_obj_col]\n", 468 | "data_obj.shape" 469 | ] 470 | }, 471 | { 472 | "cell_type": "code", 473 | "execution_count": 10, 474 | "metadata": {}, 475 | "outputs": [], 476 | "source": [ 477 | "# 依据data_obj_col,获得数据集中列的数据类型为数值型的列的列。\n", 478 | "data_num_col = data_all.columns.difference(data_obj_col)" 479 | ] 480 | }, 481 | { 482 | "cell_type": "code", 483 | "execution_count": 11, 484 | "metadata": {}, 485 | "outputs": [ 486 | { 487 | "data": { 488 | "text/plain": [ 489 | "(25317, 8)" 490 | ] 491 | }, 492 | "execution_count": 11, 493 | "metadata": {}, 494 | "output_type": "execute_result" 495 | } 496 | ], 497 | "source": [ 498 | "# 获得数据集中列的数据类型为数值型的所有数据,以及数据的维度\n", 499 | "data_num=data_all[data_num_col]\n", 500 | "data_num.shape" 501 | ] 502 | }, 503 | { 504 | "cell_type": "code", 505 | "execution_count": 12, 506 | "metadata": {}, 507 | "outputs": [ 508 | { 509 | "data": { 510 | "text/plain": [ 511 | "['age', 'balance', 'campaign', 'day', 'duration', 'pdays', 'previous', 'y']" 512 | ] 513 | }, 514 | "execution_count": 12, 515 | "metadata": {}, 516 | "output_type": "execute_result" 517 | } 518 | ], 519 | "source": [ 520 | "# 打印data_num的列名\n", 521 | "data_num.columns.to_list()" 522 | ] 523 | }, 524 | { 525 | "cell_type": "markdown", 526 | "metadata": {}, 527 | "source": [ 528 | "从以上输出的数据可知:\n", 529 | "* Object类型的列有9个\n", 530 
| "* 数值类型的列有8个,数值类型的列名分别为:'age', 'balance', 'campaign', 'day', 'duration', 'pdays', 'previous','y'" 531 | ] 532 | }, 533 | { 534 | "cell_type": "markdown", 535 | "metadata": {}, 536 | "source": [ 537 | "### 标签编码\n", 538 | "将object类型的列中只有两个值的列进行标签编码,将编码后的列添加到data_num数据集中" 539 | ] 540 | }, 541 | { 542 | "cell_type": "code", 543 | "execution_count": 13, 544 | "metadata": {}, 545 | "outputs": [ 546 | { 547 | "data": { 548 | "text/plain": [ 549 | "['default', 'housing', 'loan']" 550 | ] 551 | }, 552 | "execution_count": 13, 553 | "metadata": {}, 554 | "output_type": "execute_result" 555 | } 556 | ], 557 | "source": [ 558 | "# 计算data_obj中每列中的唯一值;然后得到每一列中只有两个值的列名\n", 559 | "two_unique_cols = data_obj.nunique()[data_obj.nunique()==2].index.tolist()\n", 560 | "two_unique_cols" 561 | ] 562 | }, 563 | { 564 | "cell_type": "code", 565 | "execution_count": 14, 566 | "metadata": {}, 567 | "outputs": [], 568 | "source": [ 569 | "# 对列中唯一值只有两个值的列进行标签编码,将标签编码后的数据存到data_num数据集中\n", 570 | "y = data_all[two_unique_cols].apply(LabelEncoder().fit_transform)\n", 571 | "data_num = pd.concat([y, data_num],ignore_index=False, sort=True, axis=1)" 572 | ] 573 | }, 574 | { 575 | "cell_type": "code", 576 | "execution_count": 15, 577 | "metadata": { 578 | "scrolled": true 579 | }, 580 | "outputs": [ 581 | { 582 | "name": "stdout", 583 | "output_type": "stream", 584 | "text": [ 585 | "data_num的维度是:(25317, 11)\n" 586 | ] 587 | }, 588 | { 589 | "data": { 590 | "text/plain": [ 591 | "Index(['default', 'housing', 'loan', 'age', 'balance', 'campaign', 'day',\n", 592 | " 'duration', 'pdays', 'previous', 'y'],\n", 593 | " dtype='object')" 594 | ] 595 | }, 596 | "execution_count": 15, 597 | "metadata": {}, 598 | "output_type": "execute_result" 599 | } 600 | ], 601 | "source": [ 602 | "# 打印data_num的维度和列名\n", 603 | "print('data_num的维度是:{}'.format(data_num.shape))\n", 604 | "data_num.columns" 605 | ] 606 | }, 607 | { 608 | "cell_type": "markdown", 609 | "metadata": {}, 610 | "source": [ 611 | "### 数据抽样\n", 
612 |     "Class imbalance strongly affects model training, so a simple sampling method is used here to make the classes in y roughly balanced."
613 |    ]
614 |   },
615 |   {
616 |    "cell_type": "code",
617 |    "execution_count": 16,
618 |    "metadata": {
619 |     "scrolled": false
620 |    },
621 |    "outputs": [
622 |     {
623 |      "data": {
624 |       "text/plain": [
625 |        ""
626 |       ]
627 |      },
628 |      "execution_count": 16,
629 |      "metadata": {},
630 |      "output_type": "execute_result"
631 |     },
632 |     {
633 |      "data": {
634 |       "image/png": "(base64-encoded PNG omitted: count plot of the y column)\n",
635 |       "text/plain": [
636 |        "
" 637 | ] 638 | }, 639 | "metadata": { 640 | "needs_background": "light" 641 | }, 642 | "output_type": "display_data" 643 | } 644 | ], 645 | "source": [ 646 | "# 使用countplot对y列的唯一值进行画图显示\n", 647 | "sns.countplot(data_num['y'])" 648 | ] 649 | }, 650 | { 651 | "cell_type": "code", 652 | "execution_count": 17, 653 | "metadata": { 654 | "scrolled": true 655 | }, 656 | "outputs": [ 657 | { 658 | "name": "stdout", 659 | "output_type": "stream", 660 | "text": [ 661 | "0有:22356,1有:2961\n" 662 | ] 663 | } 664 | ], 665 | "source": [ 666 | "# 标签列y中存在两个唯一值,现在计算每个值的个数.\n", 667 | "data_num['y'].value_counts()\n", 668 | "value_0 = (data_num['y']==0).sum()\n", 669 | "value_1 = (data_num['y']==1).sum()\n", 670 | "print(\"0有:{},1有:{}\".format(value_0,value_1))" 671 | ] 672 | }, 673 | { 674 | "cell_type": "markdown", 675 | "metadata": {}, 676 | "source": [ 677 | "从以上的输出和图的显示来看,y标签列的每类的样本数不一致,因此为保抽取样本的平衡,对y==0的样本进行随机抽样。" 678 | ] 679 | }, 680 | { 681 | "cell_type": "code", 682 | "execution_count": 18, 683 | "metadata": { 684 | "scrolled": true 685 | }, 686 | "outputs": [], 687 | "source": [ 688 | "# 以上可知y中列存在0和1两个值,由于0包含的元素个数远远大于1的个数,现在使用sample从y为0的样本中随机抽取一些样本,\n", 689 | "# 要求0包含的样本数和1的样本数相同,且随机种子设定为22,\n", 690 | "# sample随机抽样方法:sample(n=样本数,random_state=随机种子数)\n", 691 | "data_num_0 = data_num[data_num['y']==0].sample(n=value_1, random_state=22)" 692 | ] 693 | }, 694 | { 695 | "cell_type": "code", 696 | "execution_count": 19, 697 | "metadata": {}, 698 | "outputs": [], 699 | "source": [ 700 | "# 得到数据集中y列中为1的样本\n", 701 | "data_num_1=data_num[data_num['y']!=0]" 702 | ] 703 | }, 704 | { 705 | "cell_type": "code", 706 | "execution_count": 20, 707 | "metadata": {}, 708 | "outputs": [ 709 | { 710 | "data": { 711 | "text/plain": [ 712 | "(5922, 11)" 713 | ] 714 | }, 715 | "execution_count": 20, 716 | "metadata": {}, 717 | "output_type": "execute_result" 718 | } 719 | ], 720 | "source": [ 721 | "# 将y为1的样本和随机抽取的y为0的样本进行合并,并且打印合并后数据集的维度\n", 722 | "data_num_sample = 
pd.concat([data_num_0,data_num_1], axis=0, ignore_index=False)\n", 723 | "data_num_sample.shape" 724 | ] 725 | }, 726 | { 727 | "cell_type": "code", 728 | "execution_count": 21, 729 | "metadata": {}, 730 | "outputs": [ 731 | { 732 | "data": { 733 | "text/html": [ 734 | "
" 881 | ], 882 | "text/plain": [ 883 | " default housing loan age balance \\\n", 884 | "count 5922.000000 5922.000000 5922.000000 5922.000000 5922.000000 \n", 885 | "mean 0.013847 0.468929 0.126646 41.170382 1616.046099 \n", 886 | "std 0.116864 0.499076 0.332605 12.012457 3371.601168 \n", 887 | "min 0.000000 0.000000 0.000000 18.000000 -1965.000000 \n", 888 | "25% 0.000000 0.000000 0.000000 32.000000 130.000000 \n", 889 | "50% 0.000000 0.000000 0.000000 39.000000 574.000000 \n", 890 | "75% 0.000000 1.000000 0.000000 49.000000 1854.750000 \n", 891 | "max 1.000000 1.000000 1.000000 95.000000 102127.000000 \n", 892 | "\n", 893 | " campaign day duration pdays previous \\\n", 894 | "count 5922.000000 5922.000000 5922.000000 5922.000000 5922.000000 \n", 895 | "mean 2.473320 15.489868 378.140324 52.918102 0.863560 \n", 896 | "std 2.745904 8.419602 353.409237 109.976587 2.284146 \n", 897 | "min 1.000000 1.000000 4.000000 -1.000000 0.000000 \n", 898 | "25% 1.000000 8.000000 145.000000 -1.000000 0.000000 \n", 899 | "50% 2.000000 15.000000 259.000000 -1.000000 0.000000 \n", 900 | "75% 3.000000 21.000000 492.000000 77.750000 1.000000 \n", 901 | "max 44.000000 31.000000 3881.000000 854.000000 58.000000 \n", 902 | "\n", 903 | " y \n", 904 | "count 5922.000000 \n", 905 | "mean 0.500000 \n", 906 | "std 0.500042 \n", 907 | "min 0.000000 \n", 908 | "25% 0.000000 \n", 909 | "50% 0.500000 \n", 910 | "75% 1.000000 \n", 911 | "max 1.000000 " 912 | ] 913 | }, 914 | "execution_count": 21, 915 | "metadata": {}, 916 | "output_type": "execute_result" 917 | } 918 | ], 919 | "source": [ 920 | "# 对数据集拆分为训练集和测试集之前,先检查下数据是否存在问题,调用describe,打印下数据的统计信息\n", 921 | "data_num_sample.describe()" 922 | ] 923 | }, 924 | { 925 | "cell_type": "code", 926 | "execution_count": 22, 927 | "metadata": {}, 928 | "outputs": [ 929 | { 930 | "data": { 931 | "text/plain": [ 932 | "Index(['default', 'housing', 'loan', 'age', 'balance', 'campaign', 'day',\n", 933 | " 'duration', 'pdays', 'previous'],\n", 934 | " 
dtype='object')" 935 | ] 936 | }, 937 | "execution_count": 22, 938 | "metadata": {}, 939 | "output_type": "execute_result" 940 | } 941 | ], 942 | "source": [ 943 | "# Store the label column y in the variable y and the remaining (feature) columns in X, then show the names of all columns of X\n", 944 | "y = data_num_sample['y']\n", 945 | "X = data_num_sample.drop(columns='y')\n", 946 | "X.columns" 947 | ] 948 | }, 949 | { 950 | "cell_type": "code", 951 | "execution_count": 23, 952 | "metadata": {}, 953 | "outputs": [], 954 | "source": [ 955 | "# Randomly split X and y into a training set and a test set, with the random seed set to 22 and the test set taking 1/4 of the data.\n", 956 | "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=22, test_size=1/4)" 957 | ] 958 | }, 959 | { 960 | "cell_type": "code", 961 | "execution_count": 24, 962 | "metadata": {}, 963 | "outputs": [ 964 | { 965 | "name": "stdout", 966 | "output_type": "stream", 967 | "text": [ 968 | "X_train的维度为(4441, 10),X_test的维度为(1481, 10)\n" 969 | ] 970 | } 971 | ], 972 | "source": [ 973 | "print(\"X_train的维度为{},X_test的维度为{}\".format(X_train.shape,X_test.shape))" 974 | ] 975 | }, 976 | { 977 | "cell_type": "markdown", 978 | "metadata": {}, 979 | "source": [ 980 | "## 4. Model selection" 981 | ] 982 | }, 983 | { 984 | "cell_type": "markdown", 985 | "metadata": {}, 986 | "source": [ 987 | "With the feature processing and data split above, the sampled dataset has been randomly divided into a training set and a test set. Here the models are selected by cross-validation on the training set and then scored on the test set: a kNN model, a logistic regression model and a decision tree model are trained below, and the better-performing one is chosen by the evaluation metric." 988 | ] 989 | }, 990 | { 991 | "cell_type": "markdown", 992 | "metadata": {}, 993 | "source": [ 994 | "train_test_model and predict_auc are helper functions; the model-selection code below calls both of them." 995 | ] 996 | }, 997 | { 998 | "cell_type": "code", 999 | "execution_count": 25, 1000 | "metadata": {}, 1001 | "outputs": [], 1002 | "source": [ 1003 | "def train_test_model(clf, X_train, y_train, cv_scores, param):\n", 1004 | "    \"\"\"\n", 1005 | "    Purpose:\n", 1006 | "        Train the model on the training set and compute its cross-validated score\n", 1007 | "    Parameters:\n", 1008 | "        clf: the model\n", 1009 | "        param: the model parameter value being tried\n", 1010 | "        X_train: training set\n", 1011 | "        y_train: labels of the training samples\n", 1012 | "        cv_scores: dict in which the final mean score is stored\n", 1013 | "\n", 1014 | "    \"\"\"\n", 1015 | "    # Use 10-fold cross-validation with roc_auc as the metric to score clf, take the mean of the scores, and print the parameter together with the mean score,\n", 1016 | "    # e.g. for parameter 5 and a mean score of 0.76342 the printed line is: 参数=5,验证集上的AUC=0.76342 (parameter=5, validation AUC=0.76342)\n", 1017 | "    val_scores = cross_val_score(clf, X_train, y_train, cv=10, scoring='roc_auc')\n", 1018 | "    score_mean = val_scores.mean()\n", 1019 | "    print(score_mean)\n", 1020 | "    print(\"参数={},验证集上的AUC={}\".format(param, score_mean))\n", 1021 | "    \n", 1022 | "    \n", 1023 | "    \n", 1024 | "    # After that, store the mean score in the dict cv_scores, with the model parameter as the key and the mean score as the value\n", 1025 | "    # e.g. parameter 5 with mean score 0.76342 is stored as {5: 0.76342}\n", 1026 | "    cv_scores[param] = score_mean" 1027 | ] 1028 | }, 1029 | { 1030 | "cell_type": "code", 1031 | "execution_count": 26, 1032 | "metadata": {}, 1033 | "outputs": [], 1034 | "source": [ 1035 | "def predict_auc(model,X_train,y_train,X_test,y_test):\n", 1036 | "    \"\"\"\n", 1037 | "    Purpose:\n", 1038 | "        Fit the model on the training data, predict the test data with the fitted model, and compute the model's AUC score\n", 1039 | "    Parameters:\n", 1040 | "        model: the model configured with the best parameter\n", 1041 | "        X_train: training set\n", 1042 | "        y_train: labels of the training data\n", 1043 | "        X_test: test set\n", 1044 | "        y_test: labels of the test data\n", 1045 | "    Returns:\n", 1046 | "        the model's AUC value\n", 1047 | "    \"\"\"\n", 1048 | "    # With the best parameter set, fit on the whole training set, predict the test set with predict, store the result in y_pred, and compute the model's AUC with roc_auc_score()\n", 1049 | "    # Print the AUC in the form: 模型AUC值:0.75177973 (model AUC: 0.75177973)\n", 1050 | "    model.fit(X_train, y_train)\n", 1051 | "    y_pred = model.predict(X_test)\n", 1052 | "    model_auc = roc_auc_score(y_test, y_pred)  # argument order is (y_true, y_score)\n", 1053 | "    print('模型AUC值:{}'.format(model_auc))\n", 1054 | "\n", 1055 | "    return model_auc\n", 1056 | "    " 1057 | ] 1058 | }, 1059 | { 1060 | "cell_type": "markdown", 1061 | "metadata": {}, 1062 | "source": [ 1063 | "### 1. kNN model\n", 1064 | "The k-nearest neighbor method (k-NN) is a basic classification method: given a dataset and a new input instance, find the k training instances closest to it; the class that the majority of those k instances belong to is the class assigned to the input instance." 1065 | ] 1066 | }, 1067 | { 1068 | "cell_type": "code", 1069 | "execution_count": 27, 1070 | "metadata": { 1071 | "scrolled": true 1072 | }, 1073 | "outputs": [ 1074 | { 1075 | "name": "stdout", 1076 | "output_type": "stream", 1077 | "text": [ 1078 | "0.8077509534923599\n", 
1079 | "参数=5,验证集上的AUC=0.8077509534923599\n", 1080 | "0.8175383570005726\n", 1081 | "参数=7,验证集上的AUC=0.8175383570005726\n", 1082 | "0.7528462119421223\n", 1083 | "参数=2,验证集上的AUC=0.7528462119421223\n", 1084 | "0.7822092488933985\n", 1085 | "参数=3,验证集上的AUC=0.7822092488933985\n", 1086 | "0.8142750349747395\n", 1087 | "参数=6,验证集上的AUC=0.8142750349747395\n", 1088 | "最优的参数值:7\n", 1089 | "模型AUC值:0.7202998772646505\n" 1090 | ] 1091 | } 1092 | ], 1093 | "source": [ 1094 | "# Only the k parameter of kNN is tuned here: each value of k gives a different kNN model, each model is scored by AUC, and the best k is then used to score the final model\n", 1095 | "knn_parameters = [5,7,2,3,6]\n", 1096 | "knn_cv_scores = {}\n", 1097 | "for param in knn_parameters:\n", 1098 | "    knn_clf = KNeighborsClassifier(n_neighbors=param)\n", 1099 | "    train_test_model(knn_clf, X_train, y_train,knn_cv_scores,param)\n", 1100 | "    \n", 1101 | "knn_best_para=max(knn_cv_scores,key=knn_cv_scores.get)\n", 1102 | "print('最优的参数值:{}'.format(knn_best_para))\n", 1103 | "\n", 1104 | "# Set the best parameter on the model, fit it, predict the test set, and compute the model's AUC with roc_auc_score\n", 1105 | "knn_model= KNeighborsClassifier(n_neighbors=knn_best_para)\n", 1106 | "knn_model_auc = predict_auc(knn_model,X_train,y_train,X_test,y_test)" 1107 | ] 1108 | }, 1109 | { 1110 | "cell_type": "markdown", 1111 | "metadata": {}, 1112 | "source": [ 1113 | "### 2. Logistic regression model\n", 1114 | "Logistic regression is a classification model whose output lies in (0, 1): when the output is above a given threshold the sample is assigned to class A, and when it is below the threshold to class B. The full theory is more involved, so only this brief outline is given here." 1115 | ] 1116 | }, 1117 | { 1118 | "cell_type": "code", 1119 | "execution_count": 28, 1120 | "metadata": {}, 1121 | "outputs": [ 1122 | { 1123 | "name": "stdout", 1124 | "output_type": "stream", 1125 | "text": [ 1126 | "0.8623990756280264\n", 1127 | "参数=1,验证集上的AUC=0.8623990756280264\n", 1128 | "0.8623787155370144\n", 1129 | "参数=3,验证集上的AUC=0.8623787155370144\n", 1130 | "0.8624392095715059\n", 1131 | "参数=5,验证集上的AUC=0.8624392095715059\n", 1132 | "0.8617857855030111\n", 1133 | "参数=10,验证集上的AUC=0.8617857855030111\n", 1134 | "0.8622761466118669\n", 1135 | "参数=15,验证集上的AUC=0.8622761466118669\n", 1136 | "最优的参数值:5\n", 1137 | "模型AUC值:0.7704903918371834\n" 1138 | ] 1139 | } 1140 | ], 1141 | "source": [ 1142 | "# Only the parameter C of logistic regression is varied here: each value gives a different model, the best value is chosen by the models' scores, and the final model is then scored\n", 1143 | "lr_parameters = [1,3,5,10,15]\n", 1144 | "lr_cv_scores = {}\n", 1145 | "for param in lr_parameters:\n", 1146 | "    lr_clf = LogisticRegression(C=param)\n", 1147 | "    train_test_model(lr_clf, X_train, y_train,lr_cv_scores,param)\n", 1148 | "    \n", 1149 | "lr_best_para=max(lr_cv_scores,key=lr_cv_scores.get)\n", 1150 | "print('最优的参数值:{}'.format(lr_best_para))\n", 1151 | "\n", 1152 | "# Set the best parameter on the model, fit it, predict the test set, and compute the model's AUC with roc_auc_score\n", 1153 | "lr_model= LogisticRegression(C=lr_best_para)\n", 1154 | "lr_model_auc = predict_auc(lr_model,X_train,y_train,X_test,y_test)" 1155 | ] 1156 | }, 1157 | { 1158 | "cell_type": "markdown", 1159 | "metadata": {}, 1160 | "source": [ 1161 | "### 3. Decision tree model\n", 1162 | "A decision tree is also a classification model. As the name suggests, it arranges the data into a tree-shaped structure according to splitting rules in order to classify it." 1163 | ] 1164 | }, 1165 | { 1166 | "cell_type": "code", 1167 | "execution_count": 29, 1168 | "metadata": { 1169 | "scrolled": true 1170 | }, 1171 | "outputs": [ 1172 | { 1173 | "name": "stdout", 1174 | "output_type": "stream", 1175 | "text": [ 1176 | "0.7183915142959287\n", 1177 | "参数=1,验证集上的AUC=0.7183915142959287\n", 1178 | "0.8312521096369881\n", 1179 | "参数=3,验证集上的AUC=0.8312521096369881\n", 1180 | "0.8561861778610342\n", 1181 | "参数=5,验证集上的AUC=0.8561861778610342\n", 1182 | "0.8021781689000533\n", 1183 | "参数=10,验证集上的AUC=0.8021781689000533\n", 1184 | "0.7493300003167633\n", 1185 | "参数=15,验证集上的AUC=0.7493300003167633\n", 1186 | "最优的参数值:5\n", 1187 | "模型AUC值:0.7760305631026779\n" 1188 | ] 1189 | } 1190 | ], 1191 | "source": [ 1192 | "# The tree depth of the decision tree is tuned below: each depth gives a different model, the best depth is chosen by the models' scores, and the final model's AUC is then computed\n", 1193 | "dt_parameters = [1,3,5,10,15]\n", 1194 | "dt_cv_scores = {}\n", 1195 | "for param in dt_parameters:\n", 1196 | "    dt_clf = DecisionTreeClassifier(max_depth=param)\n", 1197 | "    train_test_model(dt_clf, X_train, y_train,dt_cv_scores,param)\n", 1198 | "    \n", 
1199 | "dt_best_para=max(dt_cv_scores,key=dt_cv_scores.get)\n", 1200 | "print('最优的参数值:{}'.format(dt_best_para))\n", 1201 | "\n", 1202 | "# Set the best parameter on the model, fit it, predict the test set, and compute the model's AUC with roc_auc_score\n", 1203 | "dt_model= DecisionTreeClassifier(max_depth=dt_best_para)\n", 1204 | "dt_model_auc = predict_auc(dt_model,X_train,y_train,X_test,y_test)" 1205 | ] 1206 | }, 1207 | { 1208 | "cell_type": "markdown", 1209 | "metadata": {}, 1210 | "source": [ 1211 | "Comparing the three models above, the decision tree does best because its recorded test-set AUC is the highest (about 0.776, against 0.770 for logistic regression and 0.720 for kNN)." 1212 | ] 1213 | }, 1214 | { 1215 | "cell_type": "markdown", 1216 | "metadata": {}, 1217 | "source": [ 1218 | "### Data normalization" 1219 | ] 1220 | }, 1221 | { 1222 | "cell_type": "code", 1223 | "execution_count": 30, 1224 | "metadata": { 1225 | "scrolled": false 1226 | }, 1227 | "outputs": [ 1228 | { 1229 | "data": { 1230 | "text/html": [ 1231 | "
[HTML table markup not preserved; the same statistics appear in the text/plain output]" 1369 | ], 1370 | "text/plain": [ 1371 | " default housing loan age balance \\\n", 1372 | "count 4441.000000 4441.000000 4441.000000 4441.000000 4441.000000 \n", 1373 | "mean 0.014411 0.471741 0.128575 41.186895 1587.306463 \n", 1374 | "std 0.119192 0.499257 0.334766 12.027061 3136.478955 \n", 1375 | "min 0.000000 0.000000 0.000000 18.000000 -1965.000000 \n", 1376 | "25% 0.000000 0.000000 0.000000 32.000000 119.000000 \n", 1377 | "50% 0.000000 0.000000 0.000000 39.000000 577.000000 \n", 1378 | "75% 0.000000 1.000000 0.000000 49.000000 1853.000000 \n", 1379 | "max 1.000000 1.000000 1.000000 93.000000 81204.000000 \n", 1380 | "\n", 1381 | " campaign day duration pdays previous \n", 1382 | "count 4441.000000 4441.000000 4441.000000 4441.00000 4441.000000 \n", 1383 | "mean 2.474218 15.548750 377.235307 53.21324 0.833146 \n", 1384 | "std 2.751711 8.417989 350.516914 111.21589 2.134833 \n", 1385 | "min 1.000000 1.000000 4.000000 -1.00000 0.000000 \n", 1386 | "25% 1.000000 8.000000 144.000000 -1.00000 0.000000 \n", 1387 | "50% 2.000000 15.000000 260.000000 -1.00000 0.000000 \n", 1388 | "75% 3.000000 21.000000 489.000000 64.00000 1.000000 \n", 1389 | "max 44.000000 31.000000 3881.000000 854.00000 37.000000 " 1390 | ] 1391 | }, 1392 | "execution_count": 30, 1393 | "metadata": {}, 1394 | "output_type": "execute_result" 1395 | } 1396 | ], 1397 | "source": [ 1398 | "# First look at the training-set features to see how their scales differ\n", 1399 | "X_train.describe()" 1400 | ] 1401 | }, 1402 | { 1403 | "cell_type": "markdown", 1404 | "metadata": {}, 1405 | "source": [ 1406 | "Answer: the statistics above show that the features are on very different scales (for example, default is 0 or 1 while balance ranges from -1965 to 81204), so the feature values need to be normalized." 1407 | ] 1408 | }, 1409 | { 1410 | "cell_type": "markdown", 1411 | "metadata": {}, 1412 | "source": [ 1413 | "#### Normalization\n", 1414 | "Normalize the training data and the test data" 1415 | ] 1416 | }, 1417 | { 1418 | "cell_type": "code", 1419 | "execution_count": 31, 1420 | "metadata": {}, 1421 | "outputs": [], 1422 | "source": [ 1423 | "# Normalize the training set X_train and the test set X_test with MinMaxScaler (the scaler is fitted on the training set only)\n", 1424 | "scaler = MinMaxScaler()\n", 1425 | "X_train_scaled = scaler.fit_transform(X_train.astype('float64'))\n", 1426 | "X_test_scaled = scaler.transform(X_test.astype('float64'))" 1427 | ] 1428 | }, 1429 | { 1430 | "cell_type": "markdown", 1431 | "metadata": {}, 1432 | "source": [ 1433 | "The training code below is similar to the code above. If you are interested, delete the kNN, logistic regression and decision tree code given below and try writing the model training yourself." 1434 | ] 1435 | }, 1436 | { 1437 | "cell_type": "markdown", 1438 | "metadata": {}, 1439 | "source": [ 1440 | "#### kNN model" 1441 | ] 1442 | }, 1443 | { 1444 | "cell_type": "code", 1445 | "execution_count": 32, 1446 | "metadata": { 1447 | "scrolled": false 1448 | }, 1449 | "outputs": [ 1450 | { 1451 | "name": "stdout", 1452 | "output_type": "stream", 1453 | "text": [ 1454 | "0.8362348007709579\n", 1455 | "参数=5,验证集上的AUC=0.8362348007709579\n", 1456 | "0.844031883023167\n", 1457 | "参数=7,验证集上的AUC=0.844031883023167\n", 1458 | "0.795107142467294\n", 1459 | "参数=2,验证集上的AUC=0.795107142467294\n", 1460 | "0.8216005781567567\n", 1461 | "参数=3,验证集上的AUC=0.8216005781567567\n", 1462 | "0.8418551395164939\n", 1463 | "参数=6,验证集上的AUC=0.8418551395164939\n", 1464 | "最优的参数值:7\n", 1465 | "模型AUC值:0.7633043955039752\n" 1466 | ] 1467 | } 1468 | ], 1469 | "source": [ 1470 | "knn_scaler_parameters = [5,7,2,3,6]\n", 1471 | "knn_scaler_cv_scores = {}\n", 1472 | "for param in knn_scaler_parameters:\n", 1473 | "    knn_scaler_clf = KNeighborsClassifier(n_neighbors=param)\n", 1474 | "    train_test_model(knn_scaler_clf, X_train_scaled, y_train,knn_scaler_cv_scores,param)\n", 1475 | "    \n", 1476 | "knn_scaler_best_para=max(knn_scaler_cv_scores,key=knn_scaler_cv_scores.get)\n", 1477 | "print('最优的参数值:{}'.format(knn_scaler_best_para))\n", 1478 | "\n", 1479 | "# Set the best parameter on the model, fit it, predict the test set, and compute the model's AUC with roc_auc_score\n", 1480 | "knn_scaler_model= KNeighborsClassifier(n_neighbors=knn_scaler_best_para)\n", 1481 | "knn_scaler_model_auc = predict_auc(knn_scaler_model,X_train_scaled,y_train,X_test_scaled,y_test)" 1482 | ] 1483 | }, 1484 | { 1485 | "cell_type": "markdown", 1486 | "metadata": {}, 1487 | "source": [ 1488 | "#### Logistic regression model" 1489 | ] 1490 | }, 1491 | { 1492 | "cell_type": "code", 1493 | "execution_count": 33, 1494 | "metadata": { 1495 | "scrolled": true 1496 | }, 1497 | "outputs": [ 1498 | { 1499 | "name": "stdout", 1500 | "output_type": "stream", 1501 | "text": [ 1502 | "0.8576315184282677\n", 1503 | "参数=1,验证集上的AUC=0.8576315184282677\n", 1504 | "0.8610214502261355\n", 1505 | "参数=3,验证集上的AUC=0.8610214502261355\n", 1506 | "0.8617095314058393\n", 1507 | "参数=5,验证集上的AUC=0.8617095314058393\n", 1508 | "0.8620709066888409\n", 1509 | "参数=10,验证集上的AUC=0.8620709066888409\n", 1510 | "0.8622355155684189\n", 1511 | "参数=15,验证集上的AUC=0.8622355155684189\n", 1512 | "最优的参数值:15\n", 1513 | "模型AUC值:0.7691338914433311\n" 1514 | ] 1515 | } 1516 | ], 1517 | "source": [ 1518 | "lr_scaler_parameters = [1,3,5,10,15]\n", 1519 | "lr_scaler_cv_scores = {}\n", 1520 | "for param in lr_scaler_parameters:\n", 1521 | "    lr_scaler_clf = LogisticRegression(C=param)\n", 1522 | "    train_test_model(lr_scaler_clf, X_train_scaled, y_train,lr_scaler_cv_scores,param)\n", 1523 | "    \n", 1524 | "lr_scaler_best_para=max(lr_scaler_cv_scores,key=lr_scaler_cv_scores.get)\n", 1525 | "print('最优的参数值:{}'.format(lr_scaler_best_para))\n", 1526 | "\n", 1527 | "# Set the best parameter on the model, fit it, predict the test set, and compute the model's AUC with roc_auc_score\n", 1528 | "lr_scaler_model= LogisticRegression(C=lr_scaler_best_para)\n", 1529 | "lr_scaler_model_auc = predict_auc(lr_scaler_model,X_train_scaled,y_train,X_test_scaled,y_test)" 1530 | ] 1531 | }, 1532 | { 1533 | "cell_type": "markdown", 1534 | "metadata": {}, 1535 | "source": [ 1536 | "#### Decision tree model" 1537 | ] 1538 | }, 1539 | { 1540 | "cell_type": "code", 1541 | "execution_count": 34, 1542 | "metadata": { 1543 | "scrolled": true 1544 | }, 1545 | "outputs": [ 1546 | { 1547 | "name": "stdout", 1548 | "output_type": "stream", 1549 | "text": [ 1550 | "0.7183915142959287\n", 1551 | "参数=1,验证集上的AUC=0.7183915142959287\n", 1552 | "0.8312521096369881\n", 1553 | "参数=3,验证集上的AUC=0.8312521096369881\n", 1554 | "0.8566254157311546\n", 1555 | "参数=5,验证集上的AUC=0.8566254157311546\n", 1556 | "0.8051230940708141\n", 1557 | "参数=10,验证集上的AUC=0.8051230940708141\n", 1558 | "0.748734443042695\n", 1559 | "参数=15,验证集上的AUC=0.748734443042695\n", 1560 | "最优的参数值:5\n", 1561 | "模型AUC值:0.7767016204516204\n" 1562 | ] 1563 | } 1564 | ], 1565 | "source": [ 1566 | "dt_scaler_parameters = [1,3,5,10,15]\n", 1567 | "dt_scaler_cv_scores = {}\n", 1568 | "for param in dt_scaler_parameters:\n", 1569 | "    dt_scaler_clf = DecisionTreeClassifier(max_depth=param)\n", 1570 | "    train_test_model(dt_scaler_clf, X_train_scaled, y_train,dt_scaler_cv_scores,param)\n", 1571 | "    \n", 1572 | "dt_scaler_best_para=max(dt_scaler_cv_scores,key=dt_scaler_cv_scores.get)\n", 1573 | "print('最优的参数值:{}'.format(dt_scaler_best_para))\n", 1574 | "\n", 1575 | "# Set the best parameter on the model, fit it, predict the test set, and compute the model's AUC with roc_auc_score\n", 1576 | "dt_scaler_model= DecisionTreeClassifier(max_depth=dt_scaler_best_para)\n", 1577 | "dt_scaler_model_auc = predict_auc(dt_scaler_model,X_train_scaled,y_train,X_test_scaled,y_test)" 1578 | ] 1579 | }, 1580 | { 1581 | "cell_type": "markdown", 1582 | "metadata": {}, 1583 | "source": [ 1584 | "#### Combine the model AUC values obtained from unscaled and scaled data" 1585 | ] 1586 | }, 1587 | { 1588 | "cell_type": "code", 1589 | "execution_count": 35, 1590 | "metadata": {}, 1591 | "outputs": [ 1592 | { 1593 | "data": { 1594 | "text/html": [ 1595 | "
[HTML table markup not preserved; the same values appear in the text/plain output]" 1636 | ], 1637 | "text/plain": [ 1638 | " Not Scaled Scaled\n", 1639 | "kNN 0.720300 0.763304\n", 1640 | "LR 0.770490 0.769134\n", 1641 | "DT 0.776031 0.776702" 1642 | ] 1643 | }, 1644 | "execution_count": 35, 1645 | "metadata": {}, 1646 | "output_type": "execute_result" 1647 | } 1648 | ], 1649 | "source": [ 1650 | "col_name = ['Not Scaled', 'Scaled']\n", 1651 | "row_name = ['kNN','LR','DT']\n", 1652 | "# Create a DataFrame models_auc_df with col_name as the column index and row_name as the row index, and place the unscaled and scaled AUC values at the matching positions,\n", 1653 | "# where the AUC values of the unscaled models are knn_model_auc, lr_model_auc, dt_model_auc and those of the scaled models are knn_scaler_model_auc, lr_scaler_model_auc, dt_scaler_model_auc\n", 1654 | "# then display models_auc_df\n", 1655 | "\n", 1656 | "models_auc_df = pd.DataFrame([[knn_model_auc,knn_scaler_model_auc],[lr_model_auc,lr_scaler_model_auc],[dt_model_auc,dt_scaler_model_auc]],\n", 1657 | "                             columns=col_name,index=row_name)\n", 1658 | "models_auc_df" 1659 | ] 1660 | }, 1661 | { 1662 | "cell_type": "code", 1663 | "execution_count": 36, 1664 | "metadata": { 1665 | "scrolled": true 1666 | }, 1667 | "outputs": [ 1668 | { 1669 | "data": { 1670 | "text/plain": [ 1671 | "(array([0, 1, 2]), )" 1672 | ] 1673 | }, 1674 | "execution_count": 36, 1675 | "metadata": {}, 1676 | "output_type": "execute_result" 1677 | }, 1678 | { 1679 | "data": { 1680 | "text/plain": [ 1681 | "" 1682 | ] 1683 | }, 1684 | "metadata": {}, 1685 | "output_type": "display_data" 1686 | }, 1687 | { 1688 | "data": { 1689 | "image/png": "[base64-encoded PNG omitted: grouped bar chart of model AUC for kNN, LR and DT, unscaled vs. scaled]", 1690 | "text/plain": [ 1691 | "" 1692 | ] 1693 | }, 1694 | "metadata": { 1695 | "needs_background": "light" 1696 | }, 1697 | "output_type": "display_data" 1698 | } 1699 | ], 1700 | "source": [ 1701 | "# Draw a grouped bar chart for the unscaled and scaled data; a chart similar to the one below is sufficient\n", 1702 | "# Visualize models_auc_df with the legend at the lower right and the title 未归一化和归一化数据的模型AUC值比较 (model AUC on unscaled vs. scaled data)\n", 1703 | "\n", 1704 | "# Set a font that can render Chinese text so the title and legend display correctly\n", 1705 | "plt.rcParams['font.sans-serif']=['SimHei'] \n", 1706 | "plt.rcParams['axes.unicode_minus']=False \n", 1707 | "\n", 1708 | "# Create the figure and axes\n", 1709 | "fig, ax = plt.subplots(figsize=(20,12), dpi=120)\n", 1710 | "models_auc_df.plot(kind='bar', ax=ax)  # draw on the axes created above instead of a new figure\n", 1711 | "ax.legend(loc='lower right')\n", 1712 | "ax.set_title('未归一化和归一化数据的模型AUC值比较')\n", 1713 | "plt.xticks(rotation=0)" 1714 | ] 1715 | }, 1716 | { 1717 | "cell_type": "markdown", 1718 | "metadata": {}, 1719 | "source": [ 1720 | "Training the models on unscaled and on scaled data and comparing their AUC values shows that normalization clearly raises the AUC of the kNN model (from about 0.720 to 0.763) but has little effect on logistic regression and the decision tree. kNN classifies by distance, so without rescaling the wide-ranging features such as balance dominate the distance." 1721 | ] 1722 | } 1723 | ], 1724 | "metadata": { 1725 | "kernelspec": { 1726 | "display_name": "Python 3", 1727 | "language": "python", 1728 | "name": "python3" 1729 | }, 1730 | "language_info": { 1731 | "codemirror_mode": { 1732 | "name": "ipython", 1733 | "version": 3 1734 | }, 1735 | "file_extension": ".py", 1736 | "mimetype": "text/x-python", 1737 | "name": "python", 1738 | "nbconvert_exporter": "python", 1739 | "pygments_lexer": "ipython3", 1740 | "version": "3.6.8" 1741 | } 1742 | }, 1743 | "nbformat": 4, 1744 | "nbformat_minor": 2 1745 | } 1746 | -------------------------------------------------------------------------------- /images/Random Forest.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/teamowu/Machine-Learning/adc76bda05eb8a1265e3e733f8e4c4d89b43893e/images/Random Forest.png -------------------------------------------------------------------------------- /images/jaccard系数.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/teamowu/Machine-Learning/adc76bda05eb8a1265e3e733f8e4c4d89b43893e/images/jaccard系数.png 
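-------------------------------------------------------------------------------- /images/余弦相似性.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/teamowu/Machine-Learning/adc76bda05eb8a1265e3e733f8e4c4d89b43893e/images/余弦相似性.png -------------------------------------------------------------------------------- /images/曼哈顿距离.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/teamowu/Machine-Learning/adc76bda05eb8a1265e3e733f8e4c4d89b43893e/images/曼哈顿距离.png -------------------------------------------------------------------------------- /images/欧式距离.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/teamowu/Machine-Learning/adc76bda05eb8a1265e3e733f8e4c4d89b43893e/images/欧式距离.png -------------------------------------------------------------------------------- /关联分析/README.md: -------------------------------------------------------------------------------- 1 | # Association Analysis 2 | Association analysis looks for **rules that explain the relationships between variables in the data**, extracting useful association rules from large, multi-source datasets. It is a way of discovering relationships among many kinds of data in large volumes of data. 3 | It can also mine relationships among data along a time series. 4 | 5 | ## Common association algorithms 6 | - Apriori 7 | - FP-Growth 8 | - PrefixSpan 9 | - SPADE 10 | - AprioriAll 11 | - AprioriSome 12 | 13 | ## Typical sales scenarios 14 | - Market basket analysis 15 | - Optimizing product layout, e.g. a supermarket can shelve strongly associated products together so that customers can pick them up in one go. 16 | - Designing promotions, e.g. a discount when two strongly associated products are bought together. 17 | - Fast product recommendation, typically used in e-commerce, e.g. when a customer views a product, the page recommends items under rules such as "frequently bought together" or "90% of customers also viewed the following products". 18 | 19 | ## Key metrics in association analysis 20 | - Support: the share of all transactions that contain both the antecedent and the consequent of the rule. 21 | - Confidence: among the transactions that contain the antecedent, the share that also contain the consequent. 22 | - Lift: when Lift > 1, applying the association rule gives better results than not applying it; otherwise the rule brings no gain (Lift = 1) or has a negative-correlation effect (Lift < 1) and is an invalid rule. 23 | - When evaluating association rules, consider support, confidence and lift together; the higher the support and confidence, the better. 24 | 25 | **Frequent rules & effective rules**: 26 | - Frequent rule: a rule whose support and confidence are both relatively high. 27 | - Effective rule: a rule that genuinely lifts the occurrence of its antecedent/consequent. 28 | - Frequent rule != effective rule 29 | 30 | ## More application scenarios for association analysis 31 | **Association analysis within one dimension**: 32 | - Website page-view association analysis 33 | - Ad traffic association analysis 34 | - User search-keyword association analysis 35 | 36 | **Cross-dimension association analysis**: 37 | - Association analysis across different scenarios 38 | - Event analysis within the same scenario 39 | 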
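The three key metrics of this README (support, confidence, lift) can be computed directly from a list of transactions. The following is a minimal sketch in plain Python, not the implementation of any particular library: the helper names `support` and `rule_metrics` and the example baskets are hypothetical and invented purely for illustration.

```python
def support(transactions, itemset):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= set(basket))
    return hits / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    support_a = support(transactions, antecedent)
    support_c = support(transactions, consequent)
    support_rule = support(transactions, antecedent | consequent)
    if support_a == 0 or support_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = support_rule / support_a  # P(consequent | antecedent)
    lift = confidence / support_c          # > 1 means the rule beats the baseline
    return {"support": support_rule, "confidence": confidence, "lift": lift}


# Hypothetical shopping baskets, invented for this example.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

metrics = rule_metrics(baskets, {"diapers"}, {"beer"})
print(metrics)
```

For the rule diapers -> beer on these baskets, support is 0.4, confidence is about 0.67 and lift is about 1.11. Because lift is above 1, the rule would count as effective in the sense described above, although five baskets are far too few to draw any real conclusion.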
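-------------------------------------------------------------------------------- /分类算法/README.md: -------------------------------------------------------------------------------- 1 | # 2.2 How to choose a classification algorithm 2 | - Text classification: **naive Bayes**, e.g. spam detection in email. 3 | - If the training set is small, a high-bias, low-variance classifier works better, such as **naive Bayes or support vector machines, because these algorithms do not overfit easily**. 4 | - If computation time and ease of use matter most, support vector machines and artificial neural networks are not good choices. 5 | - If accuracy matters most, choose a high-precision method, such as **support vector machines or Boosting-based ensemble methods like GBDT and XGBoost**. 6 | - If stable results or model robustness matter most, choose **Bagging-based ensemble methods such as random forests or combined voting models**. 7 | - If you want probability information about the predictions and then **build further applications on the predicted probabilities, logistic regression is a good choice**. 8 | - If you are **worried about outliers or non-separable data and need clear decision rules, choose a decision tree**. 9 | -------------------------------------------------------------------------------- /回归分析/README.md: -------------------------------------------------------------------------------- 1 | # Regression Analysis 2 | How to choose a regression algorithm? 3 | - Simple linear regression. Suited to datasets with a simple structure whose distribution shows a clear linear relationship. 4 | - When there are few independent variables, or dimensionality reduction yields usable two-dimensional variables (including the predicted variable), a scatter plot can reveal the relationship between the independent and dependent variables directly, and the best regression method can then be chosen. 5 | - If a basic check shows strong collinearity among the independent variables, use an algorithm that handles multicollinearity (highly correlated independent variables) flexibly, such as ridge regression. 6 | - If **the dataset is noisy, principal component regression is recommended**, because choosing sensibly which principal components enter the regression removes noise. In addition, the principal components are mutually orthogonal, which solves the collinearity problem of multiple linear 7 | regression. Both effects improve the model's resistance to interference. 8 | - With high-dimensional variables, regularized regression methods such as Lasso, Ridge and ElasticNet work best; alternatively, use stepwise regression to pick out the independent variables with significant influence and build the regression model from them. 9 | - To validate several algorithms at once and pick the one that fits best, use cross-validation to compare the models and evaluate them jointly with R-square, Adjusted R-square, AIC, BIC and the various residual and error 10 | metrics. 11 | - If model interpretability matters most, easily understood linear, exponential, logarithmic, quadratic or polynomial regression is more suitable than kernel regression, support vector machines and the like. 12 | - Ensemble or combined regression methods. Once several methods have been shortlisted but it is unclear which to keep, combine the regression models, i.e. determine the final output from the results of several models by weighting, averaging or similar. 13 | -------------------------------------------------------------------------------- /聚类分析/README.md: -------------------------------------------------------------------------------- 1 | # How to choose a clustering algorithm 2 | There are dozens of clustering algorithms; the choice mainly depends on the following factors: 3 | - If the data is high-dimensional, choose **spectral clustering**, which is a form of subspace partitioning. 4 | - If the data volume is small to medium, e.g. **under 1 million records**, k-means is a good choice; **above 1 million records**, consider **MiniBatchKmeans**; 5 | - If the dataset contains **noise** (outliers), the density-based **DBSCAN** handles it effectively. 6 | - If higher **classification accuracy** is the goal, **spectral clustering** is more accurate than k-means. 7 | --------------------------------------------------------------------------------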
/聚类分析/客户特征的聚类与探索性分析.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Project background\n",
    "\n",
    "One day the business team brought some customer data to the data team. They did not know where to start analysing it and hoped the data team could analyse the data to give them some insight, or suggest directions for follow-up analysis or business thinking.\n",
    "\n",
    "Given this scenario, the deliverables for this analysis are as follows:\n",
    "- This is an exploratory data analysis (EDA) task, and the business team has no prior knowledge to share with the data team.\n",
    "- The results will be used to inform the business team or as a basis for deeper follow-up analysis.\n",
    "- The work should include data mining beyond descriptive statistics and basic exploratory charts.\n",
    "\n",
    "#### Data source features:\n",
    "- USER_ID: user ID column, integer. It uniquely identifies each user, so it cannot be used as a clustering feature; it only serves to tie each user to the assigned cluster afterwards.\n",
    "- AVG_ORDERS: average number of orders per user, float.\n",
    "- AVG_MONEY: average order value, i.e. the price per order, float.\n",
    "- IS_ACTIVE: whether the user is active, produced by another model, string.\n",
    "- SEX: gender, encoded as 0, 1, 2 for unknown, male and female.\n",
    "\n",
    "#### Analysis approach:\n",
    "- String features cannot be used for training directly, because sklearn estimators generally expect numeric arrays or sparse matrices rather than raw strings.\n",
    "- SEX is a categorical variable at heart, so it cannot take part in distance calculations directly.\n",
    "- AVG_ORDERS and AVG_MONEY are on clearly different scales and need to be rescaled to a common range.\n",
    "- Split off the ID column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import packages\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "from sklearn.preprocessing import MinMaxScaler\n",
    "from sklearn.cluster import KMeans\n",
    "# The misspelled calinski_harabaz_score was removed in scikit-learn 0.23;\n",
    "# import the current name under the old alias so the cells below still run.\n",
    "from sklearn.metrics import calinski_harabasz_score as calinski_harabaz_score, silhouette_score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "raw_data = pd.read_csv('cluster.txt')\n",
    "# Numeric features\n",
    "numeric_feature = raw_data.iloc[:,1:3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0.64200477 0.62591687]\n",
      " [0.91169451 0.80440098]\n",
      " [0.69451074 0.39608802]\n",
      " ...\n",
      " [0.3221957
0.17359413]\n",
      " [0.42004773 0.31295844]\n",
      " [0.64916468 0.40831296]]\n"
     ]
    }
   ],
   "source": [
    "# Scale the numeric features to [0, 1] (min-max normalization)\n",
    "scaler = MinMaxScaler()\n",
    "scaled_numeric_feature = scaler.fit_transform(numeric_feature)\n",
    "print(scaled_numeric_feature[:,:2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,\n",
       "       n_clusters=3, n_init=10, n_jobs=None, precompute_distances='auto',\n",
       "       random_state=0, tol=0.0001, verbose=0)"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Train the model\n",
    "n_cluster = 3\n",
    "model_kmeans = KMeans(n_clusters = n_cluster, random_state=0)\n",
    "model_kmeans.fit(scaled_numeric_feature)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "sample: 1000 \t features: 4\n"
     ]
    }
   ],
   "source": [
    "# Evaluate the model\n",
    "n_samples,n_features = raw_data.iloc[:,1:].shape # total number of samples, total number of features\n",
    "print('sample: %d \\t features: %d' % (n_samples,n_features))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      " unspuervised_score: \n",
      " ------------------------------------------------------------\n",
      "       silh          c&h\n",
      "0  0.634086  2860.821834\n"
     ]
    }
   ],
   "source": [
    "# Unsupervised evaluation metrics\n",
    "silhouette_s = silhouette_score(scaled_numeric_feature, model_kmeans.labels_, metric='euclidean')\n",
    "calinski_harabaz_s = 
calinski_harabaz_score(scaled_numeric_feature, model_kmeans.labels_) # Calinski-Harabasz score\n",
    "unspuervised_data = {'silh':[silhouette_s], 'c&h':[calinski_harabaz_s]}\n",
    "unspuervised_score = pd.DataFrame.from_dict(unspuervised_data)\n",
    "print(\"\\n\",'unspuervised_score:', '\\n', '-'*60)\n",
    "print(unspuervised_score)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The results above suggest that the clustering is reasonably good. Taking the silhouette score (silh) as an example, a value above 0.5 indicates good clustering quality. The basic criterion for judging quality is whether the clusters are clearly separated from one another."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   USER_ID  AVG_ORDERS  AVG_MONEY  IS_ACTIVE  SEX  labels\n",
      "0        1        3.58      40.43         活跃    1       2\n",
      "1        2        4.71      41.16        不活跃    1       2\n",
      "2        3        3.80      39.49        不活跃    2       1\n",
      "3        4        2.85      38.36        不活跃    1       0\n",
      "4        5        3.71      38.34         活跃    1       1\n"
     ]
    }
   ],
   "source": [
    "# Merge the data and the cluster labels\n",
    "kmeans_labels = pd.DataFrame(model_kmeans.labels_, columns = ['labels'])\n",
    "# Combine the raw data with the labels\n",
    "kmeans_data = pd.concat([raw_data, kmeans_labels], axis=1)\n",
    "print(kmeans_data.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "        record_count  record_rate\n",
      "labels                           \n",
      "0                332        0.332\n",
      "1                337        0.337\n",
      "2                331        0.331\n"
     ]
    }
   ],
   "source": [
    "# Count the samples in each cluster and their share of the total\n",
    "label_count = kmeans_data.groupby(['labels'])['SEX'].count()\n",
    "label_count_ratio = label_count / kmeans_data.shape[0]\n",
    "kmeans_record_count = pd.concat([label_count,label_count_ratio], axis=1)\n",
    "kmeans_record_count.columns = ['record_count', 'record_rate']\n",
    "print(kmeans_record_count.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "        AVG_ORDERS  AVG_MONEY\n",
      "labels                       \n",
      "0         2.022349  38.980602\n",
      "1         3.987389  39.028754\n",
      "2         3.958610  40.996254\n"
     ]
    }
   ],
   "source": [
    "# Mean of the numeric features in each cluster\n",
    "# (a list of columns is required; tuple-style selection fails on recent pandas)\n",
    "kmeans_numeric_features = kmeans_data.groupby(['labels'])[['AVG_ORDERS', 'AVG_MONEY']].mean()\n",
    "print(kmeans_numeric_features)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Share of each categorical value in each cluster\n",
    "active_list = []\n",
    "sex_gb_list = []\n",
    "unique_labels = np.unique(model_kmeans.labels_)\n",
    "for each_label in unique_labels:\n",
    "    each_data = kmeans_data[kmeans_data['labels']==each_label]\n",
    "    active_list.append(each_data.groupby(['IS_ACTIVE'])['USER_ID'].count()/each_data.shape[0])\n",
    "    sex_gb_list.append(each_data.groupby(['SEX'])['USER_ID'].count()/each_data.shape[0])\n",
    "\n",
    "kmeans_active_pd = pd.DataFrame(active_list)\n",
    "kmeans_sex_gb_pd = pd.DataFrame(sex_gb_list)\n",
    "kmeans_string_features = pd.concat((kmeans_active_pd,kmeans_sex_gb_pd), axis=1)\n",
    "kmeans_string_features.index = unique_labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Merge the analysis results for all clusters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   record_count  record_rate  AVG_ORDERS  AVG_MONEY       不活跃        活跃  \\\n",
      "0           332        0.332    2.022349  38.980602  0.487952  0.512048   \n",
      "1           337        0.337    3.987389  39.028754  0.495549  0.504451   \n",
      "2           331        0.331    3.958610  40.996254  0.504532
0.495468 \n", 279 | "\n", 280 | " 0 1 2 \n", 281 | "0 0.003012 0.990964 0.006024 \n", 282 | "1 0.014837 0.014837 0.970326 \n", 283 | "2 0.984894 0.009063 0.006042 \n" 284 | ] 285 | } 286 | ], 287 | "source": [ 288 | "features_all = pd.concat((kmeans_record_count,kmeans_numeric_features,kmeans_string_features), axis=1)\n", 289 | "print(features_all.head())" 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": 59, 295 | "metadata": {}, 296 | "outputs": [ 297 | { 298 | "data": { 299 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAs0AAAH5CAYAAAB6TAOnAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAgAElEQVR4nOzde3xU1bn/8c/DVe43FVpRUrGiuAuIoICCoSpesN6KoFaKimLBctoi3ioq+aktltZaherxUqucUlH0KOApGJUICBIQuWxRQEFRW1Ek3FECrN8fewdCmCSTZGb2JPN9v17zYmbNXms/e1iZeWbN2mubcw4RERERESldragDEBERERFJd0qaRURERETKoaRZRERERKQcSppFRERERMqhpFlEREREpBxKmkVEREREyqGkWUREREQOYmY3m9l6M1trZhea2TVmtsvMvix2O9rM7jCzv4d1WptZgZm1ijj8pFDSLJIgZjbZzP4Y3l9gZpcXe+73ZvbX8P5AM/vczNaY2ZlxtHuDmf07fIO6KSwrevPaEN7+aIGi8gIzW2RmvcPts81sd4k3u15mNtbMtpvZV2b2hZndXGy/B71hJvr1kkAy+o2Z/d3MtoV94ggzc2Y2NnyutP70tZnVL1Z/bBn95kkzu6XY/l4ys+sS/uLIQcL/l18Ue1zhv9Hi/a1Y2Tlmti5sq6hPjA7/v3eZ2Zbwft8yYonZd0tLtKryOkjymVlXYDDQAbgMeAqoC7zinGtT7PYZ8ChwQZgo3wBMcs59E1XsyVQn6gBEapC+wJfh/TeA3sAL4eMzgIfMrA0wATgdaAC8YmbHOef2xmrQzDoADwA9gD3AYjN7M3z6FefcFWZ2ODAHmFtUDvwM+DnwqpkdF5Yvcc71KNF+P2CCc+52M8sC8s3sdaA2B94wOwCzzKytc66wEq+LlC3h/SbUGDgG+EFRQTn96XCCD8d/lmgnVr/ZDMwwsz8B3wdOA66M+4ilykokNRX5Gy3e3zCzFsD/AD8BPgTeM7M3nHN/BP5owQjiO865x8ppN2bfBZoQvldV5PgkcicBXzvndgFLzewu4LBYGzrnNpvZJGAYMJSgj9VIGmkWSQAzOwlYDzQPv22/SfABQjh6dwqQR/DBNMc5t8Y5txzYCZxYRtMXArOcc6udc2uBmUD/4hs45zYCrxN8cBaV7XXOPQ28S5AIlcs59wnwTtjO/jdM59xSoNQ3TKm8JPYbgI8J/h9PCu9D2f1pK3BjPHE751YCq8P2fgE87pz7Lp66kjAV/huN0d8ALgYWOefynXNbgVwql/SU1nelepoD9DCzR83s+865x4FtZWz/Z+AOYEH4WVIjKWkWSYyzCD4g5hJ84LwN/NDMmgGnAh+GP1edCKwrVu8+YEsZ7R4LfFrs8Xogq/gG4YdfNrAqRv2lwAnxHICZHQN0C9s55A3TOVfWG6ZUTrL6DYD
PgaTZD8vK6k+vA8ebWVz9BXgQGAVcDfw1zjqSOJX5Gy3Z3wA8Dn7v+B0wrRLxlNZ3pRpyzn0K9ALaA6vNbEj41MXFptnMLVblC2AjkJ/iUFNKSbNIYvwYeCu8nRWOur1D8KbTm+CnS4BmBKOEADjn/hnOCSvNYUDxEbzdBD/PQ/jmBXxFMGIY64NuO8HP9ABdi73ZfVJsm1+a2VfAGmC8c25ZGW+YkljJ6jcAHxAkzCdyICkqqz8VEsxbHFainZj9xjn3GtAKyHXOfV3+oUoiVfJv9KD+FpY1J3ifKGp3vXPui0rEU1rfhdITLUljzrkVzrl+wFUE85brc/Cc5t7FNr8IKABGmFmNzS1r7IGJpIqZ1Qb6AE8D4zjwYVT0c2XxD5DdQP3w5KwvzWyzmZ1dRvM7Ofgn1/ocSJ5eAb5H8DN5rnPOxajfiAMfiEuKvdllFdtmAsFo43bg1aLCkm+Y4XxYSZAk9xuAjwgSqobArrCsrP4E8DjByHHxbUrrNxBM/3m3nDgkSSryN1pGfysk6AdF211mZudXMqRYfRdKT7QkTZnZfUVfxJxz04DZBPPTS3NbeFsD/DT5EUZDSbNI1Z0CrHPOtXbOHQE0Ds8Of4NgZKc7B07SWwf8wDn3tXOuDUHCUdYJuWs5eDpGO4r9TB8myhOAm0qp3wl4v7wDcM7tBP4GjIBS3zB/VF47UiHJ7DcAe4GmBD+ZFimvP31GMFpY2aRJUqQSf6Ol9bePCKbtFLmAYiePVlCsvivV03rgWjNrYGZHEvStmFPCLFilqalz7nXgLwTJc42kpFmk6n4MLCz2eGFYtphgPvEHzrkd4XMvA+ea2YnhG1GXctqeAfQzsw5mdizQj2KjwaFngN7hnGQAzKyWmV0dtv9SnMcxARhsZo2I/Ya5LM52JD7J7DdFVhFM0ygST396lCDZlvRW0b/R0vrbS8DZZvYjM2sNnEflT+CL1Xelevobwa+YHxP0lRyCXyWKT7X50sx+SpAkPwL7p201jOOXsGpJS86JVN1ZBEs2FVlIMD/1GTN7i+BkPACcc6vNbAQwC/iGA6saxOScW2PBerh5BF9y73TOrTKznsW22W5mzxKsYrCa4Gz4bwhO/jrXOfeNmUE4N7VY8+NK7OtTM5tDsFzd3whOCvyYYA5sjnNuTbwviMQlaf2mmA+Az4C2YTvl9qdwH8VPOjyk3zjnHopz/5I8Ff0bLau/DQamEkzluT9cHaXCnHN7S/bd0MUl+tBNzrkXK7MPSQ3n3B4OPb8B4O8xyg76v3TOdUxGTOnAYk+DFBERERGRIhppFkkDJUZhinzlnOuU8mCk2lC/kZLUJ0SSRyPNIiIiIiLl0ImAIiIiIiLlUNIsIiIiIlIOzWmWGu3www93WVlZUYchFfTuu+9uDNeSTRn1lepJfUXiEUU/AfWV6qisvqKkWWq0rKwsFi9eHHUYUkFm9mmq96m+Uj2pr0g8ougnoL5SHZXVVzQ9Q0RERESkHEqaSzCzsWa23cy+MrMvzOxmM7vGzHaVuArO0eH2V4fbfWxmlxZr5+6wjc/M7LLS2i5RvtnMZpuZV9EYSzw/2cz+WOzx6DDmXWa2Jbzft6zjEhEREZEDlDTHNsE5dyRwOsHlIZsBrzjn2hS7fWZmHYA/An3C2wQza2lm5wDXEVzW9DLgaTNrGattM+tcVA4cAbwJ/J+Z1a9IjMXaAehLcPUnAJxzf3TOtQGmALeF8c8Onz7kuCr8aomIiIjUcEqay+Cc+wR4Bygtgb0SmOqc+9g59wWwAjgNuAj4p3Nug3NuEfA+8ONS2u5QrKzQOXcvsIcg8a1IjB0AzOwkYD3Q3MxaxXWgIiIiIlImJc1lMLNjgG7Ad6Vs4gGrij0eCSwCjgWKTyRfD2SV0nbx+kWWASdUMMaids4C8oC5xJl4i4iIiEjZlDTH9ksz+wpYA4wHtgAXF5v3OzfcrjmwvaiSc26Nc24jcBgHJ9q7gQax2nbOLYux/+1A44r
EWKydHwNvhbezSqtcTKzjEhEREZFilDTHNoFgZHg78GpYVnzub++wrJBiUzfM7AYzOxXYSZA4F6kflpXWdkmNKJaMxxujmdUmmFv9NDCO+JLmWMclIiIiIsUoaS6Fc24n8DdgRBmbfUQwFaPIIOBwYC0HT8doB6yrQNs/IpgHXdEYTwHWOedahwtzN9ZqGCIiIiJVp6S5bBOAwQQjv7FMAa4ys6PClTROJjgpbxpwhZm1NrPTgBOB2SXqTgAGm9n+ts2sjpndDhjBvOS4Ywzb+TGwsNhzCylxAqKIiIiIVJyS5jI45z4F5hBMwyg+9/dLM/upc24ecDfBSXf/AoY65zY5594AngR84EXgGudcQSlt/yws+iXwNXAmcK5zrrCCMf6MYDpGyaS5vCkahxxXPPsVERFJhQ0bNlBYGNdHoqShzz6rOSvZKmkuwTk31jl3e7HHlzrnHnfONSixnvGL4fN/c84dG95eLlbvXufcEc65ts65/y2n7bHOucbOuRbOufOdcx9XMsZznHPPFCsf55z7ebHH1zjnHiv2+O+lHZeIiEhF7Nmzh2OOOYbs7Gyys7NZsWIFO3fupEuXLmXW27JlC+effz79+vXj0ksvZffu3UyYMIHu3buzY8cOZs2aRd26dVN0FJlr3bp19O/fn969e3PzzTcf8rgsGzZsoHfvA6dF3XHHHZx77rk455g9u+QP7dWXkuY0VmIEuOi2POq4RERESlq+fDlXXnkleXl55OXl0bFjRwYOHMjmzZvLrPePf/yDUaNG8dprr9GmTRtmzpzJ0qVLuf7661m0aBGNGpU2Q1IS6bbbbuOuu+5i7ty5fP7555x22mkHPc7Ly4tZr6CggCFDhrBjx479ZV9//TVdu3blvffe45hjjknRESSfkuY0VmIEuOjWKeq4RERESnrnnXeYMWMGp556KkOHDmXPnj08/vjjZGVllVlvxIgRnHPOOUCQbB155JE45ygsLOS1117j/PPPT0H0snr1arp27QrAkUceSaNGjQ56vGXLlpj1ateuzZQpU2jatOn+Mucce/bsYc6cOZx55pnJDz5FlDSLiIhIlXXv3p3XX3+d/Px8CgsLmTVrFt///vfjrr9gwQIKCgro0aMH/fr1Y8aMGbRt25aLLrqoRv3En64GDBhATk4O06dPZ+bMmQwdOvSgx2edFfsUqaZNm9KsWbODyjzP49NPP6VWrVr06dOHDz74IBWHkHR1og5AREREqr9OnTpRv35w6YJu3bqxZs2auOtu2rSJkSNH8uKLwWk1gwYNol27dqxdu5b+/fvz4osv0revLnKbTGPGjGHevHmMHz+eIUOGHPK4cePyrrl2wG9+8xuOO+44NmzYwGWXXcarr77KiSeemMToU0MjzSIiNcymTZvIzc1l48aNUYciKZIOK0wMHjyYZcuWsXfvXl5++WU6d+4cV73du3dz+eWX8/vf/5527drtL1+zZg3t27enfv367Nu3L1lhSzFdunRh/fr1jBo1Kubjiti8eTNNmjSpUf9/SppFRGqQgoICLrzwQvLz8+nbty+LFi2K+wx4SZ0NGzZw8sknU1BQwAUXXEC3bt248cYbS90+1nbptsLE3XffzeDBg+nSpQs9e/bk7LPPPmSblStXMmbMmIPKnnrqKZYsWcL9999PdnY2U6ZMYevWrbRp04aOHTvy+OOPx2xLEm/8+PGMGjWKhg0bxnwc6/8vltWrV9O5c2dOPfVUHnnkkRozr9mcc1HHIJI03bp1c4sXL446DKkgM3vXOdctlfusKX3lrbfeon79+vTo0YPRo0fz1FNP8a9//YsePXowaNAghg8fTnZ2dtRhJkx17SuDBw9m0aJFjBgxglatWvGzn/2Mq666ilGjRtGt26GH8/DDDx+y3WOPPUb37t3p0KED33zzDT/9qZbZL00U/QRqzvtKJimrr2hOs4hIDVI0ojNnzhzy8/Np0aJFXGfAS+q8+eabNGrUiDZt2tCqVSt832f
z5s189tlnHH300THrxNqu+AoT8Yz+iUjVaHqGiEgN45xjypQptGjRgmuvvTauM+AlNXbv3s29997LuHHjADjjjDP49NNPefjhhznxxBNp2bJlzHqxttMKEyKppaRZRKSGMTMmTpxIp06dOP744zn//PN58sknK3wGvCTeuHHjGDFiBM2bNwcgJyeHxx57jLvvvpsTTjiBp59+Oma9WNsNGjSIsWPH0rx58/0rTIhI8ihpFhGpQR544AGeffZZIDh7vXnz5lU6A14S6/XXX2fixIlkZ2ezdOlS1q1bx4oVK9i7dy8LFy7EzGLWKygoiLmdVpgQSR3NaRYRqUGGDRvGwIEDefLJJ/E8j379+jF27NiDzoCX6MyZM2f//ezsbB544AGuvfZaPv30U3r27MmVV17JypUrmTx5Mvfdd9/+be+4445Dtiu+wsSNN97I3XffnZSYs25/NSntJtMn4/pHHUK1ker/3+r8f6OkWUSkBmnRogW5ubkHleXk5EQUjZQlLy8PgPfff/+g8o4dOx6UMAOceuqph2wH7L/89NKlS5MTpIjsp+kZIiIiIiLlUNIs+5lZXiXrdTGzLgmM4ykzW2BmWkNJRERE0oKmZ6SQ5dgPgU5AVrFbO6At0IDg/8OA3cC3wGZgLfBxePsovK1097g9KQ2+bEUJc5V/HzSzy4DazrmeZvY3M/uhc25NVdsVERERqQolzUlkOXYUcFZ4+zFBchyPBuGtBfCDsH5xWy3H3gReA2a5e9zaCsVldhjw9zCezcBA59zOEtuMBfKcc3lmdk1YPAV4AWgKfANcDtwLXBrWGeycO8vMGgLPAkcCK5xzN4XP5wGLgE7OuXNLCS8beD68/xpwBqCkWURERCKlpDnBLMeOAK4FrgFOTNJumgKXhDcsx9YCrwCPuXvc6jjqDwOWOeeuMLNrAQ/Ij6NeR2Cfc66PmV0ENHbO3WFmqwCcc38v1r7vnBtrZi+ZWSfn3HKgB/Cwc+6WMvbRCPgivL8J6BpHXCIZQWe5p6/qtsKE/m9FKk5Jc4JYjmUDNwKXAfVSvPtjgd8Av7YcywUmAjPcPa60RTtPAIpWwf97HO03AHYBSwDfzF4jGP2dWcr2HYBeZpYNNAeOApYTJNIvlbOv7eH+ABqjefciIiKSBpSQVIHlWC3Lsastxz4AZgNXkPqE+aCQgH4Eo84fW46NthxrEGO7D4Hu4f3fAtfH2GY3cER4/7zw387A2865fgRTR3qH5buAhgAWrLi/CnjIOZcNjAHWh9ttj+MY3iWYklG0v0/iqCMiIiKSVEqaK+naf17bF3gPmEQwcptusoDxwCrLsZ9bzkGXmXoC6BrOMe5KcAwlTQNGmtljBPOXIUhg/8vM5gNtgMVheS5wmZm9TZBIPwGcb2ZzgF8An1Ug7peBwWb2IDAQqF6/eYqIiEiNpOkZFeT7fhbw0Chv1Fmvrn9159fffh11SOU5GngGGG45Ntzd45Y653YRJKQHCUeGi+77QJ8Y7R1yAp9zbhNwdoniMtsvjXNuazit4xzgD865LeXVEREREUk2jTRXgO/7w4CVwMVm1nhCrwmroo6pAnoAiy3H/mA5VjfqYCBYTaPE7RUA51yBc+5559yXUccoIiIiAhppjovv+42A/wZ+Vrz8xOYnnt6pZadVyzct7xBNZBVWG7gF6GU5NtDd4/4dZTDxjDyLiIiIpAONNJfD9/2TCNYW/lnJ58ys1iO9Htl5aK20dzrwnuVY36gDEREREakOlDSXwff9qwjWLy51veWW9Vue/NOsny5MXVQJcySQazl2e9SBiIiIiKQ7Jc2l8H3/euB/CJdSK8tvu/y2dR2rU5j8qBKuNvB7y7FHog5EREREJJ0paY7B9/0bgccJ1j0uV73a9bJu63zb/ORGlVS/tBz7S9RBiIiIiKQrJc0l+L4/AniUOBPmIoOOHdSleb3mm5ITVUr8l+XYn6MOQkRERCQdKWkuxvf94QSXoK5QwgxgZs0e6vn
QisRHlVK/thx7MOogRERERNKNkuaQ7/vZQJXm9nZt1fX0Ds06fJyYiCLzG8uxX0YdhIiIiEg6UdIM+L7/feA5ghPjKs3M6kw8fWLypmjsBD4GdpRTVpbtwN5yt/qT5dipFQ1PREREpKbK+KTZ9/06wPNA60S017pB6+7ntT3v3US0dZBdwGTgC4KLYu8opSyWhQSnNe4GPiKerwb1gOctx1pWNWwRERGRmiDjk2ZgPMHFPhLm3m73NqtFrfLHcytiA3Au0AdoD/ynlLJYvgS6EiTX9eLeYztgkuVYhed3i4iIiNQ0GZ00+75/AfDrRLd7WO3DjvuV96u3E9poFnA08AlB8tu2lLLS7CWYxnFchfZ6AXBzxQIVERERqXkyNmn2fb8BMCFZ7V9z/DUdm9RtsiWhjTrgfaABB6ZYxCorqT2wGmgK/BNYV6G95liOHVOJaEVERERqjIxNmoHfAj9IVuO1rNbhfzj1D+8ltFED+hPMvl5VRllJHpANHAYcD6ys0F4bAg9VOFYRERGRGiQjk2bf948BRid7P6e3Pr1Xu8bt1ieksXnA0vD+twQJcKyy0mwCWhKMRrsK7/1Sy7HsCtcSERERqSEyMmkG7qPsFDMhzKzeo6c/+u+ENHYKsBz4G7CPYMpFrLJYvgUaA0cA7wLHViqCP+ikQBEREclUdaIOINV83z8BuDpV+zu68dE9erfpvWzul3M7V6mhBsDP4yiL5TAOJNTDKx1Bd+BS4KVKtyAiIiJSTWXiSPMIKnGZ7KoYf+r4ulRmUkT60ZUCRUREJCNlVNLs+34j4hubTahGdRt1vKHDDYldgi4afS3HTog6CBEREZFUy6ikGfgZ0CyKHd/U8ab2h9U+bGcU+06wyk/wEBEREammMi1pHhHVjmvXqv29+7vdnx/V/hNoiOVYw6iDEBEREUmljEmafd8/DajayXhVdM5R55zapkGb0i52XV00A66IOggRERGRVMqYpBm4MOoAzKzhX0//69qo40iAn0QdgIiIiEgqZVLSfHbUAQAc1/S4Xt0O71axa/Kln2zLsdIu2i0iIiJS42RE0uz7fjOCdYYjZ2b2UM+H9kQdRxU1B7pFHYSIiIhIqmRE0gz0JbiAdFpoVq9Zp6vaX7Ug6jiq6JyoAxARERFJlUxJmtMuwRvdaXTburXqfhd1HFWQFtNdRERERFIhU5Lm06IOoKS6teoefdfJd1Xn0ebTLMdSemVFERERkahkStLcPuoAYrmk3SWntKrf6uuo46ikw4DvRR2EiIiISCrU+KTZ9/3mBCeupR0za/JIr0c+jDqOKjg26gBEREREUqHGJ83AD6IOoCxeC6/XSS1OWhN1HJWU1q+tiIiISKJkQtIc92joli1bmD9/PgUFBZXe2caNGyksLIx7ezOrPbHXxG2V3mG0NNIsIiIiGSETkua4RkO3bNnCTTfdhO/7XHfddWzatGn/cxs3buTyyy8vte7kyZO54oor2LlzJ/Pnz6du3boVCrDVYa26Xtzu4vwKVUoPGmkWERGRjJAJSXNc85lXr17NLbfcwrBhwzj99NP54IMP9j/3pz/9iW+//bbUuqtWreKyyy7j/fffp0GDBpUK8u6T7z6ijtWJf4g6PTSNOgARERGRVMiEpDmuYd/u3bvTuXNnFi9ezIoVK+jcuTMACxcupEGDBhx++OGl1nXOsWfPHubPn88ZZ5xRqSDr1a73g9GdRs+vVOXo1Is6ABEREZFUqBN1ACkQ9zE655g5cyZNmzalTp06FBYW8t///d889NBD/OpXvyq1Xq9evXj55ZfJzs5m5MiRDBs2jFNPPbXCgV7Z/krvPzv/87bDVYv1j7/b+92/o46hJtqyZQtXXHEFe/fupVGjRkyZMoV69Q79fjJ06FBWrlxJ//79GTNmTNz1REREpOIyIWneF++GZsaYMWN45JFHyMvLY926dQwaNIimTcuehXDeeefxve99j88//5w+ffqQm5tbqaS5ltVaObrT6N4
VrhidLVEHUBP94x//YNSoUZxzzjkMHz6cmTNnctFFFx20zUsvvcTevXtZsGAB1113HWvWrCE3N7fceiIiIlI5mZA0745no6eeeoojjjiCiy66iG3bttGkSRPeeecd8vPzee6551i1ahX33HMPOTk5MeuvX7+edu3asW3bNpxzlYlzB3B8ZSpGKK7XVipmxIgR++9//fXXHHnkkYdsk5eXx8CBAwHo168f8+bNi6ueiIiIVE4mzGku/Qy+YgYMGMD06dMZMmQI+/bto1evXjzzzDM8/fTTPP3003To0KHUhHn79u20atWK9u3bM3XqVHr06FGZOBcBrStTMUK7og6gJluwYAEFBQUx+9OOHTs46qijAGjZsiUbNmyIq56IiIhUTiaMNH8ez0bNmjXjiSeeKPX5p59+utTnGjduTK9evQCYOnVqBcMD4Eug4vM5orc+6gBqqk2bNjFy5EhefPHFmM83btyYXbuC7yzbt29n3759cdUTERGRysmEkeaPow4gDh8BDaMOohLWRh1ATbR7924uv/xyfv/739OuXbuY25xyyinMmzcPgGXLlpGVlRVXPREREakcJc3RWwX0ijqISloXdQA10VNPPcWSJUu4//77yc7OJicnhzFjxhy0zSWXXMKkSZMYNWoUzz//PP379z+k3pQpUyI6AhERkZonE6Zn/JtgXvNhUQdSih1U3y8vSpqTYPjw4QwfPrzMbZo2bUpeXh65ubnceuutNGvWLK56IiIiUjnVNVmLm+d5jvSdRrAI6Bp1EJW0D/g06iAyWYsWLRg4cCBt2rSJOhQREZEar8YnzaGVUQcQwx6gZdRBVMGHnudpyTkRERHJCJmSNL8ZdQAxLADaRx1EFbwedQAiIiIiqZIpSfOsqAMoYRtwQtRBVJGSZhEREckYmXAiIJ7nrfV9/2PSZ2T3XSA76iCqYA+QF3UQNUXW7a+mdH+fjOuf0v2JiIjUBJky0gzpM9r8BVDdL9WW73netqiDEBEREUkVJc2p9wnpu/xdvGZGHYCIiIhIKmVS0vwa8E3EMayk+l7IpMhe4O9RByEiIiKSShmTNHue9y3wZMRhFAIWcQxV9arneZ9FHYSIiIhIKmVM0hx6lGCkNAoLgc4R7TuRHo06ABEREZFUy6ik2fO8T4EZEey6EGhd1Ua2bNnC/PnzKSgoKLOsLBs3bqSwsLCyIawlfeaGi4iIiKRMRiXNoQkR7HM+kFWVBrZs2cJNN92E7/tcd911bNq0KWZZLJMnT+aKK65g586dzJ8/n7p161Y2jMfCy5KLiIiIZJSMWKe5OM/zXvd9fwnQNUW73AL8qKqNrF69mltuuYXOnTuzdetWPvjgA+rVq3dI2emnn35I3VWrVnHZZZfx/vvv06BBg8qG8BWamiEiIiIZKhNHmgFuTuG+lgItq9pI9+7d6dy5M4sXL2bFihV07tw5Zlkszjn27NnD/PnzOeOMMyobwv/zPG97pQ9AREREpBrLyKTZ87w84JUU7OozEnghE+ccM2fOpGnTptSpU6fUspJ69erFnDlzaN26NSNHjiQ/P7+iu14FPF6l4EVERESqsYxMmkO/AXYleR+fA/UT1ZiZMWbMGI4//njy8vJKLSvpvPPOY/jw4TRp0oQ+ffqQm2BL5toAACAASURBVJtb0V3f5Hlepc8eFBEREanuMjZp9jxvHfD/kriLFUDPRDX21FNPMW3aNAC2bdtGkyZNYpaVZv369Rx99NHUq1cP5yp0Lt8Uz/PeqELoIiIiItVexibNoT8BFZ6rEIUBAwYwffp0hgwZwr59++jVq1fMsli2b99Oq1ataN++PVOnTqVHj7hnjHwKDE/UMYiIiIhUVxm3ekZxnucV+r4/AHgXOCKBTS8ggaPMAM2aNeOJJ54otyyWxo0b70+op06dGu8udwOXe54X3wLQIiIiIjVYpo80E14S+goSd6XA3UDbBLUVpZs9z1sUdRAiIiIi6SDjk2YAz/PeBH6boOYWAEc
nqK2oPO95XhQXgRERERFJS0qaQ57n/QGIe+5CKTYBXRIQTpQWA9dHHYSIiIhIOlHSfLCrgRlVqL8CaJagWKLwLnCO53nbog5EREREJJ0oaS7G87zvgMuA/61E9U+A2MtXVA/vESTMm6MORERERCTdKGkuIbyIx0DghQpW3QDUTXxEKbEMOFsrZYiIiIjEpqQ5Bs/z9gBXAv+Is8oy4LTkRZRUbwFneZ63KepARERERNKVkuZSeJ631/O8q4GbgbIuIe2oviPM4wkS5m+iDkREREQknSlpLofneQ8CvQnmLMcyH+iYsoASYxswwPO8Wz3PS9T61CIiIiI1lpLmOHietxA4GXi5xFPfAlkpD6hqfKC753kvRh2IiIiISHWhpDlOnudt9jzvUmAYUDSd4R3gqOiiqpBdwJ1AV8/zVkUdjIiIiEh1oqS5gjzPewL4IfBnoFPE4cTrReAkz/N+F64OIiIiIiIVoKS5EjzPK/A8bxTQA5hCcDJgOpoDnOF53gDP89ZFHYyIiIhIdaWkuQo8z1vjed4VgAf8BUiHdY53AU8CXTzPO9PzvLejDkhERESkuqsTdQA1ged5K4Ff+75/OzCAYN5z7xSH8THwGPCULlIiIiIiklhKmhPI87xvgf8B/sf3/R8C5wJ9gWygZYJ3t4vgwiSzgJme532Y4PZFREREJKSkOUk8z1sDrAEm+L5vBCcN9gVOAdoBxwBtgdpxNPcl8FF4WwMsAuaGSbqIiIiIJJmS5hTwPM8RXGp7WfFy3/drEyxZdwzQgCCBrgV8R7AG9DZgred521MasIiIiIgcRElzhMKr8a0PbyIiIiKSprR6hoiIiIhIOZQ0i4iIiIiUQ0mziIiIiEg5lDSLiIiIiJRDSbOIiIiISDmUNIuIiIiIlENJs4iIiIhIOZQ0i4iIiIiUQ0mziIiIiEg5lDSLiIiIiJRDSbOIiIiISDmUNIuIiIiIlENJs4iIiIhIOZQ0i4iIiFQDmzZtIjc3l40bNx7y3IYNGygsLIwgqvSXqNdGSbOIiIhIxIYOHUrPnj257777Yj5fUFDAhRdeSH5+Pn379uXrr79mwoQJdO/enR07djBr1izq1q2b4qhTo7zXpsiIESOYPn06QFJeGyXNIiIiIhF66aWX2Lt3LwsWLGDt2rWsWbPmkG2WL1/Ogw8+yJ133sm5557LkiVLWLp0Kddffz2LFi2iUaNGEUSefPG8NgBz587lyy+/5Cc/+QlAUl4bJc0iIiIiEcrLy2PgwIEA9OvXj3nz5h2yzZlnnkmPHj2YM2cO+fn59OzZE+cchYWFvPbaa5x//vmpDjsl4nltCgsLueGGG8jKyuKVV14BSMpro6RZREREJEI7duzgqKOOAqBly5Zs2LAh5nbOOaZMmUKLFi2oW7cu/fr1Y8aMGbRt25aLLrqI2bNnpzLslIjntXn22Wfp2LEjt956K/n5+TzyyCNJeW2UNIuIiIhEqHHjxuzatQuA7du3s2/fvpjbmRkTJ06kU6dOTJs2jUGDBjF27FiaN29O//79efHFF1MZdkrE89q89957DBs2jDZt2nD11Vcze/bspLw2SppFREREInTKKafsn3awbNkysrKyDtnmgQce4NlnnwVg8+bNNG/eHIA1a9bQvn176tevX2qyXZ3F89ocd9xxrF27FoDFixfTrl07IPGvjZJmERERkQhdcsklTJo0iVGjRvH8889z0kknMWbMmIO2GTZsGJMmTaJPnz7s3buXfv36sXXrVtq0aUPHjh15/PHHOfvssyM6guSJ57UZOnQos2fPpk+fPvz1r39l9OjRSXlt6lS5BRERERGptKZNm5KXl0dubi633norbdq0oXPnzgdt06JFC3Jzcw+pd8455wDBahE1UTyvTZMmTXjhhRcOqZvo10ZJs4iIiEjEWrRosX+VCDlYurw2mp4hIiIiIlIOJc0iIiIiIuXQ9AwRERERiVzW7a+mdH+fjOtfoe3NOZekUESiZ2ZfA59
GHYdUWDvn3BGp3KH6SrWlviLxSHk/AfWVaqrUvqKkWURERESkHJrTLCIikmQWKHVKpJnVMrOEfCabWf1EtBO1mnIcUnNoTrOIiEjyHQW8YGbfhY/rAJ2AJeHj2sA4MzsamOucex/AzFoDxwEnA985554IyxcBW4BjgdFAE+BHQEfgVWBiKg6qLGb2C8o5lupwHCJFlDSLJImZtSSYy3YE8BZwtXNujZldBFwKXAf8BegKfAVc4ZzbHaOdxsCksJ2PgaHAk0Bn4Fvgc+AqIBc4jODD917n3DQzywPqA98BOOeyzWwXkA80BR5yzj1jZpcDvwbqAb92zr2d+FdE4pHAfvN3oKFzbqCZPUfQV35J7L70b+fcb81sbLEmBgEbwvu/Bn4HjHbOrTSzB4F3nHPPJ/TgazDn3OdAz6LHZjYaeNU59/vi25nZ8cCk8DW+CtgMtAPuAj4qtum68P92LLAdWE/QV85J6oFUzJuUfyzV4TiSwswaAf8DtCQ47rXAQA7+u7uc4DV60szmA4Odcx/HaKsu8FLY1lPOub9FGU/Y3onAOOfcxVHGYmbHAM8C+wj63Y2uknOTNT1DJHnOIUhi+wAzw8cAZwGzwn+znHNnAD7BG0AsI4E14Xb1Cd44AEY653oSfNAUXR90QPj8M2Z2eFh2uXMu2zmXHT7+wjl3JtCb4MML4D4gG/gZwZuRRCdR/QaCL1bF/y2tL91gZoeVqHt/Ub9xzi0FxgGjzKwpQd+ZWtkDzHRm1g64B8gN7xeV1wK2AmcALwKXhNt94pybC3xZbApHlpm9Dvwc+No5t5jwy3E6qMCxpPVxJNlgYEH4fvwd0I1D/+4eBK43sz4Ef7sxE1SCv+13nXOnAwPMrEmU8ZhZe2A80KwScSQ0FuBGYLhz7sfA0QS/ZFSKkmaR5DmP4KfF8wiSnaLEti/BqHA2kBeWPQLMLqWd04A54f15QPeiJ8zMgMbA/pFG59w6gpHkHuXE1xDYG97fTjDquM45V1YSJsmXqH4DsNvMWgGF4ePS+pJP8IWpVM65OUAWcDcw0Tm3L56DkYOFXzr+ASwi+AyeZGZFI3HHEnxRuiTc5nWC0bZ+YWL5OjAw/Lu/EvgQ+DFwlpn9FTjZzB4zs8fDpCVK8RzLINL/OJLpC+BSM/uhc+56gj5xEOfcNwR/438D7i+jrWyg6JefOQRJZpTxbAN+WokYEh6Lc+5O59wH4cNWwMbKBqWkWSR5ehKM4J4FLAS6mFlbYGf4x34EsNXMBgPTgctKaacJsCO8v5NgWgUECdMnBD9XvVmizjdA8/D+C2aWZ2aPhI+PMrO5BHMp/yssu4Dg5/7lZtYdiVKi+g3AMoLEZFn4uLS+NJFgNKa4O8N+k2dmtcOyPxOMTv+jsgeXyczsCGAGwQf8SoKfiy8D7jWzC5xzHxH8wlDLOTfIOXcWcDXwmnPubOfcWc655wADJoTNTgL+j2DqTTPgb865YWWMuqVEPMcCTCHNjyOZnHPTCf6mXjKzhwmm1sX6u3uDIF9bU0ZzjQgSTYBNQOso43HOfeWcq/QvBgl+bQAws0HA+865f1c2LiXNIklgZp2Awwl+ws4Cvg+8C9wGvBZutgVo4pybBIzlQJJb0laC0WQI3hi3hvdHAo8CH8eYn9WS4I0TDkzPGBk+/oLgw2wjsCw8Q/0I59wwgnmHkyp6vJIYCe43EHwxuoYDJ5uV1pe+JBjtyy5Wt/hPoUW/SLwPrHbOFSIVYmYnEPxSMNY596+icufcRoLR2Alm1tI5t5XgJ+cOZraaYBTtBDN73cz+FdbZx4FfiQY751YSTINYAlyRuqMqW3nHUl2OI1nM7IcEo/FdCL4MX03sv7tRBO8Dl5bR3HagQXi/MZXI7xIcT5UkOhYzKzrRtErTD5U0iyTHucDvwnnED4ePZwK/CP8FeDsshwNzTmNZyIFkpjfB1Isi/w0MLfatu+ikh67AgtIaDJPsiQSjOnUJRqNrA6sALd4enUT2Gwi
Sj+4cSJrL6kt/Bs6sfOhSjjXAxc65kr8K4ZxbC3Rxzm0yszOA/xB86Z3vnPuxc66Hc+5sghN1i6sDXGhmXYBbCf6eu4YrcESuAseS1seRRNcDl4YJoE9wLsNBzKwHwVSH/wJuC6fmxPIuwRcOCN4XPok4nqpKWCxm1gL4J3Cdc25LVYJS0iySHOdyYMrEmxyYn7qdIHEBmAasM7MFQL8y2poAHBueHbwLeKHoCedcQdh+0dyxqQRvDleGz8GB6Rl5ZtahWLuTCH4a3gc8RpBkvwmMqfjhSoIkst9A8MG5mgNXJCurL71HsFpHkeI/hQ6q9BEJAM65vc654leGq0UwzaLo+a3hCggPAQ8QfHntF47Kvh7OA64FYGY3ECzd9h3BrwTXA0udc8sJks4pxU4EjkQ8x1IdjiPJ/gJcY8EqR6cSvCeX/LsbC/zBOfcfYDlQ2koUzwA5ZvYXguX6FpayXariqapExnI7cAzwSFi30oMDuiKgiIhIipnZU8CTzrkFxcraA6OcczeZWRuC5bquKfb8HOdcHzOrXzRf1MxOB853zo0ptt3FwAbn3DupOp6S4jkW4Jx0P47qxMy+TzDaPKuqI6oSm5JmkTQSfqsubourxBqXklnUb2qe8KfmelU5mSpd1KRjiVr4BeS5EsWrnHMlT+TNuHhSEYuSZhERERGRcmhOs4iIiIhIOZQ0i4iIiIiUQ0mziIiIiEg5lDSLiIiIiJSjTtQBiCTT4Ycf7rKysqIOQyro3Xff3eicOyKV+1RfqZ7UVyQeUfQTUF+pjsrqK0qapUbLyspi8eLFUYchFWRmn5a/VWKpr1RP6isSjyj6CaivVEdl9RVNzxARERERKYeSZhERERGRcihpFhEREREph5JmEREREZFyKGkWERERESmHkmYRERERkXIoaRYRERERKYeSZhERERGRcihpFklDmzZtIjc3l40bN0YdioiIiKCkWSTtFBQUcOGFF5Kfn0/fvn35+uuvD9lm3bp19O/fn969e3PzzTeXWiYiIiKJoaRZJM0sX76cBx98kDvvvJNzzz2XJUuWHLLNbbfdxl133cXcuXP5/PPPycvLi1kmIiIiiaGkWSTNnHnmmfTo0YM5c+aQn59Pz549D9lm9erVdO3aFYAjjzySLVu2xCwTEanutm3bpvczSYgNGzZQWFhY6fpKmkXSkHOOKVOm0KJFC+rWrXvI8wMGDCAnJ4fp06czc+ZMzjrrrJhlIiKpsmfPHo455hiys7PJzs5mxYoV3HPPPXTv3p2bbrqp3Pq7du3i2GOPBeCVV17hpJNO4rPPPuP//u//aNCgQbLDlzRScrrho48+ur9fdenShRtvvLHUukOHDqVnz57cd999AEyYMIHu3buzY8cOZs2aFfMzNV5KmkXSkJkxceJEOnXqxLRp0w55fsyYMZx//vk8+eSTDBkyhMaNG8csE5HMUNURtERYvnw5V155JXl5eeTl5bF7927mzZtHfn4+Rx55JK+//nqZ9e+77z7+85//ADBr1iweeOAB5s+fT2FhIfXq1UvFIUiaKDnd8MQTT9zfr3r37s0NN9wQs95LL73E3r17WbBgAWvXrmXNmjUsXbqU66+/nkWLFtGoUaMqxaWkWSTNPPDAAzz77LMAbN68mebNm8fcrkuXLqxfv55Ro0aVWSaZR6uvpL8NGzZw8sknV/gE3hEjRjB9+nQgsSNoifDOO+8wY8YMTj31VIYOHcobb7zBT3/6U8yMc889l7lz55Za98MPP2T58uWcdtppANSqVYtdu3Yxb948zjzzzFQdgqSJ0qYbfvHFF2zYsIFu3brFrJeXl8fAgQMB6NevH/PmzcM5R2FhIa+99hrnn39+leJS0iySZoYNG8akSZPo06cPe/fupW3btowZM+aQ7caPH8+oUaNo2LBhmWWSWUquvrJo0SKtqpKGRo8eza5duyp0Au/cuXP58ssv+clPfgKQ0BG0ROjevTuvv/46+fn5FBYWsmvXLo4
66igAWrZsyYYNG0qtO3r0aB5++OH9jwcOHMjDDz/Msccey80338zkyZOTHr+kj9KmG06cOJHhw4eXWm/Hjh2H9Ll+/foxY8YM2rZty0UXXcTs2bMrHVedStcUkaRo0aIFubm5B5UVzc0qLicnJ64yySxFq6/06NGDgoIC+vXrx7/+9S969OjBoEGDyMvLIzs7O+owM9qbb75Jo0aNaNOmTdwn8BYWFnLDDTdwwQUX8Morr3DxxRcfNIIW64t1qnXq1In69esD0K1bt/2JM8D27dvZt29fzHrPPvssZ555Jj/4wQ/2l/Xp04fJkyezcOFCNmzYwBtvvMFVV12V/IOQtDBmzBjmzZvH+PHj90833LdvH7Nnz+b+++8vtV7jxo0P6XODBg2iXbt2rF27lv79+/Piiy/St2/fSsWlkWYRkRqk5OorLVq00KoqaWT37t3ce++9jBs3Dih9RK2kZ599lo4dO3LrrbeSn5/PI488ktARtEQYPHgwy5YtY+/evbz88svs2LGDefPmAbBs2TKysrJi1ps5cybTpk0jOzubpUuXcuGFFwLw1ltvccYZZ1CnTh3MLFWHIWmi5HTDuXPnctppp5XZF0455ZSYfW7NmjW0b9+e+vXrl/rlLR5KmkVEapjiq69ce+21WlUljYwbN44RI0bsP1ch3hN433vvPYYNG0abNm24+uqrmT17NoMGDWLs2LE0b958/whalO6++24GDx5Mly5d6NmzJ2PGjOG9997jV7/6FePGjePKK69k06ZNXH/99QfVmzx5MnPnziUvL48uXbowY8YM9u3bR8OGDWndujXz58/nRz/6UURHJVEpOd1w1qxZ9OnTZ//zK1euPOQXlksuuYRJkyYxatQonn/+efr378/WrVtp06YNHTt25PHHH+fss8+udEzmnKt0ZZF0161bN7d48eKow5AKMrN3nXOxz/RIkprYV+666y48z+Ooo45i/PjxdO/ePS1+xk+k6tZX+vTpQ61awXjV0qVLGTBgAA899BC9e/fm7bffLvV8hIceeoh69eoxYsQIJk2axJIlS/jzn//MpEmTOP7443n33XfxfZ+//vWvlT6uZNi1axevvvoqXbt23b+cXBSi6CdQM99X0l1BQQG5ubn06dOHNm3aVLh+WX1Fc5pFRGqQBx54gO9973v8/Oc/37/6StHPnP/85z+jDi/jzZkzZ//97OxsnnzySe65556DRtRWrlzJ5MmTDzqXYejQoVx33XU899xzFBYWMnXq1ING0G688UbuvvvulB9PeRo0aMCAAQOiDkMySIsWLfavoJFoGmmWGq06fMvPuv3VlO7vk3H9U7q/yqhuo4fppKCggIEDB/Ldd9/heR4TJ05k7NixHHfccQwePDjq8BJOfUXioZFmiZdGmkVEMkSs1Ve0qoqISNUpaRYREZFSpfrXsESoDr+oSfWjpFlEREREIpfu0xWVNKeY7/u1gLZAVnhrFz5uQPD/YcBu4FtgM7AW+Aj4GPjM87y9KQ9aRETKVN1GYzUSK1JxSpqTzPf9JsCZwFlAX6AjULeSze32fX8d8A7wGpDred7XCQlUREREREqlpDkJfN8/FhgC9AO6kbjXuR7QIbwNAZzv+0sJEuhXPM9bkKD9iEgaSvefLkVEajIlzQni+34d4CLgF8DZBNMsks2Ak8PbbWECPRGY7HnezhTsX0RERCQjKGmuIt/3Dwf+C7ge+F7E4XQBngDG+77/NDDR87yPI45JREREpNpT0lxJvu83Am4GRgNNIg6npObAb4CRvu8/Cdzjed5XEcckIiIiUm3VijqA6shy7Koer/TI3bNvz12kX8JcXB2C6SIf+b4/OpxCIiIiIiIVpKS5AizHfmA5lgf8Y8eeHT3/7P95ftQxxakJMB5Y4vt+z6iDEREREalulDTHyXLsYmAJwfJxAExaM8nbsnvLluiiqrAfAXN8378l6kBEREREqhMlzeWwHKtjOfYn4GWCucL7OVzL0QtHL40
mskqrA/zB9/0Xfd9vGnUwIiIiItWBkuYyWI61Bd4CRpW2zTtfvdNz7da1n6YuqoS5DFjs+/6Pog5EREREJN0paS6F5djxwEKgVzmb1hv+9vAvUxBSMvwQeMf3/X5RByIiIiKSzpQ0x2A5dgKQB3w/nu3/vfPfp7357zer2zSNIg2BV3zfPyfqQERERETSlZLmEizHTgRmU8ELldyWf1uDfW7fvuRElXSHESTOZ0cdiIiIiEg6UtJcjOXYSQQjzG0qWvfbvd92ePSDR99OeFCp0wCY5vv+WVEHIiIiIpJulDSHLMeOAGYCR1a2jcc/eLzDjsId2xMXVco1AKb7vt816kBERERE0omSZsByrDbwHNC2Ku3sY9+Rdyy6Y3FiojrUli1bmD9/PgUFBWWWlWXjxo0UFhaWtUkDYKrv+83L2khEREQkkyhpDtwH/DgRDc3+z+wen+/4/ItEtFXcli1buOmmm/B9n+uuu45NmzbFLItl8uTJXHHFFezcuZP58+dTt27d8nb3A+AZ3/ct0cchIiIiUh3ViTqAqFmOXQTclsAmDxvx9ohPp/WbdlQC22T16tXccsstdO7cma1bt/LBBx9Qr169Q8pOP/30Q+quWrWKyy67jPfff58GDRrEu8uLgFuAPyTwMERERESqpYweabYcywKeBRI6orpu27pe73z1jp/INrt3707nzp1ZvHgxK1asoHPnzjHLYnHOsWfPHubPn88ZZ5xRkd3e7/t+74QcgIiIiEg1ltFJMzABaJaMhn+z4Dc451wi23TOMXPmTJo2bUqdOnVKLSupV69ezJkzh9atWzNy5Ejy8/Pj3WUd4Enf9+sl5ABEREREqqmMTZotxy4B+ier/e17tnvPrHlmQSLbNDPGjBnD8ccfT15eXqllJZ133nkMHz6cJk2a0KdPH3Jzcyuy2+OB0VUMXURERKRay8ik2XKsPvBgsvfzkP9Qu2/3fLsrEW099dRTTJs2DYBt27bRpEmTmGWlWb9+PUcffTT16tWjEgPgd/q+n9A52iIiIiLVSUYmzcAvCVaISKq9bu9RY5eMXZiItgYMGMD06dMZMmQI+/bto1evXjHLYtm+fTutWrWiffv2TJ06lR49elR09w2Be6t6DCIiIiLVVcatnmE51gy4M1X7e/WzV7v95ke/2dC6QevWVWmnWbNmPPHEE+WWxdK4ceP9CfXUqVMrG8IQ3/f/5Hne+5VtQERERKS6ysSR5iFAixTur/HI+SNXp3B/yVIL+HXUQYiIiIhEIROT5hGp3uEHmz84fek3Sz9M9X6T4Crf95Oy2oiIiIhIOsuopNly7CygQwS7rjVy/shvI9hvojUkGKkXERERySgZlTQTwShzkc27N3d5Ye0LCTkpMGK/iDoAERERkVTLmKTZcuwogktDR+Z3S3/XZve+3bujjCEBTvR9v2/UQYiIiIikUsYkzcClRLxayB63p924peMSesGTiAyKOgARERGRVMqkpPnsqAMAeGHdC102fbvpm6jjqKK0eC1FREREUiUjkmbLsdpAukwpaPbrd35d3dc6bu/7ftIvDiMiIiKSLjIiaQZOBZpGHUSR9755r9eHmz/8OOo4qkijzSIiIpIxMiVpPifqAEqoc9PbNxVEHUQVKWkWERGRjJEpSfOZUQdQ0lffftXtX5/9692o46iC7KgDEBEREUmVTEmao7igSbnuWnxXs71u796o46ikI3V1QBEREckUNT5pthyrD3w/6jhi+W7fd8f9xf/L21HHUQXHRh2AiIiISCrU+KQZaAdY1EGU5u+r/95x6+6tW6KOo5K0goaIiIhkhExImuMfDd0JfAzsqMLetgMVmHDhcIffmn/re1XYY5Q00iwiIiIZIROS5vhGQ3cBk4EvgGc4OHHeDjxWRt2FwOPAbuAjoHbFAnx7w9u9Ptn2yfqK1UoLGmkWERGRjFDppNnM8ipZr4uZdansfmO019rM5paxSZu4GtoAnAv0AdoD/yn23GtAYRl
1vwS6EiTc9eLaW0n1hr89/N+Vqhmt1lEHICIiIpIKUYw0dwlvVWZmLQjGhRuVsVl8aWwWcDTwCUHy2zYsXwvUBRqXU38vwdSO4+La2yE+3/F5j7f+89ayytWOTOW+IoiIiIhUM+UmzWZ2mJk9Z2bzzGyGmTWMsc1YM8sO718T3hqE288xs/81szpm9nvgduB2M3sj3L6hmU0Nt5tYrM08MxtvZrPKCG8vMAjYWsY2dcs7xv0c8D7QgGCKxR5gDuVfxqM9sJrgmoP/BNbFvceD3LLwlnrOuT3Avmpyq+BEFBEREZHqKZ6R5mHAMufcGcCLgBdn2x2Bfc65PsDTQGPn3B3AOGCcc+6sYu374XbfM7NOYXkPYIFz7tzSduCc2+qcS9zKEwb0J5h0sAqYB3QnSKLL4hFc6uMw4HhgZeV2f1KLkxxBIlqrGt1ERET+f3v3Hh9Fdf9//HW4yi0oqGAVoWAV7UgQxQIKRgURUUSL4p0W8UppLd6qoJFqW63f1gvQWsVq5SdeAFtFWxSECBggoCKsKKCgKCqCYLgKgXx+f8wEYthkN8nuzoZ9Px+PPLI7O3PmZaERNgAAIABJREFUc5LJ5r1nz8yK7PfiCT3tgYLg9tPAghjrl0TMd4GIc+4N/NnC28pZ/xjggmCOdFvg8GB5xMxeiqO+WCqajbzXHGBRcPt7/AC8Er/nT+HPW365gu03AM3wI69VrdCHuz5c5JxL28vjRRHfz1ZERESkhosnNH+EP94KcCcwJMo6O4FDgttnB9+zgbfN7CzgIKB7sHw70BAgCIjLgIfNLAcYCZRcRWJL3L2o2M641joRWAz8E3/iQTtgMPDL4KslcH45236PP+f5EOAdqnQhtot+fNG8pvWaZld+y1DF97MVERERqeHqxLHOE8C/gpHgb4HLo6zzCvA359yZwTrgn1L3gHNuBH6sXBgsnwa86Jy7HLgjaP8p59wv8ecmX1a1rpRrXVxrNQCuquDxX1bw2AH4IRvghviKKq2Oq1P0u+zfxXeVj/TyddgFiIiIiKRCzNBsZtuBi6Mszyl1O4J/sbay9pmPbGYb2PfUugrbj6PGitb9JN52wnJrh1vn1qtdL9rPL91V8ZRHERERkZolnpHmtBDlutCFZlbehInS0jo0Z9XNKryk3SXxnlyZblaGXYCIiIhIKtSY0FyZkecyPsWfpZyWV3p48GcPLqrlap0Wdh1VpJFmERERyQhpGSQTyXJtJ/B52HVE07px69VdD+3aNew6qkEjzSIiIpIR9vvQHPg47AKiGXvK2DXOuZr6qXprPc+r6ENlRERERPYbmRKa54RdQFndDu22pHXj1jV5lDkv7AJEREREUiVTQvMbYRdQ1v91+b+a9CEm0UwPuwARERGRVMmU0Dwf+C7sIkpcftTlc5vUbVJTr5hRYlrYBYiIiIikSkaEZsu13cCMsOsAqFur7o6bj7/5iLDrqKaPPc/7LOwiRERERFIlI0Jz4PWwCwC4s+Od8+rWqtsq7DqqSVMzREREJKNkUmh+FdgVZgEH1jtww8/b/LxjmDUkyHNhFyAiIiKSShkTmi3XvgReDrOGh7o8tMQ51zTMGhIg4nnerLCLEBEREUmljAnNgTFh7bhdVrtVJx58Yrew9p9Aj4VdgIiIiEiqZVRotlzLAyJh7Htst7HfOOfqhrHvBNoCjA+7CBEREZFUy6jQHBib6h32aNnj/cMbHf6z6rZTWFhIfn4+GzdurHBZRdavX09RUVFVS3hWnwIoIiIimSgTQ/N4YEMK92d/PvnP1f6o7MLCQoYOHUokEmHw4MFs2LAh6rJoJkyYwCWXXMK2bdvIz8+nbt0qDXjvBh6uTh9EREREaqo6YReQapZrW90ody/wUCr2N/jowfmN6jY6pbrtLF++nFtvvZXs7Gw2bdrEhx9+SL169fZZdsop++5q2bJlXHjhhXzwwQc0aNCgqiU86XneR9XqhIiIiEgNlYkjzeBP0ViR7J3Ur11/+7C
fDvtxItrq3Lkz2dnZLFy4kCVLlpCdnR11WTRmxq5du8jPz+fUU0+tyu63AvdUo3wRERGRGi0jQ7PlWhHw22Tv5+4T7i6oU6vOjxLVnpkxdepUsrKyqFOnTrnLyurWrRuzZs2iRYsWDBs2jIKCgsru+l7P876qXvUiIiIiNVdGhmYAy7XXgH8nq/3m9ZuvO+/I8zolsk3nHCNHjuToo48mLy+v3GVlnX322dxwww00adKEHj16MG3atMrs9iPgr9UsXURERKRGy9jQHPg18F0yGn6k6yMfOueaJKq9J598kldeeQWAzZs306RJk6jLyrN69WpatWpFvXr1MLN4d1sE/NLzvCpfbkNERERkf5DRodly7QvgKiDuFBmP9k3bf9KhWYdqn/xX2oABA5gyZQqDBg2iuLiYbt26RV0WzZYtW2jevDnt2rVj0qRJdOnSJd7d3uZ53ryEdUJERESkhsq4q2eUZbk2xY1y9wF3JarNMaeM2eCca5eo9gCaNm3KE088EXNZNI0bN94TqCdNmhTvLid7nqdLzImIiIiQ4SPNpdwDTE1EQ70O7/VuiwYtOieirRB9DAwOuwgRERGRdKHQDFiuFQOXAauq047DFf/hpD80SkxVodkCXKRP/hMRERHZS6E5YLm2EegLfF3VNq479rr8BnUaHJO4qlJuK3CO53mLwi5EREREJJ0oNJdiufYhkAN8WdltG9RusPX6Y68/KuFFpc42oK/nebPDLkREREQk3Sg0l2G5tgw4Dfi8Mtv9/qTfL6jtardMTlVJtx04z/O8t8IuRERERCQdKTRHYbn2MX5w/iye9Q894NC1vQ/vXVNP/tsM9PM8b0bYhYiIiIikK4XmcliurQJOAd6Ote7obqOXO+dq4gmAS4HOnudND7sQERERkXSm0FwBy7U1+HOc76ecD0DxDvKWH3vgsQn9IJMUeR442fO8ZWEXIiIiIpLuMv7DTWKxXNsF3OFGubeAZ4BDSj8+utvozc65mvTiowi4xfO8R8MuRERERKSmqElhL1SWa1OBjsCeqQx9W/VdePABB58YXlWV9g7QTYFZREREpHIUmivBcu1Ly7VewIBartbKUSeOOijsmuK0Efg1/nSMhWEXIyIiIlLTKDRXgeXa5Ce7P3lc/dr1Hwc2hF1PBYqAR4CjPM8b7XlecdgFiYiIiNREmtNcRb84/Rc7gD9HIpF/4I/iXgO0CreqPdYBjwOPeZ73RdjFiIiIiNR0Cs3V5HleIXBvJBL5A9AHuBb/47hrh1BOATAGeNHzvB0h7F9ERERkv6TQnCDB1IfXgNcikcjhwCCgN9AFqJek3e4G5gOvA1M8z3svSfsRERERyWgKzUnged4a4I/AHyORSAP8D0k5PfjqBNSvYtNbgI+BBfhB+U3P876rfsUiIiIiUhGF5iTzPG87/mXqpgNEIhEHtACOBFoHX0cCDfCndDhgB/A9/kdcr8QPyis8z/s61fWLiIiIiEJzynmeZ8DXwVdByOWIiIiISBx0yTkRERERkRgUmkVEREREYlBoFhERERGJQaFZRERERCQGhWYRERERkRgUmkVEREREYlBoFhERERGJQaFZRERERCQGhWYRERERkRgUmkVEREREYlBoFhERERGJQaFZRERERCQGhWYRERGRGm7t2rUUFRWFXcYe6VZPIig0i4iIiIRs7dq1dO/evdzH16xZwxFHHEFOTg45OTmsW7eOMWPG0LlzZ7Zu3crrr79O3bp199t60kGdsAsQERERyWQbN25k0KBBbN26tdx15s+fz4gRI7jhhhv2LFu0aBFDhgxhwYIFNGrUaL+tJ11opFlEREQkRLVr1+aFF14gKyur3HXmzZvHuHHj6NSpE3feeScAZkZRURFvvPEGffr02W/rSRcKzSIiIiIhysrKomnTphWu06dPH/Ly8liwYAFz585l8eLFnHXWWbz66qscccQR9OvXj5kzZ+6X9aQLTc8QERERSXPdunWjfv36AJxwwgm
sWLGCgQMH0rp1a1auXEnfvn2ZPHkyp59+ekbWkwoaaRYRERFJc7179+arr75i27ZtvPHGG3ieB8CKFSto164d9evXp7i4OGPrSQWFZhEREZE0MmPGDMaMGfODZbm5uZx++ul06dKF66+/nmOOOYZNmzbRsmVLjjvuOB5//HF69uyZEfWERdMzRERERNJAXl4eAGeccQZnnHHGDx47/fTT+eijj36wLCsri169egH+lSv293rCppFmEREREZEYFJpFRERERGJQaBYRERERiUGhWUREREQkBp0IKCIiIpIibX73Wsr3+en9fct9LNX1VFRLunNmFnYNIknjnFsHfBZ2HVJprc3skFTuUMdKjaVjReKR8uMEdKzUUOUeKwrNIiIiIiIxaE6ziIhIkjlfuVMinXO1nHMJ+Z/snKufiHbCtr/0Q/YfmtMsIiKSfIcDE51zO4L7dYAOwLvB/drA/c65VsBsM/sAwDnXAjgKOAHYYWZPBMsXAIVAW+AWoAlwPHAc8BowNhWdqohz7npi9KUm9EOkhEKzSJI455rhz2U7BHgLuMLMVjjn+gEXAIOBR4BOwDfAJWa2M0o7jYHxQTufAFcD44Bs4HvgC+AyYBpwAP4/33vN7BXnXB5QH9gBYGY5zrntQAGQBTxsZv9yzl0E3ATUA24ys7cT/xOReCTwuHkaaGhmFzvnnsc/Vn5F9GPpSzO70zl3T6kmBgJrg9s3AX8EbjGzpc65vwLzzOzFhHZ+P2ZmXwBdS+47524BXjOzP5Vezzl3NDA++BlfBnwHtAbuAj4uteqq4Hd7D7AFWI1/rPRKakcqZwax+1IT+pEUzrlGwP8DmuH3eyVwMT/8u7sI/2c0zjmXD1xpZp9Eaasu8FLQ1pNm9s8w6wnaOxa438zOD7MW59yRwDNAMf5xd51VcW6ypmeIJE8v/BDbA5ga3Ac4E3g9+N7GzE4FIvhPANEMA1YE69XHf+IAGGZmXfH/0fQMlg0IHv+Xc+7gYNlFZpZjZjnB/TVmdhrQHf+fF8B9QA5wOf6TkYQnUccN+C+sSn8v71i6xjl3QJlt/1By3JjZIuB+YLhzLgv/2JlU1Q5mOudcayAXmBbcLlleC9gEnApMBvoH631qZrOBr0tN4WjjnJsOXAWsM7OFBC+O00El+pLW/UiyK4G5wfPxDuAk9v27+yswxDnXA/9vN2pAxf/bfsfMTgEGOOeahFmPc64d8CDQtAp1JLQW4DrgBjM7A2iF/05GlSg0iyTP2fhvLZ6NH3ZKgu3p+KPCOUBesGw0MLOcdn4GzApuzwE6lzzgnHNAY2DPSKOZrcIfSe4So76GwO7g9hb8UcdVZlZRCJPkS9RxA7DTOdccKArul3csRfBfMJXLzGYBbYC7gbFmVhxPZ+SHghcdzwIL8P8Hj3fOlYzEtcV/odQ/WGc6/mjbWUGwnA5cHPzdXwp8BJwBnOmc+xtwgnPuMefc40FoCVM8fRlI+vcjmdYAFzjnfmJmQ/CPiR8ws2/x/8b/CfyhgrZygJJ3fmbhh8ww69kM/LwKNSS8FjMbYWYfBnebA+urWpRCs0jydMUfwT0TmA90dM4dAWwL/tgPATY5564EpgAXltNOE2BrcHsb/rQK8APTp/hvV80os823wIHB7YnOuTzn3Ojg/uHOudn4cyl/HSw7B//t/sXOuc5ImBJ13AC8jx9M3g/ul3csjcUfjSltRHDc5DnnagfLHsIfnX62qp3LZM65Q4BX8f/BL8V/u/hC4F7n3Dlm9jH+Owy1zGygmZ0JXAG8YWY9zexMM3secMCYoNnxwH/xp940Bf5pZtdWMOqWEvH0BXiBNO9HMpnZFPy/qZecc4/iT62L9nf3Jn5eW1FBc43wgybABqBFmPWY2TdmVuV3DBL8swHAOTcQ+MDMvqxqXQrNIkngnOsAHIz/FnYb4EfAO8DtwBvBaoVAEzMbD9zD3pBb1ib80WTwnxg3BbeHAX8HPokyP6sZ/hMn7J2eMSy4vwb
/n9l64P3gDPVDzOxa/HmH4yvbX0mMBB834L8w+gV7TzYr71j6Gn+0L6fUtqXfCi15R+IDYLmZFSGV4pxrj/9OwT1m9r+S5Wa2Hn80doxzrpmZbcJ/y/kY59xy/FG09s656c65/wXbFLP3XaIrzWwp/jSId4FLUterisXqS03pR7I4536CPxrfEf/F8BVE/7sbjv88cEEFzW0BGgS3G1OFfJfgeqol0bU450pONK3W9EOFZpHk6A38MZhH/GhwfypwffAd4O1gOeydcxrNfPaGme74Uy9K/AO4utSr7pKTHjoBc8trMAjZY/FHderij0bXBpYBunh7eBJ53IAfPjqzNzRXdCw9BJxW9dIlhhXA+WZW9l0hzGwl0NHMNjjnTgW+wn/Rm29mZ5hZFzPriX+ibml1gHOdcx2B2/D/njsFV+AIXSX6ktb9SKIhwAVBAIzgn8vwA865LvhTHX4N3B5MzYnmHfwXHOA/L3wacj3VlbBanHMHAc8Bg82ssDpFKTSLJEdv9k6ZmMHe+alb8IMLwCvAKufcXOCsCtoaA7QNzg7eDkwsecDMNgbtl8wdm4T/5HBp8BjsnZ6R55w7plS74/HfGi4GHsMP2TOAkZXvriRIIo8b8P9xLmfvJ5JVdCy9h3+1jhKl3wodWOUeCQBmttvMSn8yXC38aRYlj28KroDwMPAA/ovXs4JR2enBPOBaAM65a/Av3bYD/12CIcAiM1uMHzpfKHUicCji6UtN6EeSPQL8wvlXOToZ/zm57N/dPcCfzewrYDFQ3pUo/gWMcs49gn+5vvnlrJeqeqorkbX8DjgSGB1sW+XBAX0ioIiISIo5554ExpnZ3FLL2gHDzWyoc64l/uW6flHq8Vlm1sM5V79kvqhz7hSgj5mNLLXe+cBaM5uXqv6UFU9fgF7p3o+axDn3I/zR5terO6Iq0Sk0i6SR4FV1aYVWhWtcSmbRcbP/Cd5qrledk6nSxf7Ul7AFL0CeL7N4mZmVPZE34+pJRS0KzSIiIiIiMWhOs4iIiIhIDArNIiIiIiIxKDSLiIiIiMSg0CwiIiIiEkOdsAsQSaaDDz7Y2rRpE3YZUknvvPPOejM7JJX71LFSM+lYkXiEcZyAjpWaqKJjRaFZ9mtt2rRh4cKFYZchleSc+yz2WomlY6Vm0rEi8QjjOAEdKzVRRceKpmeIiIiIiMSg0CwiIiIiEoNCs4iIiIhIDArNIiIiIiIxKDSLiIiIiMSg0CwiIiIiEoNCs4iIiIhIDArNIiIiIiIxKDSLiIiIiMSg0CwiIiKyn9i8eTOFhYVhl7FfUmgWSUMbNmxg2rRprF+/PuxSpIZau3YtJ5xwwp7b3bt3D7kiESlt1apV9O3bl+7du3PzzTezceNGzjnnHE466SSuu+66crfbtWsXRx55JDk5OeTk5LBkyRJefvllfvrTn/L555/z3//+lwYNGqSwJ5lDoVkkzWzcuJFzzz2XgoICTj/9dNatW7fPOmWfbEu78cYbmTJlSqrKlTR1yy23sH37djZu3MigQYPYunVr2CVJEhUXF/Pll1+GXYZUwu23385dd93F7Nmz+eKLLxg/fjyXX345CxcuZPPmzSxcuDDqdosXL+bSSy8lLy+PvLw8jj/+eF5//XUeeOAB8vPzKSoqol69einuTWZQaBZJM4sXL+avf/0rI0aMoHfv3rz77rv7rFP2yTYvLw+A2bNn8/XXX3PeeeeluGpJJzNmzKBRo0a0bNmS2rVr88ILL5CVlRV2WUL0UcLc3Fw6d+7M0KFDK9y2Y8eOe7abNm0aCxYsoH379ixYsIBp06ZRp06dFPVCEmH58uV06tQJgEMPPZSmTZsSiUT47rvv+Pzzz2nVqlXU7ebNm8err77KySefzNVXX82uXbuoVasW27dvZ86cOZx22mmp7EZGUWgWSTOnnXYaXbp0YdasWRQUFNC1a9d91in7ZFtYWEhRURHXXHMNbdq04eWXX0512ZImdu7cyb333sv9998
PQFZWFk2bNg25KilRdpRw586dzJkzh4KCAg499FCmT58edbtvv/2W9u3b79muV69ezJw5k//7v/9j5syZfPPNNxx66KEp7k1q7K9zdAcMGMCoUaOYMmUKU6dOJScnh88++4xHH32UY489lmbNmkXdrnPnzkyfPp2CggKKior473//y8UXX8yjjz5K27Ztufnmm5kwYUKKe5MZFJpF0pCZ8cILL3DQQQdRt27dfR4v+2R75pln8swzz3Dcccdx2223UVBQwOjRo0OoXMJ2//33c+ONN3LggQeGXYpEUXaU8M033+TnP/85zjl69+7N7Nmzo243f/58CgoK6NatG/3792fz5s17RhdXrlzJ0UcfneKelK9kPn1F08hKy9Q5uiNHjqRPnz6MGzeOQYMGMWrUKB577DHuvvtu2rdvz1NPPRV1uw4dOnDYYYcBcNJJJ7FixQp69OjBhAkTaNWqFW3btuXNN99MZVcyhkKzSBpyzjF27Fg6dOjAK6+8ss/jZZ9sGzduzHvvvce1115Ly5YtueKKK5g5c2YIlUvYpk+fztixY8nJyWHRokUMGTIk7JKklLKjhNu3b+fwww8HoFmzZqxduzbqdm3btuX1118nPz+fDh068NRTT3Huuefy0EMP0bx5c5577jkefPDBVHalXCXz6cubRlZWJs/R7dixI6tXr2b48OFs3LiRJUuWsHv3bubPn49zLuo2V155Je+//z67d+/mP//5D9nZ2QC89dZbnHrqqdSpU6fcbaV6FJpF0swDDzzAM888A8B3331X7ohh6SdbgKOOOoqVK1cCsHDhQlq3bp2agiWtzJo1a0/46NixI+PGjQu7JCml7Chh48aN2b59OwBbtmyhuLg46nZt27blqKOO2rPdihUraN++PdOnT+fEE09k8+bNfPTRR6npRAVKz6ePNo0smkyeo/vggw8yfPhwGjZsyB133MG1115L06ZN2bBhA5deeilLly5l5MiRP9jm7rvv5sorr6Rjx4507dqVnj17UlxcTMOGDWnRogX5+fkcf/zxIfVo/+bMLOwaRJLmpJNOsvLOQE5XGzdu5OKLL2bHjh14nsfQoUN57rnnuO+++36wXm5uLkcddRRXXnkl4M/7Gzx4MGvXrqWoqIhJkybtGcGqaZxz75jZSancZ008VqTmHSsXX3wxI0aMwPM8evXqRU5ODt988w1jxozhqaee4quvvuLOO+/cZ7tbb72V7t27069fP6666ip69OjBkCFDmDx5Mn369OHXv/41QKgvknbu3Env3r3597//Tf/+/enZsyfbt2+nS5cuDB8+nPfee4/GjRvvs92CBQs44ogjOOyww7jqqqsYMGAABx54ICNGjODCCy9k7ty59O/fn8suu6zKtYVxnICeV2qiio4VnWorkmYOOuggpk2b9oNlZQMzwKhRo35wv0mTJkycODGptYlI9dx9991cdtllmBn9+vVj5MiRdO/end/85jdMnTqVqVOnsmHDBm677bYfBODhw4fTv39/7rzzTrp27cqgQYMA//yHhg0b/mCaR1jKzqcfOXIkc+bM4cEHH9wzjSyaDh06UL9+fWDvKHrJyWzz589n7dq1vPnmm9UKzSKJoNAsIiKSIp7nsXjx4h8smz59Oq+99hq/+c1v+PGPfwzsO2J82GGHMX/+/H3aGzBgAADPPvtskiqO3/Tp05kxYwZjx47dM5/+4YcfZvXq1Tz33HPlbnfllVfuGX3/z3/+s2ek/a233qJnz54sWrRIc3QlLSg0i4iIhKhBgwZ7wm9NNmvWrD23c3JyGDduHLm5uXvm7AIsXbqUCRMm/ODds7Kj79Hm6J5//vkp749IWZrTLPs1zSermWraPFUJj44ViYfmNEu8NKdZRGQ/0OZ3r6V0f5/e3zel+xPJBKn+Owb9LSeKQrNIyBSERGq+MIJQdeh5QKTydJ1mEREREZEYNNIcokgkcgDQBjgCOAD/9+GAncAO4DvgE8/zNoZVo4iIiIg
oNKdEJBKpA3QGTgc64AflNsCh+CE51vYbgU+CrxXAPCDP87ytyalYRETEV9OmnoCmn0hyKDQnSSQS+SlwFnAG0APIqkZzBwEnBV8ldkYikXzgjeDrXc/zdCkUERERkSRQaE6gSCTSCLgUuB44Mcm7qwfkBF9/BD6PRCL/AB73PG9dkvctIiIiklEUmhMgEokcjx+Ur6B6I8rV0Qq4D7grEolMBMZ6njcvpFpERERE9isKzdUQTMH4I9Av7FpKqY8f3q+IRCLzgTs8z5sZck0iIiIiNZpCcxVEIpFDgN8CtwG1Qy6nIj8DZkQikVeBmz3PWx52QSIiIiI1ka7TXAmRSKRWJBL5FbAcuAP4IOSS4nUusDgSifw+uMydiIiIiFSCQnOcgtHl/wGjgQODxXWAmnLFivrAXcB7kUjkuLCLEREREalJFJrjEIlETgXew7+EXGnHAfmpr6ha2gMFkUjk0rALEREREakpNKe5ApFIxOHPW76P8n9WbYFtQMNU1ZUAjYAJkUikGzDc87yisAsSERERSWcaaS5HJBKpC0wE7qfiFxeHAQUpKSrxfgXMCqaeiIiIiEg5FJqjiEQi9YDJwM/j3ORk4KvkVZRUXfCvsHFw2IWIiIiIpCuF5jIikUh94CXgvEps1hD4JDkVpYSHgrOIiIhIuRSaSwkux/ZvoG8VNj8FWJrYilLqeODNSCTSPOxCRERERNKNQnMgOOnvRaBPFZtwwK7EVRSKDvjBOayPAhcRERFJSwrNe91N5aZkRNMBmJuAWsKUDTwVdhEiIiIi6UShGYhEIr3xQ3MiHAHsSFBbP1BYWEh+fj4bN26scFlF1q9fT1FRzCvMXRiJRH5b9UpFRERE9i8ZH5ojkUhr4FkS97NoRRJGmwsLCxk6dCiRSITBgwezYcOGqMuimTBhApdccgnbtm0jPz+funXrxrPLB4LrOIuIiIhkvIz+cJPg0nITgUSf/NYJWAck7PrHy5cv59ZbbyU7O5tNmzbx4YcfUq9evX2WnXLKKftsu2zZMi688EI++OADGjRoEO8u6wIvRCKREzzPW5+ofoiIiIjURJk+0nwn0DkJ7WYBHyaywc6dO5Odnc3ChQtZsmQJ2dnZUZdFY2bs2rWL/Px8Tj311Mrs9gjgsUTULyIiIlKTZWxojkQiRwG3J3EXpwArEtmgmTF16lSysrKoU6dOucvK6tatG7NmzaJFixYMGzaMgoJKfYDhz4M53yIiIiIZK2NDM/AIcEAS268NbEpkg845Ro4cydFHH01eXl65y8o6++yzueGGG2jSpAk9evRg2rRpld316GAqi4iIiEhGysjQHIlEegLnpGBXJwILEtHQk08+ySuvvALA5s2badKkSdRl5Vm9ejWtWrWiXr16mFlld/8T4IaqVS4iIiJS82VcaA4+xOTBFO6yOQn40JMBAwYwZcoUBg0aRHFxMd26dYu6LJotW7bQvHlz2rVrx6RJk+jSpUtVShipDz0RERGRTJWJV884B+iYwv21Bd4CTqup8x+8AAAYB0lEQVROI02bNuWJJ56IuSyaxo0b7wnUkyZNqmoJBwM3AvdXtQERERGRmirjRprxg1+qdQC+C2G/iXZdJBLJxGNGREREMlxGBaBIJPJj4OwQdn0Q8H4I+020NkCfsIsQERERSbWMCs34J7OF1eduwKqQ9p1IOiFQREREMk7GhOZIJHIAMDjEEurif0pgTdcnEom0CbsIERERkVTKmNAM9CXxH5ddWScD74VcQ3XVAgaFXYSIiIhIKmVSaE6XT7VrCBSHXUQ1nRV2ASIiIiKplEmhuWfYBQSOAd4Ou4hqOlnXbBYREZFMkhGhORKJtAN+HHYdpRwDbAm7iGqoA+SEXYSIiIhIqmREaCZ9RplLHAosDLuIauoVdgEiIiIiqaLQHJ4uwBdhF1ENZ4ZdgIiIiEiqZEpoPj7sAqI4AFgddhHVcEwkEqkXdhEiIiIiqbDfh+ZIJOLwP8kuHXUDloRdRBX
VAlqHXYSIiIhIKuz3oRn4EVA/7CIqUAuwsIuoorZhFyAiIiKSCpkQmuMOdoWFheTn57Nx48Yq72z9+vUUFRVVZpOfAvlV3mG40umKJCIiIiJJkwmhOa5gV1hYyNChQ4lEIgwePJgNGzbseWz9+vVcdNFF5W47YcIELrnkErZt20Z+fj5169atSo3bK7tRGtBIs4iIiGSEOmEXkAJHxrPS8uXLufXWW8nOzmbTpk18+OGHnHLKKQD85S9/4fvvvy9322XLlnHhhRfywQcf0KBBg6rU+CMgj5p37eNWYRcgIiIikgqZMNLcMJ6VOnfuTHZ2NgsXLmTJkiVkZ2cDMH/+fBo0aMDBBx9c7rZmxq5du8jPz+fUU0+tap2dgbVV3TgkVXqFICIiIlLTZEJojns03cyYOnUqWVlZ1KlTh6KiIv7xj39w0003Vbhdt27dmDVrFi1atGDYsGEUFBRUpc5GwIqqbBgiXXJOREREMkImTM+oHe+KzjlGjhzJ6NGjycvLY9WqVQwcOJCsrKwKtzv77LM57LDD+OKLL+jRowfTpk3j5JNPrnShG77fcMzleZfPNbMa8WImq17WmqXe0rDLEBEREUm6TAjNu+JZ6cknn+SQQw6hX79+bN68mSZNmjBv3jwKCgp4/vnnWbZsGbm5uYwaNSrq9qtXr6Z169Zs3rwZs6pdQe6383774Rdbv+hRpY1DsGbbmvVh1yAiIiKSCpkQmnfGs9KAAQO45ZZbmDx5Mj/5yU/o1q3bnhMBAX75y1+WG5i3bNlC8+bNadeuHb///e+5/vrrK13kx4Ufr3r323e7VnrDcMX1sxURERGp6TIhNG+OZ6WmTZvyxBNPlPv4U089Ve5jjRs3plu3bgBMmjSpkuX5bsy/8Rtq3nWPC8MuQERERCQVasTc2WpaFXYBseR9mbfoq21f/SzsOqog7X+2IiIiIomQCaH5k7ALqIiZ2e0Lbk/nj/muyMqwCxARERFJBYXmkI1bNi5/265tx4ZdRxVppFlEREQywn4fmj3PKwS+DbuOaL7f9f32sUvH1rR5zKVppFlEREQywn4fmgNpOdo86r1R83fb7h+FXUcVbQe+DrsIERERkVTIlND8btgFlLVu+7p1r65+9cSw66iGeZZbxQtSi4iIiNQwmRKa3wi7gLJ+M/c3HwJNwq6jGqaHXYCIiIhIqmRKaJ5BnJ8MmApLNy79eMnGJafEXjOtKTSLiIhIxsiI0BycDDg/7DpK/Cr/V98BtcOuoxq+AxaGXYSIiIhIqmREaA6kxRSNqV9MfWfd9+tOCruOapppuVYcdhEiIiIiqZJJofm/YRdQbMXFdy28qybPYy7xctgFiIiIiKRSxoRmz/MWAovCrOHvH/797e93f390mDUkwEbgxbCLEBEREUmljAnNgbFh7Xjbrm1bH//w8ZoemAGetlzbHnYRIiIiIqmUaaH5WfyR0pQbuXDkgmKKW4Sx7wQy4LGwixARERFJtYwKzZ7nbQf+mer9fr3t66+nrZl2crUb2ob/2YZbYyyryBZgd5UreNNybXmVtxYRERGpoTIqNAfGAim98sOw/GEfAw2r1ch2YAKwBvgXfkiOtiya+cDjwE7gY6pzsbvRVd5SREREpAbLuNDsed4qYHyq9rd4w+JlHxV+1K3aDa0FegM9gHbAV+Usi+ZroBN+uK5X5QrmWq69UuWtRURERGqwjAvNgRH4ExuSblj+sK0k4ufcBmgFfIoffo8oZ1l5duNP4ziqyhXcVuUtRURERGq4jAzNnuetAf6c7P288tkrCzbs2NApYQ0a8AHQgL1TLKItK6sdsBzIAp4DVlV6zxMt1+ZUeisRERGR/URGhubA/cCKZDW+23bvHvXuqGYJbdQBfYEWwLIKlpXlATnAAcDRwNJK7XUL8NtK1yoiIiKyH8nY0Ox53g7gxmS1/0jkkbd3Fu9sl7AG57D3o1m+xw/A0ZaVZwPQDH802iq155GWa2sqtYWIiIjIfiZjQzO
A53nTgYcS3e6Woi2bn17+9LEJbfREYDH+BfOK8adcRFsWzfdAY+AQ4B2gbdx7/Y/l2iNVrllERERkP1En7ALSwG3415Y4LVEN/m7B7941LGHtAf6c5aviWBbNAewN1DfEvcdVwC/jXltERERkP5bRI80AnuftAgbiX3+i2tZsXfPlW1+9Vf0PMgnXDmCA5dp3YRciIiIikg4yPjQDeJ63FrgIKKpuW0Pzh67CHwOuyW6yXHs37CJERERE0oVCc8DzvLn4kxcqd5pcKQvXLVz6yaZPqv9BJuH6k+XaY2EXISIiIpJOFJpL8TzvSeA6qhicb5p3UxH+ReBqqj9brt0ZdhEiIiIi6UahuQzP854ABuNfjyJuE1dOnF+4szA7OVWlxF8s124PuwgRERGRdKTQHIXneU/jX5didzzrFxUXFf3p/T+1SGpRyfWw5dotYRchIiIikq4Umsvhed6zwCXA1ljr/mXJX/KLiovaJL2oxCsCfmO5pk/8ExEREamAQnMFPM+bBHQGIuWtU7izsHDCxxOOT11VCfMlcLrl2qNhFyIiIiKS7hSaY/A870PgZGBctMdvm3/bIsOapbaqassDOlmuvR12ISIiIiI1gUJzHDzP2+553jXA5cCWkuWfbf7s8/xv8ruEV1mlFQH3AT0t19aGXYyIiIhITaHQXAme500AjgNeABiaP/QLoH6oRcVvFtDRcu0uy7W4TnAUEREREZ9CcyV5nve553mXfLr50+6fbfmsdtj1xGEVMNBy7TTLtaUVreicy6vKDpxzHZ1zHauybZS2mjrn/uece8M592/nXL1EtCsiIiJSHQrNVXRu13PnWK79DBhABScKhmg5MAw41nLtxSTvq2PwlQiXA381s7OAr4GzE9SuiIiISJXVCbuAms5ybTIw2Y1ypwDXAhcBDUIqpxj4LzAamGa5FvWTDZ1zBwBPA0cA3wEXm9m2MuvcA+SZWZ5z7hfB4heAiUAW8C1+X+8FLgi2udLMznTONQSeAQ4FlpjZ0ODxPGAB0MHMekerzcz+VuruIcA38XdfREREJDkUmhMkuBLF226U+w1wBXA1iRt9jSUCvAo8Ybm2Mo71rwXeN7NLnHO/BDygII7tjgOKzayHc64f0NjM7nDOLQMws6dLtR8xs3uccy855zqY2WKgC/Comd0aa0fOua7AQWY2L466RERERJJKoTnBLNe+A8YAY9wo1wI4vdTXTxK0mw3ANOB14A3LtTWV3L49MDm4/XQc6zcAtgPvAhHn3BvACmBqOesfA3RzzuUABwKHA4vxg/RLsXbmnGuGP1r+8zhqExEREUk6heYkCi7r9nzwhRvlDgdOBFoDRwbfS243AGrjzzPfAXwPbAZW4gfUj4PvK4CVlmvF1SjtI/wPbXkTuBN/CsQTZdbZiT89Avx5xf8GsoG3zexO59wEoHvQxnagOYBzzgHLgAIze8o5dy6wOmhnCzEEJ/5NBO4ws8+q3EMRERGRBFJoTqFgRLiyo8LJ8ATwr2CO8bf4J9+V9QrwN+fcmcE6AJ8CDzjnRuCH+oXB8mnAi865y4E7gvafCqZ+bAIuq0RtVwOdgBHBfv5uZi9UYnsRERGRhFNozkBmth24OMrynFK3I0CPKJvvcwKfmW0AepZZXGH7FdT2d+DvsdYTERERSSWFZglNlOtCF5rZ+WHUIiIiIlIRhWYJTTwjzyIiIiLpQB9uIiIiIiISg0KziIiIiEgMCs0iIiIiIjEoNIuIiIiIxKDQLCIiIiISg0KziIiIiEgMCs0iIiIiIjEoNIuIiIiIxKDQLCIiIiISg0KziIiIiEgMCs0iIiIikjBr166lqKgo7DISTqFZRERERPa4+uqr6dq1K/fdd1/Ux1etWkXfvn3p3r07N998MwBjxoyhc+fObN26lddff526deumsuQKJSrEKzSLiIiICAAvvfQSu3fvZu7cuaxcuZIVK1bss87tt9/OXXfdxezZs/niiy/Iy8tj0aJFDBkyhAU
LFtCoUaOE1hQrxJe48cYbmTJlCpCcEK/QLCIiIiIA5OXlcfHFFwNw1llnMWfOnH3WWb58OZ06dQLg0EMPpbCwEDOjqKiIN954gz59+iSsnnhCPMDs2bP5+uuvOe+88wCSEuIVmkVEREQEgK1bt3L44YcD0KxZM9auXbvPOgMGDGDUqFFMmTKFqVOncuaZZ3LWWWfx6quvcsQRR9CvXz9mzpyZkHriCfFFRUVcc801tGnThpdffhkgKSFeoVlEREREAGjcuDHbt28HYMuWLRQXF++zzsiRI+nTpw/jxo1j0KBBNG7cmIEDB3LPPfdw4IEH0rdvXyZPnpyQeuIJ8c888wzHHXcct912GwUFBYwePTopIV6hWUREREQAOPHEE/eM5r7//vu0adMm6nodO3Zk9erVDB8+fM+yFStW0K5dO+rXrx81bFdFPCH+vffe49prr6Vly5ZcccUVzJw5MykhXqFZRERERADo378/48ePZ/jw4bz44ov89Kc/ZeTIkfus9+CDDzJ8+HAaNmwIwKZNm2jZsiXHHXccjz/+OD179kxIPfGE+KOOOoqVK1cCsHDhQlq3bg0kPsTXqXYLIiIiIrJfyMrKIi8vj2nTpnHbbbfRsmVLsrOz91lv1KhR+2zXq1cvwD8JL1H69+9P9+7d+fLLL/nf//7H888/z8iRI39wJY2rr76awYMH8/zzz1NUVMSkSZN+EOKvu+467r777mrXotAsIiIiInscdNBBe06+C1s8Ib5JkyZMnDhxn20THeIVmkVEREQkbaVLiNecZhERERGRGBSaRURERERi0PQMERERkQzV5nevpXR/n97ft9zH0qmWaJyZJakUkfA559YBn4Vdh1RaazM7JJU71LFSY+lYkXik/DgBHSs1VLnHikKziIhIkjnnHFDbzHaV83gtADOr9sVknXP1zWxHddsJ2/7SD9l/aHqGiIhI8h0OTHTOlYTAOkAH4N3gfm3gfudcK2C2mX0A4JxrARwFnADsMLMnguULgEKgLXAL0AQ4HjgOeA0Ym4pOVcQ5dz0x+lIT+iFSQqFZRCSNOOea4b+dewjwFnCFma1wzvUDLgAGA48AnYBvgEvMbGeUdp4GGprZxc6554HvgV8B44O2PwGuBsYBX5rZnc65e0o1MRBYG9y+CfgjcIuZLXXO/RWYZ2YvJrTz+zEz+wLoWnLfOXcL8JqZ/an0es65o4Hxwc/4MuA7oDVwF/BxqVVXBb/be4AtwGr8Y6VXUjtSOTOI3Zea0A8RQFfPEEka51wz59xm59wBzrn5zrmfBMv7Oeeecr5HnXNznHMvOefqldNOY+fcv4P1/uWcq+Oce9o5955zbq5zbqJzrq5zLs85N885tyAIWATL5gbf84Jl251zbwXbDwqWXeScezvY9pQU/Ygkul7AAUAPYGpwH+BM4PXgexszOxWIABdV0FZ2me/DgBXBtvWBkgufXuOcO6DMtn8ws5zgaxFwPzDcOZcFdAcmVbWDmc451xrIBaYFt0uW1wI2AacCk4H+wXqfmtls4OuSaRxAG+fcdOAqYJ2ZLQTSZipDJfqS1v1IJudco+C5/S3n3Hjn3Cjn3Iclz9fOuY7OuT8454YE6+c759qV01Zd59yU4Hl8cNj1BI8f65x7OexanHNHBtvMcM497pxzVakJFJpFkilR4ae8oDPMzLrij870DJYNCB7/l3Pu4GDZRSXhJ7i/xsxOww8+dwXL7gNygMvxRxUlPGfjvyV9Nv5xUvK7PR2Yhv97yguWjQZmVtDWTudcc6AouP8zYFZwew7QObgdwf/dl8vMZgFtgLuBsYmYe5uJghcdzwIL8P8Hj3fOnR883Bb/uaJ/sM504P8BZwXBcjpwcfBP/1LgI+AM4Ezn3N+AE5xzjwXBoNwwkyLx9GUg6d+PZLoSmBs8H+8ATmLfF6t/BYY453rg/x/4pJy2hgHvmNkpwADnXJMw6wl+bw8CTatQR0JrAa4DbjCzM4BW+NN/qkShWSR
5EhV+ygs6JScXNQb2vD1vZquAAqBLjPoaAruD21vw36pfZWYVjVxK8nXFfxFzJjAf6OicOwLYZmbf4k+t2OScuxKYAlxYQVvv4weT94P7TYCtwe1tQFZweyz+P5bSRpQa1akdLHsI/0XZs1XtXCZzzh0CvAr8AVgKFOP//u51zp1jZh/jv8iuZWYDzexM4ArgDTPraWZnmtnzgAPGBM2OB/6LP/WmKfBPM7u2ggCREvH0BXiBNO9Hkq0BLnDO/cTMhuC/kPqB4G9+JvBP/OOmPDlAyXSpWfghM8x6NgM/r0INCa/FzEaY2YfB3ebA+qoWpdAskjyJCj/lBZ3RwKf4805nlNnmW+DA4PbEIPiMDu4f7pybjX8C0q+DZefgz5Fd7JzrjITCOdcBOBh/6kMb4EfAO8DtwBvBaoVAEzMbD9zD3t9zNO8Cv2DvyWab8F9kATQK7gN8jT/al1Nq29KjOiUvrj4AlptZEVIpzrn2+C+W7zGz/5UsN7P1+KOxY5xzzcxsE/7o2THOueX4gaC9c266c+5/wTbF7H3Be6WZLcWfBvEucEnqelWxWH2pKf1IFjObgv9C9CXn3KP4J4NGe7H6Jn5eW1FBc43wgybABqBFmPWY2TfVufJJgn82ADjnBgIfmNmXVa1LoVkkCRIcfsoLOsOAvwOf2L7XjmyG/8QJe6dnDAvur8EfAVoPvO+cqw8cYmbX4p+sM76y/ZWE6Q38MZhK82hwfypwffAd4O1gOeydq1yed/HfmSgJzfPZG4y7478jUeIh4LSqly4xrADON7OyL3Axs5VARzPb4Jw7FfgK/+8338zOMLMuZtYTKHveQx3gXOdcR+A2/FHaTs6/AkfoKtGXtO5Hsjj/PJepQEf8QZQriP5idTj+/48LKmhuC9AguN2YKuS7BNdTLYmuxTlXcnWWak0/VGgWSY5Ehp+Kgs4/gKtLverGOXck/qjx3PIaDEL2WPx/TnXxR6NrA8sAXbw9PL3Z+67BDPZO7dmCfxwAvAKscs7NBc6K0d6nwHL2frjCGKCtcy4f2A5MLFnRzN7Dv1pHidKjOgOr3CMBwMx2m1npD7mohT/NouTxTc65usDDwAP4f4dnBaOy04N5wLUAnHPX4F+6bQf+uwRDgEVmthg/dL5Q6pyGUMTTl5rQjyQbAlwQBMAI/jkwP+Cc64I/1eHXwO0VnMT2Dv4oPfj/Tz4NuZ7qSlgtzrmDgOeAwWZWWJ2i9OEmIkkQ/FO4xcwWOedOB4biv8JdAhxsZruDP/C/4b+S3gbMNLP7orTVGH/0twX+aFXJZcLGmdmc4K2rOcCN+E8su4G7zexN518xoz57z0a/Dv8yV0c5/2oJ7+LPfbsG/0SwYuBBM5uc8B+KiOzhnHsS/294bqll7YDhZjbUOdcSuN/MflHq8Vlm1sOV+tAP51/tpo+ZjSy13vnAWjObl6r+lBVPX4Be6d6PZHLO/Qj//ACH/87jMuA89l7q8e/AL4ERZvaOc+4J/Ofv/0RpqzX+nPDpQDegS6nR2JTXU6rNPNt7EnootTjnHsC/OsuyYFGumb1Vdr246lJoFhGp2YIXR6UVmtn50daVmiF4UV2vOvNC08X+1Jd0FgTNU4HXqzuiKtEpNIukEYUfEREpTzBq/3yZxcvMrOzVbzKunlTUotAsIiIiIhKDTgQUEREREYlBoVlEREREJAaFZhERERGRGBSaRURERERiUGgWEREREYnh/wPLUAx2pqMsdgAAAABJRU5ErkJggg==\n", 300 | "text/plain": [ 301 | "
"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Visualize the per-cluster features\n",
    "# Part 1: global configuration\n",
    "fig = plt.figure(figsize=(10, 7))\n",
    "titles = ['RECORD_RATE', 'AVG_ORDERS', 'AVG_MONEY', 'IS_ACTIVE', 'SEX']  # one title per grid column\n",
    "line_index, col_index = 3, 5  # grid size: 3 rows (clusters) x 5 columns (features)\n",
    "ax_ids = np.arange(1, 16).reshape(line_index, col_index)  # subplot index of every grid cell\n",
    "plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render the Chinese labels\n",
    "\n",
    "# Part 2: pie chart of each cluster's share of the records\n",
    "pie_fracs = features_all['record_rate'].tolist()\n",
    "for ind in range(len(pie_fracs)):\n",
    "    ax = fig.add_subplot(line_index, col_index, ax_ids[:, 0][ind])\n",
    "    init_labels = ['', '', '']  # start with empty labels\n",
    "    init_labels[ind] = 'cluster_{0}'.format(ind)  # label only the current cluster\n",
    "    init_colors = ['lightgray', 'lightgray', 'lightgray']\n",
    "    init_colors[ind] = 'g'  # highlight the current cluster's wedge\n",
    "    ax.pie(x=pie_fracs, autopct='%3.0f %%', labels=init_labels, colors=init_colors)\n",
    "    ax.set_aspect('equal')  # keep the pie circular\n",
    "    if ind == 0:\n",
    "        ax.set_title(titles[0])\n",
    "\n",
    "# Part 3: bar chart of the mean AVG_ORDERS per cluster\n",
    "avg_orders_label = 'AVG_ORDERS'\n",
    "avg_orders_fraces = features_all[avg_orders_label]\n",
    "for ind, frace in enumerate(avg_orders_fraces):\n",
    "    ax = fig.add_subplot(line_index, col_index, ax_ids[:, 1][ind])\n",
    "    ax.bar(x=unique_labels, height=[0, frace, 0])  # single bar in the middle slot\n",
    "    ax.set_ylim((0, max(avg_orders_fraces) * 1.2))\n",
    "    ax.set_xticks([])\n",
    "    ax.set_yticks([])\n",
    "    if ind == 0:  # column title on the first row only\n",
    "        ax.set_title(titles[1])\n",
    "    # annotate the bar with its value and put the feature name below the axis\n",
    "    ax.text(unique_labels[1], frace + 0.4, s='{:.2f}'.format(frace), ha='center', va='top')\n",
    "    ax.text(unique_labels[1], -0.4, s=avg_orders_label, ha='center', va='bottom')\n",
    "\n",
    "# Part 4: bar chart of the mean AVG_MONEY per cluster\n",
    "avg_money_label = 'AVG_MONEY'\n",
    "avg_money_fraces = features_all[avg_money_label]\n",
    "for ind, frace in enumerate(avg_money_fraces):\n",
    "    ax = fig.add_subplot(line_index, col_index, ax_ids[:, 2][ind])\n",
    "    ax.bar(x=unique_labels, height=[0, frace, 0])  # single bar in the middle slot\n",
    "    ax.set_ylim((0, max(avg_money_fraces) * 1.2))\n",
    "    ax.set_xticks([])\n",
    "    ax.set_yticks([])\n",
    "    if ind == 0:  # column title on the first row only\n",
    "        ax.set_title(titles[2])\n",
    "    # annotate the bar with its value and put the feature name below the axis\n",
    "    ax.text(unique_labels[1], frace + 4, s='{:.0f}'.format(frace), ha='center', va='top')\n",
    "    ax.text(unique_labels[1], -4, s=avg_money_label, ha='center', va='bottom')\n",
    "\n",
    "# Part 5: share of inactive vs. active customers per cluster\n",
    "activity_labels = ['不活跃', '活跃']  # DataFrame columns: inactive, active\n",
    "x_ticket = list(range(len(activity_labels)))\n",
    "activity_data = features_all[activity_labels]\n",
    "ylim_max = activity_data.values.max()\n",
    "for ind, each_data in enumerate(activity_data.values):\n",
    "    ax = fig.add_subplot(line_index, col_index, ax_ids[:, 3][ind])\n",
    "    ax.bar(x=x_ticket, height=each_data)  # one bar per activity level\n",
    "    ax.set_ylim((0, ylim_max * 1.2))\n",
    "    ax.set_xticks([])\n",
    "    ax.set_yticks([])\n",
    "    if ind == 0:  # column title on the first row only\n",
    "        ax.set_title(titles[3])\n",
    "    # annotate each bar with its percentage and put its label below the axis\n",
    "    activity_values = ['{:.1%}'.format(i) for i in each_data]\n",
    "    for i in range(len(x_ticket)):\n",
    "        ax.text(x_ticket[i], each_data[i] + 0.05, s=activity_values[i], ha='center', va='top')\n",
    "        ax.text(x_ticket[i], -0.05, s=activity_labels[i], ha='center', va='bottom')\n",
    "\n",
    "# Part 6: sex distribution per cluster\n",
    "sex_data = features_all.iloc[:, -3:]  # the last three columns hold the sex shares\n",
    "x_ticket = list(range(sex_data.shape[1]))  # one bar position per sex column\n",
    "sex_labels = ['SEX_{}'.format(i) for i in range(3)]\n",
    "ylim_max = sex_data.values.max()\n",
    "for ind, each_data in enumerate(sex_data.values):\n",
    "    ax = fig.add_subplot(line_index, col_index, ax_ids[:, 4][ind])\n",
    "    ax.bar(x=x_ticket, height=each_data)  # one bar per sex category\n",
    "    ax.set_ylim((0, ylim_max * 1.2))\n",
    "    ax.set_xticks([])\n",
    "    ax.set_yticks([])\n",
    "    if ind == 0:  # column title on the first row only\n",
    "        ax.set_title(titles[4])\n",
    "    # annotate each bar with its percentage and put its label below the axis\n",
    "    sex_values = ['{:.1%}'.format(i) for i in each_data]\n",
    "    for i in range(len(x_ticket)):\n",
    "        ax.text(x_ticket[i], each_data[i] + 0.1, s=sex_values[i], ha='center', va='top')\n",
    "        ax.text(x_ticket[i], -0.1, s=sex_labels[i], ha='center', va='bottom')\n",
    "\n",
    "plt.tight_layout(pad=0.8)  # padding around the subplots"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Conclusion\n",
    "\n",
    "After clustering, the customers fall into three groups:\n",
    "- cluster_0: distinguished by a low average order count (only 2.02); the customers are predominantly male.\n",
    "- cluster_1: high average order count (3.99); the customers are predominantly female.\n",
    "- cluster_2: similar to cluster_1, but the customers' sex is unknown.\n",
    "\n",
    "Average order value and activity level are distributed fairly consistently and evenly across all clusters, so they neither separate the clusters nor characterize any one of them. They are therefore ignored.\n",
    "\n",
    "This leaves three groups: **low-value male customers, high-value female customers, and high-value customers of unknown sex.**\n",
    "\n",
    "**Directions for follow-up analysis**:\n",
    "- Customers of unknown sex would not be expected to form such a high-value group, and, more importantly, the group is not small, so this is unlikely to be a random occurrence. There is probably an issue somewhere, for example in data collection, customer experience, or customer registration, or these customers simply prefer not to disclose their sex. This could be the starting point for another EDA project.\n",
    "- For the second group, the high-value female customers, analyze preferences and characteristics: for example, check whether there is a clear concentration in when they purchase, their average order value, product categories, preferred discount levels, acquisition channels, or promotion types."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
--------------------------------------------------------------------------------