├── README.md
├── bank_telemarketing_data_analysis.ipynb
├── images
├── Random Forest.png
├── jaccard系数.png
├── 余弦相似性.png
├── 曼哈顿距离.png
└── 欧式距离.png
├── 关联分析
└── README.md
├── 分类算法
├── README.md
└── 用户流失预测分析与应用.ipynb
├── 回归分析
├── README.md
└── 大型促销活动前的销售预测.ipynb
└── 聚类分析
├── README.md
└── 客户特征的聚类与探索性分析.ipynb
/README.md:
--------------------------------------------------------------------------------
1 | # 机器学习(Machine Learning)
2 | - 监督学习(Supervised Learning):训练样本带有信息标记(**y值**),利用已有的训练样本信息学习数据的规律预测未知的新样本标签。
3 | - 回归分析(Regression)
4 | - 分类(Classification)
5 | - 无监督学习(Unsupervised Learning):训练样本的标记信息时未知的,目测是为了揭露训练样本的内在数学,结构和信息,为进一步的数据挖掘提供基础。
6 | - 聚类(Clustering)
7 | ## 1 回归
8 | 回归是研究自变量x对因变量y影响的一种数据分析方法。主要应用场景是**进行预测和空值**,例如,计划制定、KPI制定、目标制定等;也可以基于预测的数据与实际数据进行比对和分析,确定事件发展程度并给未来行动提供方向性指导。
9 |
10 | 回归分析可应用于分析自变量和因变量的影响关系(已知x,求y),也可以分析自变量对因变量的影响方向(正向or反向影响)。
11 |
12 | **常用的回归算法包括**:
13 | - 线性回归
14 | - 二项式回归
15 | - 对数回归
16 | - 指数回归
17 | - 核SVM
18 | - 岭回归
19 | - Lasso
20 |
21 | 优点:
22 | - 数据模式和结果便于理解,如线性回归用y=ax+b的形式表达
23 | - 基于函数公式的业务应用中,可直接代入法求解,应用起来容易。
24 |
25 | 缺点:
26 | - 只能分析少量变量间相互关系,无法处理海量变量间的相互作用关系,尤其是变量共同因素对因变量的影响程度。
27 |
28 | ### 1.1 注意回归变量之间的共线性问题
29 | 检验共线性的三个指标:
30 | - 容忍度:[0,1],每个自变量(x)作为因变量(y)进行回归建模得到的残差比例。值越小,说明共线性问题的可能性越大。
31 | - 方差膨胀因子:容忍度的倒数,值越大则共线性问题越明显,通常以10作为判断边界。VIF<10,不存在多重共线性;10<=VIF<=100,存在较强的多重共线性;VIF>=100,存在严重多重共线性
32 | - 特征值:对自变量进行主成分分析,如果多个维度的特征值等于0,则可能存在比较严重的共线性。
33 | - 相关系数:R>0.8:可能存在较强的相关性。
34 |
35 | **解决共线性的5种常用方法**:
36 | - 增大样本量
37 | - 岭回归法
38 | - 逐步回归法
39 | - 主成分回归
40 | - 人工去除
41 |
42 | ### 1.2 相关系数、判定系数和回归系数之间的关系
43 | 假设一回归方程:y = 42.738x + 169.94,其中R的平方 = 0.5252,如果对这两个变量作相关性分析,还会得到相关系数R=0.72468551874050.
44 |
45 | 回归系数:42.738,自变量x的**回归系数**;0.5252是该方程的**判定系数**;0.724....是两个变量的**相关性系数**。
46 | - 判定系数:自变量对因变量的方差解释程度的值;计算公式为:回归平方和与总离差平方和之比值
47 | - 相关系数:又称为解释系数,是衡量变量间的相关程度或密切程度的值,本质是线性相关性的判断。
48 |
49 | 三者间的关系:
50 | - 判定系数是**所有参与模型中自变量的对因变量联合影响程度**,而非某个自变量的影响程度。
51 | - 回归系数与相关系数的关系:回归系数>0,相关系数取值为(0,1)。说明两者正相关;如果系数小于0,相关系数取值为(-1,0),说明两者负相关。
52 |
53 | ## 2 分类算法
54 | - 一种对**离散型随机变量**建模或预测的监督学习算法。
55 | - 使用案例包括邮件过滤、金融欺诈和预测雇员异动等输出为类别的任务。
56 | - 分类算法通常适用于预测一个类别(或类别的概率)而不是连续的数值。
57 |
58 | ### 2.1 分类算法的应用
59 | - 预测
60 | - 提炼应用规则
61 | - 提取变量特征
62 | - 处理缺失值
63 |
64 | ### 2.2 (基础)决策树 Decision Tree
65 | 决策树是一个树结构(可以是二叉树或非二叉树)。
66 |
67 | 其每个非叶节点表示一个**特征属性**上的测试,每个分支代表这个特征属性在某个值域上的输出,而每个叶节点存放一个类别。
68 |
69 | 使用决策树进行决策的过程就是从**根节点开始**,测试待分类项中相应的特征属性,并按照其值选择输出分支,知道到达叶子节点,将**叶子节点**存放的类别作为决策结果。
70 |
71 | **优点:**
72 | - 适用任何类型的数据(类别变量更普遍)
73 | - 直观、决策树可以提供可视化,便于理解
74 | - 模型预测出的结果简单,可解释性强
75 | - 适用于小规模数据
76 |
77 | **缺点:**
78 | - 当数据中存在连续变量的属性时,决策树表现并不是很好
79 | - 不稳定性,一点点的扰动或者改动都可能改动整棵树
80 | - 特殊属性增加时,错误增加的比较快
81 | - 很容易在训练数据中生成复杂的树结构,造成过拟合。
82 |
83 | ### 2.3 随机森林 Random Forest
84 | 
85 |
86 | **优点:**
87 | - 随机森林不容易限于过拟合
88 | - 具有很好的抗噪声能力
89 | - 处理很高维度(feature多)的数据,并且不用做特征选择
90 | - 训练速度快
91 |
92 | ## 3 聚类
93 | - 一种无监督式机器学习(即**数据没有标注**)
94 | - 算法基于数据的内部结构寻找观察样本的自然族群(即集群)
95 | - 使用案例包括客户细分,新闻聚类,文章推荐等等。
96 |
97 | **用于衡量相似性的几个指标**:
98 | - 欧式距离 Euclidean distance
99 | - 定义:指在m维空间中两个点之间的真实距离,或者向量的自然长度(即该点到原点的距离)
100 | - 用途:
101 |
102 | 
103 |
104 | - 曼哈顿距离 Manhattan distance
105 | - 定义:就是表示两个点在标准坐标系上的绝对轴距之和。
106 | - 用途:
107 |
108 | 
109 |
110 | - 余弦相似性 cosine
111 | - 定义:通过计算两个向量的夹角余弦值来评估他们的相似度。
112 | - 用途:新闻分类
113 |
114 | 
115 |
116 | - Jaccard系数
117 | - 定义:给定两个集合A,B,Jaccard 系数定义为A与B交集的大小与A与B并集的大小的比值。
118 | - 用途:用于比较有限样本集之间的相似性与差异性。Jaccard系数值越大,样本相似度越高。
119 |
120 | 
121 |
122 | ### 3.1 层次聚类 Hierarchical Cluster Analysis(HCA)
123 | 层次聚类是一系列基于以下概念的聚类算法:
124 | - 最开始由一个数据点作为一个集群
125 | - 对于每个集群,基于相同的标准合并集群
126 | - 重复这一过程直到只留下一个集群,因此就得到了集群的层次结构。
127 |
128 | ### 3.2 K均值聚类 K-means Clustering Algorithm
129 | - 聚类的度量基于样本点之间的几何距离(即在坐标平面中的距离)
130 | - 集群是围绕在聚类中心的族群,而集群呈现出类球状并具有相似的大小
131 | - 对于给定的k值,算法先给出一个初始的分组方法,然后通过反复迭代的方法改变分组,使得每一次改进之后的分组方案较前一次好
132 |
133 | ### 3.3 DBSCAN
134 | - 基于密度的算法,它将样本点的密度区域组成一个集群
135 | - DBSCAN不需要假设集群为球状,并且它的性能是可拓展的
136 | - 不需要每个点都被分配到一个集群中,这降低了集群的异常数据。
137 |
138 |
139 |
--------------------------------------------------------------------------------
/bank_telemarketing_data_analysis.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Pyhton数据分析:银行电话营销数据分析"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "## 一、前言\n",
15 | "### 项目介绍:\n",
16 | "在我们的日常生活中,银行为我们的财产提供了基本的安全保障,便利了我们的生活,而我们在银行的一些记录信息,方便了银行对我们进行一些行为预测,本项目则根据客户的以往记录信息,预测客户是否办理存款业务。\n"
17 | ]
18 | },
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {},
22 | "source": [
23 | "### 数据集分析\n",
24 | "该项目的数据集对应的任务是「分类任务」,本数据集共包含25317行,18列数据,其中字段为y的列是标签列,包含0和1两个值,0表示不订购业务,1表示订购业务,其他列是特征列\n",
25 | "### 字段的描述\n",
26 | "ID:客户唯一标识
\n",
27 | "age:客户年龄
\n",
28 | "job:客户的职业
\n",
29 | "marital:婚姻状况
\n",
30 | "education:受教育水平
\n",
31 | "default:是否有违约记录
\n",
32 | "balance:每年账户的平均余额
\n",
33 | "housing:是否有住房贷款
\n",
34 | "loan:是否有个人贷款
\n",
35 | "contact:与客户联系的沟通方式
\n",
36 | "day:最后一次联系的时间(几号)
\n",
37 | "month:最后一次联系的时间(月份)
\n",
38 | "duration:最后一次联系的交流时长
\n",
39 | "campaign:在本次活动中,与该客户交流过的次数
\n",
40 | "pdays:距离上次活动最后一次联系该客户,过去了多久
\n",
41 | "previous:在本次活动之前,与该客户交流过的次数
\n",
42 | "poutcome:上一次活动的结果
\n",
43 | "y:客户是否会订购定期存款业务,0表示不订购,1表示订购
"
44 | ]
45 | },
46 | {
47 | "cell_type": "markdown",
48 | "metadata": {},
49 | "source": [
50 | "## 二、提出问题\n",
51 | "* 哪个分类模型更适合预测客户是否订购定期存款业务?\n",
52 | "\n",
53 | "### 分析流程\n",
54 | "* 查看数据\n",
55 | "* 特征处理\n",
56 | "* 选择模型\n",
57 | "* 数据归一化"
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "metadata": {},
63 | "source": [
64 | "## 三、探索性数据分析\n",
65 | "### 导入必备的库"
66 | ]
67 | },
68 | {
69 | "cell_type": "code",
70 | "execution_count": 1,
71 | "metadata": {},
72 | "outputs": [],
73 | "source": [
74 | "#Basic library\n",
75 | "import pandas as pd\n",
76 | "import numpy as np\n",
77 | "import matplotlib.pyplot as plt\n",
78 | "import seaborn as sns\n",
79 | "\n",
80 | "#machine learning\n",
81 | "from sklearn.model_selection import train_test_split\n",
82 | "from sklearn.model_selection import cross_val_score\n",
83 | "from sklearn.metrics import roc_auc_score\n",
84 | "from sklearn.preprocessing import LabelEncoder, MinMaxScaler\n",
85 | "\n",
86 | "#Model\n",
87 | "from sklearn.neighbors import KNeighborsClassifier\n",
88 | "from sklearn.linear_model import LogisticRegression\n",
89 | "from sklearn.tree import DecisionTreeClassifier\n",
90 | "\n",
91 | "#igonore warnings\n",
92 | "import warnings\n",
93 | "warnings.filterwarnings('ignore')"
94 | ]
95 | },
96 | {
97 | "cell_type": "markdown",
98 | "metadata": {},
99 | "source": [
100 | "### 查看数据"
101 | ]
102 | },
103 | {
104 | "cell_type": "code",
105 | "execution_count": 2,
106 | "metadata": {
107 | "scrolled": true
108 | },
109 | "outputs": [
110 | {
111 | "data": {
112 | "text/html": [
113 | "
\n",
114 | "\n",
127 | "
\n",
128 | " \n",
129 | " \n",
130 | " | \n",
131 | " age | \n",
132 | " job | \n",
133 | " marital | \n",
134 | " education | \n",
135 | " default | \n",
136 | " balance | \n",
137 | " housing | \n",
138 | " loan | \n",
139 | " contact | \n",
140 | " day | \n",
141 | " month | \n",
142 | " duration | \n",
143 | " campaign | \n",
144 | " pdays | \n",
145 | " previous | \n",
146 | " poutcome | \n",
147 | " y | \n",
148 | "
\n",
149 | " \n",
150 | " ID | \n",
151 | " | \n",
152 | " | \n",
153 | " | \n",
154 | " | \n",
155 | " | \n",
156 | " | \n",
157 | " | \n",
158 | " | \n",
159 | " | \n",
160 | " | \n",
161 | " | \n",
162 | " | \n",
163 | " | \n",
164 | " | \n",
165 | " | \n",
166 | " | \n",
167 | " | \n",
168 | "
\n",
169 | " \n",
170 | " \n",
171 | " \n",
172 | " 1 | \n",
173 | " 43 | \n",
174 | " management | \n",
175 | " married | \n",
176 | " tertiary | \n",
177 | " no | \n",
178 | " 291 | \n",
179 | " yes | \n",
180 | " no | \n",
181 | " unknown | \n",
182 | " 9 | \n",
183 | " may | \n",
184 | " 150 | \n",
185 | " 2 | \n",
186 | " -1 | \n",
187 | " 0 | \n",
188 | " unknown | \n",
189 | " 0 | \n",
190 | "
\n",
191 | " \n",
192 | " 2 | \n",
193 | " 42 | \n",
194 | " technician | \n",
195 | " divorced | \n",
196 | " primary | \n",
197 | " no | \n",
198 | " 5076 | \n",
199 | " yes | \n",
200 | " no | \n",
201 | " cellular | \n",
202 | " 7 | \n",
203 | " apr | \n",
204 | " 99 | \n",
205 | " 1 | \n",
206 | " 251 | \n",
207 | " 2 | \n",
208 | " other | \n",
209 | " 0 | \n",
210 | "
\n",
211 | " \n",
212 | " 3 | \n",
213 | " 47 | \n",
214 | " admin. | \n",
215 | " married | \n",
216 | " secondary | \n",
217 | " no | \n",
218 | " 104 | \n",
219 | " yes | \n",
220 | " yes | \n",
221 | " cellular | \n",
222 | " 14 | \n",
223 | " jul | \n",
224 | " 77 | \n",
225 | " 2 | \n",
226 | " -1 | \n",
227 | " 0 | \n",
228 | " unknown | \n",
229 | " 0 | \n",
230 | "
\n",
231 | " \n",
232 | " 4 | \n",
233 | " 28 | \n",
234 | " management | \n",
235 | " single | \n",
236 | " secondary | \n",
237 | " no | \n",
238 | " -994 | \n",
239 | " yes | \n",
240 | " yes | \n",
241 | " cellular | \n",
242 | " 18 | \n",
243 | " jul | \n",
244 | " 174 | \n",
245 | " 2 | \n",
246 | " -1 | \n",
247 | " 0 | \n",
248 | " unknown | \n",
249 | " 0 | \n",
250 | "
\n",
251 | " \n",
252 | " 5 | \n",
253 | " 42 | \n",
254 | " technician | \n",
255 | " divorced | \n",
256 | " secondary | \n",
257 | " no | \n",
258 | " 2974 | \n",
259 | " yes | \n",
260 | " no | \n",
261 | " unknown | \n",
262 | " 21 | \n",
263 | " may | \n",
264 | " 187 | \n",
265 | " 5 | \n",
266 | " -1 | \n",
267 | " 0 | \n",
268 | " unknown | \n",
269 | " 0 | \n",
270 | "
\n",
271 | " \n",
272 | "
\n",
273 | "
"
274 | ],
275 | "text/plain": [
276 | " age job marital education default balance housing loan \\\n",
277 | "ID \n",
278 | "1 43 management married tertiary no 291 yes no \n",
279 | "2 42 technician divorced primary no 5076 yes no \n",
280 | "3 47 admin. married secondary no 104 yes yes \n",
281 | "4 28 management single secondary no -994 yes yes \n",
282 | "5 42 technician divorced secondary no 2974 yes no \n",
283 | "\n",
284 | " contact day month duration campaign pdays previous poutcome y \n",
285 | "ID \n",
286 | "1 unknown 9 may 150 2 -1 0 unknown 0 \n",
287 | "2 cellular 7 apr 99 1 251 2 other 0 \n",
288 | "3 cellular 14 jul 77 2 -1 0 unknown 0 \n",
289 | "4 cellular 18 jul 174 2 -1 0 unknown 0 \n",
290 | "5 unknown 21 may 187 5 -1 0 unknown 0 "
291 | ]
292 | },
293 | "execution_count": 2,
294 | "metadata": {},
295 | "output_type": "execute_result"
296 | }
297 | ],
298 | "source": [
299 | "data_all = pd.read_csv('./dataFile/train_set.csv',index_col='ID')\n",
300 | "data_all.head()"
301 | ]
302 | },
303 | {
304 | "cell_type": "code",
305 | "execution_count": 3,
306 | "metadata": {},
307 | "outputs": [
308 | {
309 | "name": "stdout",
310 | "output_type": "stream",
311 | "text": [
312 | "\n",
313 | "Int64Index: 25317 entries, 1 to 25317\n",
314 | "Data columns (total 17 columns):\n",
315 | "age 25317 non-null int64\n",
316 | "job 25317 non-null object\n",
317 | "marital 25317 non-null object\n",
318 | "education 25317 non-null object\n",
319 | "default 25317 non-null object\n",
320 | "balance 25317 non-null int64\n",
321 | "housing 25317 non-null object\n",
322 | "loan 25317 non-null object\n",
323 | "contact 25317 non-null object\n",
324 | "day 25317 non-null int64\n",
325 | "month 25317 non-null object\n",
326 | "duration 25317 non-null int64\n",
327 | "campaign 25317 non-null int64\n",
328 | "pdays 25317 non-null int64\n",
329 | "previous 25317 non-null int64\n",
330 | "poutcome 25317 non-null object\n",
331 | "y 25317 non-null int64\n",
332 | "dtypes: int64(8), object(9)\n",
333 | "memory usage: 3.5+ MB\n"
334 | ]
335 | }
336 | ],
337 | "source": [
338 | "#查看数据的基本信息\n",
339 | "data_all.info()"
340 | ]
341 | },
342 | {
343 | "cell_type": "code",
344 | "execution_count": 4,
345 | "metadata": {},
346 | "outputs": [
347 | {
348 | "data": {
349 | "text/plain": [
350 | "(25317, 17)"
351 | ]
352 | },
353 | "execution_count": 4,
354 | "metadata": {},
355 | "output_type": "execute_result"
356 | }
357 | ],
358 | "source": [
359 | "# 查看数据data_all的维度\n",
360 | "data_all.shape"
361 | ]
362 | },
363 | {
364 | "cell_type": "code",
365 | "execution_count": 6,
366 | "metadata": {
367 | "scrolled": true
368 | },
369 | "outputs": [
370 | {
371 | "data": {
372 | "text/plain": [
373 | "age False\n",
374 | "job False\n",
375 | "marital False\n",
376 | "education False\n",
377 | "default False\n",
378 | "balance False\n",
379 | "housing False\n",
380 | "loan False\n",
381 | "contact False\n",
382 | "day False\n",
383 | "month False\n",
384 | "duration False\n",
385 | "campaign False\n",
386 | "pdays False\n",
387 | "previous False\n",
388 | "poutcome False\n",
389 | "y False\n",
390 | "dtype: bool"
391 | ]
392 | },
393 | "execution_count": 6,
394 | "metadata": {},
395 | "output_type": "execute_result"
396 | }
397 | ],
398 | "source": [
399 | "# 查看每列数据是否包含缺失值。\n",
400 | "data_all.isnull().any()"
401 | ]
402 | },
403 | {
404 | "cell_type": "markdown",
405 | "metadata": {},
406 | "source": [
407 | "显然,数据集中不包含任何缺失值。"
408 | ]
409 | },
410 | {
411 | "cell_type": "markdown",
412 | "metadata": {},
413 | "source": [
414 | "### 特征处理"
415 | ]
416 | },
417 | {
418 | "cell_type": "code",
419 | "execution_count": 7,
420 | "metadata": {},
421 | "outputs": [
422 | {
423 | "data": {
424 | "text/plain": [
425 | "['job',\n",
426 | " 'marital',\n",
427 | " 'education',\n",
428 | " 'default',\n",
429 | " 'housing',\n",
430 | " 'loan',\n",
431 | " 'contact',\n",
432 | " 'month',\n",
433 | " 'poutcome']"
434 | ]
435 | },
436 | "execution_count": 7,
437 | "metadata": {},
438 | "output_type": "execute_result"
439 | }
440 | ],
441 | "source": [
442 | "# 获得data_all中列的数据类型是object的列的列名。\n",
443 | "data_obj_col = data_all.select_dtypes('object').columns.to_list()\n",
444 | "data_obj_col"
445 | ]
446 | },
447 | {
448 | "cell_type": "code",
449 | "execution_count": 9,
450 | "metadata": {
451 | "scrolled": true
452 | },
453 | "outputs": [
454 | {
455 | "data": {
456 | "text/plain": [
457 | "(25317, 9)"
458 | ]
459 | },
460 | "execution_count": 9,
461 | "metadata": {},
462 | "output_type": "execute_result"
463 | }
464 | ],
465 | "source": [
466 | "# 获得数据集中列的数据类型为object的所有数据,以及打印数据的维度\n",
467 | "data_obj=data_all[data_obj_col]\n",
468 | "data_obj.shape"
469 | ]
470 | },
471 | {
472 | "cell_type": "code",
473 | "execution_count": 10,
474 | "metadata": {},
475 | "outputs": [],
476 | "source": [
477 | "# 依据data_obj_col,获得数据集中列的数据类型为数值型的列的列。\n",
478 | "data_num_col = data_all.columns.difference(data_obj_col)"
479 | ]
480 | },
481 | {
482 | "cell_type": "code",
483 | "execution_count": 11,
484 | "metadata": {},
485 | "outputs": [
486 | {
487 | "data": {
488 | "text/plain": [
489 | "(25317, 8)"
490 | ]
491 | },
492 | "execution_count": 11,
493 | "metadata": {},
494 | "output_type": "execute_result"
495 | }
496 | ],
497 | "source": [
498 | "# 获得数据集中列的数据类型为数值型的所有数据,以及数据的维度\n",
499 | "data_num=data_all[data_num_col]\n",
500 | "data_num.shape"
501 | ]
502 | },
503 | {
504 | "cell_type": "code",
505 | "execution_count": 12,
506 | "metadata": {},
507 | "outputs": [
508 | {
509 | "data": {
510 | "text/plain": [
511 | "['age', 'balance', 'campaign', 'day', 'duration', 'pdays', 'previous', 'y']"
512 | ]
513 | },
514 | "execution_count": 12,
515 | "metadata": {},
516 | "output_type": "execute_result"
517 | }
518 | ],
519 | "source": [
520 | "# 打印data_num的列名\n",
521 | "data_num.columns.to_list()"
522 | ]
523 | },
524 | {
525 | "cell_type": "markdown",
526 | "metadata": {},
527 | "source": [
528 | "从以上输出的数据可知:\n",
529 | "* Object类型的列有9个\n",
530 | "* 数值类型的列有8个,数值类型的列名分别为:'age', 'balance', 'campaign', 'day', 'duration', 'pdays', 'previous','y'"
531 | ]
532 | },
533 | {
534 | "cell_type": "markdown",
535 | "metadata": {},
536 | "source": [
537 | "### 标签编码\n",
538 | "将object类型的列中只有两个值的列进行标签编码,将编码后的列添加到data_num数据集中"
539 | ]
540 | },
541 | {
542 | "cell_type": "code",
543 | "execution_count": 13,
544 | "metadata": {},
545 | "outputs": [
546 | {
547 | "data": {
548 | "text/plain": [
549 | "['default', 'housing', 'loan']"
550 | ]
551 | },
552 | "execution_count": 13,
553 | "metadata": {},
554 | "output_type": "execute_result"
555 | }
556 | ],
557 | "source": [
558 | "# 计算data_obj中每列中的唯一值;然后得到每一列中只有两个值的列名\n",
559 | "two_unique_cols = data_obj.nunique()[data_obj.nunique()==2].index.tolist()\n",
560 | "two_unique_cols"
561 | ]
562 | },
563 | {
564 | "cell_type": "code",
565 | "execution_count": 14,
566 | "metadata": {},
567 | "outputs": [],
568 | "source": [
569 | "# 对列中唯一值只有两个值的列进行标签编码,将标签编码后的数据存到data_num数据集中\n",
570 | "y = data_all[two_unique_cols].apply(LabelEncoder().fit_transform)\n",
571 | "data_num = pd.concat([y, data_num],ignore_index=False, sort=True, axis=1)"
572 | ]
573 | },
574 | {
575 | "cell_type": "code",
576 | "execution_count": 15,
577 | "metadata": {
578 | "scrolled": true
579 | },
580 | "outputs": [
581 | {
582 | "name": "stdout",
583 | "output_type": "stream",
584 | "text": [
585 | "data_num的维度是:(25317, 11)\n"
586 | ]
587 | },
588 | {
589 | "data": {
590 | "text/plain": [
591 | "Index(['default', 'housing', 'loan', 'age', 'balance', 'campaign', 'day',\n",
592 | " 'duration', 'pdays', 'previous', 'y'],\n",
593 | " dtype='object')"
594 | ]
595 | },
596 | "execution_count": 15,
597 | "metadata": {},
598 | "output_type": "execute_result"
599 | }
600 | ],
601 | "source": [
602 | "# 打印data_num的维度和列名\n",
603 | "print('data_num的维度是:{}'.format(data_num.shape))\n",
604 | "data_num.columns"
605 | ]
606 | },
607 | {
608 | "cell_type": "markdown",
609 | "metadata": {},
610 | "source": [
611 | "### 数据抽样\n",
612 | "由于建模的时候,样本不平衡会对模型的训练产生很大的影响,这里将采取简单的方法对数据进行抽样,以使y中每类的样本相对平衡"
613 | ]
614 | },
615 | {
616 | "cell_type": "code",
617 | "execution_count": 16,
618 | "metadata": {
619 | "scrolled": false
620 | },
621 | "outputs": [
622 | {
623 | "data": {
624 | "text/plain": [
625 | ""
626 | ]
627 | },
628 | "execution_count": 16,
629 | "metadata": {},
630 | "output_type": "execute_result"
631 | },
632 | {
633 | "data": {
634 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZUAAAEKCAYAAADaa8itAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAADwtJREFUeJzt3X+s3fVdx/HnCwo6dYRiC8MW7bI0xjqVsYY17h8cSVdItDiHGcmkTpIuC9MtMWboH3YBSaZumrEgSXUddJlD4japSWdtms3F7BcXR/jp0htEuBbphbINJRkpvv3jfu84a0/b09vPud97uM9HcnLO930+3+95f5Ob+8r3+/me70lVIUlSC2f13YAk6dXDUJEkNWOoSJKaMVQkSc0YKpKkZgwVSVIzhookqRlDRZLUjKEiSWpmRd8NLLZVq1bVunXr+m5DkibK/fff/2xVrT7VuGUXKuvWrWNqaqrvNiRpoiT5z1HGefpLktSMoSJJasZQkSQ1Y6hIkpoxVCRJzRgqkqRmDBVJUjOGiiSpGUNFktTMsvtG/Zl68x/s7rsFLUH3//n1fbcgLQkeqUiSmjFUJEnNGCqSpGYMFUlSM4aKJKkZQ0WS1IyhIklqxlCRJDVjqEiSmjFUJEnNGCqSpGYMFUlSM4aKJKkZQ0WS1IyhIklqxlCRJDVjqEiSmhlbqCS5JMmXkjyW5JEkH+jqFyTZn+Rg97yyqyfJbUmmkzyY5LKBbW3rxh9Msm2g/uYkD3Xr3JYk49ofSdKpjfNI5Sjw+1X1c8Am4MYkG4CbgANVtR440C0DXAWs7x7bgTtgLoSAHcBbgMuBHfNB1I3ZPrDeljHujyTpFMYWKlX1dFX9W/f6BeAxYA2wFbirG3YXcE33eiuwu+Z8HTg/ycXA24H9VXWkqp4H9gNbuvfOq6qvVVUBuwe2JUnqwaLMqSRZB7wJ+AZwUVU9DXPBA1zYDVsDPDWw2kxXO1l9ZkhdktSTsYdKkp8APgd8sKq+d7KhQ2q1gPqwHrYnmUoyNTs7e6qWJUkLNNZQSXIOc4Hymar6fFd+pjt1Rfd8uKvPAJcMrL4WOHSK+toh9eNU1c6q2lhVG1evXn1mOyVJOqFxXv0V4JPAY1X1FwNv7QHmr+DaBtw7UL++uwpsE/Dd7vTYPmBzkpXdBP1mYF/33gtJNnWfdf3AtiRJPVgxxm2/Ffgt4KEkD3S1PwI+AtyT5AbgSeDa7r29wNXANPAi8B6AqjqS5Bbgvm7czVV1pHv9PuBO4DXAF7uHJKknYwuVqvpXhs97AFw5ZHwBN55gW7uAXUPqU8Abz6BNSVJDfqNektSMoSJJasZQkSQ1Y6hIkpoxVCRJzRgqkqRmDBVJUjOGiiSpGUNFktSMoSJJasZQkSQ1Y6hIkpoxVCRJzRgqkqRmDBVJUjOGiiSpGUNFktSMoSJJasZQkSQ1Y6hIkpoxVCRJzRgqkqRmDBVJUjOGiiSpGUNFktSMoSJJasZQkSQ1Y6hIkpoxVCRJzRgqkqRmDBVJUjOGiiSpGUNFktSMoSJJasZQkSQ1Y6hIkpoxVCRJzYwtVJLsSnI4ycMDtQ8n+a8kD3SPqwfe+8Mk00m+neTtA/UtXW06yU0D9dcn+UaSg0n+Lsm549oXSdJoxnmkciewZUj9L6vq0u6xFyDJBuBdwM936/xVkrOTnA3cDlwFbACu68YC/Gm3rfXA88ANY9wXSdIIxhYqVfUV4MiIw7cCd1fV96vqP4Bp4PLuMV1Vj1fVS8DdwNYkAd4G/H23/l3ANU13QJJ02vqYU3l/kge702Mru9oa4KmBMTNd7UT1nwS+U1VHj6lLknq02KFyB/AG4FLgaeBjXT1DxtYC6kMl2Z5kKsnU7Ozs6XUsSRrZooZKVT1TVS9X1f8Bf83c6S2YO9K4ZGDoWuDQSerPAucnWXFM/USfu7OqNlbVxtWrV7fZGUnScRY1VJJcPLD468D8lWF7gHcl+ZEkrwfWA98E7gPWd1d6ncvcZP6eqirgS8A7u/W3Afcuxj5Ikk5sxamHLEySzwJXAKuSzAA7gCuSXMrcqaongPcCVNUjSe4BHgWOAjdW1cvddt4P7APOBnZV1SPdR3wIuDvJnwDfAj45rn2RJI1mbKFSVdcNKZ/wH39V3QrcOqS+F9g7pP44r5w+kyQtAX6jXpLUjKEiSWrGUJEkNWOoSJKaMVQkSc0YKpKkZgwVSVIzhookqRlDRZLUjKEiSWrGUJEkNWOoSJKaMVQkSc2MFCpJDoxSkyQtbye99X2SHwV+jLnfRFnJKz/jex7wU2PuTZI0YU71eyrvBT7IXIDczyuh8j3g9jH2JUmaQCcNlar6OPDxJL9bVZ9YpJ4kSRNqpF9+rKpPJPllYN3gOlW1e0x9SZIm0EihkuTTwBuAB4CXu3IBhook6QdG/Y36jcCGqqpxNiNJmmyjfk/lYeB142xEkjT5Rj1SWQU8muSbwPfni1X1a2PpSpI0kUYNlQ+PswlJ0qvDqFd//cu4G5EkTb5Rr/56gbmrvQDOBc4B/reqzhtXY5KkyTPqkcprB5eTXANcPpaOJEkTa0F3Ka6qfwDe1rgXSdKEG/X01zsGFs9i7nsrfmdFkvRDRr3661cHXh8FngC2Nu9GkjTRRp1Tec+4G5EkTb5Rf6RrbZIvJDmc5Jkkn0uydtzNSZImy6gT9Z8C9jD3uyprgH/sapIk/cCoobK6qj5VVUe7x53A6jH2JUmaQKOGyrNJ3p3k7O7xbuC5cTYmSZo8o4bK7wC/Cfw38DTwTsDJe0nSDxn1kuJbgG1V9TxAkguAjzIXNpIkAaMfqfzifKAAVNUR4E3jaUmSNKlGDZWzkqycX+iOVEY9ypEkLROjhsrHgK8muSXJzcBXgT872QpJdnXfa3l4oHZBkv1JDnbPK7t6ktyWZDrJg0kuG1hnWzf+YJJtA/U3J3moW+e2JDmdHZcktTdSqFTVbuA3gGeAWeAdVfXpU6x2J7DlmNpNwIGqWg8c6JYBrgLWd4/twB3wgyOiHcBbmLsr8o6BI6Y7urHz6x37WZKkRTbyKayqehR49DTGfyXJumPKW4Erutd3AV8GPtTVd1dVAV9Pcn6Si7ux+7s5HJLsB7Yk+TJwXlV9ravvBq4Bvjhqf5Kk9hZ06/szcFFVPQ3QPV/Y1dcATw2Mm+lqJ6vPDKkPlWR7kqkkU7Ozs2e8E5Kk4RY7VE5k2HxILaA+VFXtrKqNVbVx9WpvBCBJ47LYofJMd1qL7vlwV58BLhkYtxY4dIr62iF1SVKPFjtU9gDzV3BtA+4dqF/fXQW2Cfhud3psH7A5ycpugn4zsK9774Ukm7qrvq4f2JYkqSdj+65Jks8yN9G+KskMc1dxfQS4J8kNwJPAtd3wvcDVwDTwIt0tYKrqSJJbgPu6cTfPT9oD72PuCrPXMDdB7yS9JPVsbKFSVded4K0rh4wt4MYTbGcXsGtIfQp445n0KElqa6lM1EuSXgUMFUlSM4aKJKkZQ0WS1IyhIklqxlCRJDVjqEiSmjFUJEnNGCqSpGYMFUlSM4aKJKkZQ0WS1IyhIklqxlCRJDVjqEiSmjFUJEnNGCqSpGYMFUlSM4aKJKkZQ0WS1IyhIklqxlCRJDVjqEiSmjFUJEnNGCqSpGYMFUlSM4aKJKkZQ0WS1IyhIklqxlCRJDVjqEiSmjFUJEnNGCqSpGYMFUlSM4aKJKkZQ0WS1IyhIklqppdQSfJEkoeSPJBkqqtdkGR/koPd88quniS3JZlO8mCSywa2s60bfzDJtj72RZL0ij6PVH6lqi6tqo3d8k3AgapaDxzolgGuAtZ3j+3AHTAXQsAO4C3A5cCO+SCSJPVjKZ3+2grc1b2+C7hmoL675nwdOD/JxcDbgf1VdaSqngf2A1sWu2lJ0iv6CpUC/jnJ/Um2d7WLquppgO75wq6+BnhqYN2Zrnai+nGSbE8ylWRqdna24W5Ikgat6Olz31pVh5JcCOxP8u8nGZshtTpJ/fhi1U5gJ8DGjRuHjpEknblejlSq6lD3fBj4AnNzIs90p7Xong93w2eASwZWXwscOkldktSTRQ+VJD+e5LXzr4HNwMPAHmD+Cq5twL3d6z3A9d1VYJuA73anx/YBm5Os7CboN3c1SVJP+jj9dRHwhSTzn/+3VfVPSe4D7klyA/AkcG03fi9wNTANvAi8B6CqjiS5BbivG3dzVR1ZvN2QJB1r0UOlqh4HfmlI/TngyiH1Am48wbZ2Abta9yhJWpildEmxJGnCGSqSpGYMFUlSM4aKJKkZQ0WS1IyhIklqxlCRJDVjqEiSmjFUJEnNGCqSpGYMFUlSM4aKJKkZQ0WS1Exfv/woaQyevPkX+m5BS9BP//FDi/ZZHqlIkpoxVCRJzRgqkqRmDBVJUjOGiiSpGUNFktSMoSJJasZQkSQ1Y6hIkpoxVCRJzRgqkqRmDBVJUjOGiiSpGUNFktSMoSJJasZQkSQ1Y6hIkpoxVCRJzRgqkqRmDBVJUjOGiiSpGUNFktSMoSJJambiQyXJliTfTjKd5Ka++5Gk5WyiQyXJ2cDtwFXABuC6JBv67UqSlq+JDhXgcmC6qh6vqpeAu4GtPfckScvWpIfKGuCpgeWZriZJ6sGKvhs4QxlSq+MGJduB7d3i/yT59li7Wj5WAc/23cRSkI9u67sFHc+/z3k7hv2rPG0/M8qgSQ+VGeCSgeW1wKFjB1XVTmDnYjW1XCSZqqqNffchDePfZz8m/fTXfcD6JK9Pci7wLmBPzz1J0rI10UcqVXU0yfuBfcDZwK6qeqTntiRp2ZroUAGoqr3A3r77WKY8pailzL/PHqTquHltSZIWZNLnVCRJS4ihogXx9jhaqpLsSnI4ycN997IcGSo6bd4eR0vcncCWvptYrgwVLYS3x9GSVVVfAY703cdyZahoIbw9jqShDBUtxEi3x5G0/BgqWoiRbo8jafkxVLQQ3h5H0lCGik5bVR0F5m+P8xhwj7fH0VKR5LPA14CfTTKT5Ia+e1pO/Ea9JKkZj1QkSc0YKpKkZgwVSVIzhookqRlDRZLUjKEiSWrGUJEkNWOoSD1KckuSDwws35rk9/rsSToTfvlR6lGSdcDnq+qyJGcBB4HLq+q5XhuTFmhF3w1Iy1lVPZHkuSRvAi4CvmWgaJIZKlL//gb4beB1wK5+W5HOjKe/pJ51d3p+CDgHWF9VL/fckrRgHqlIPauql5J8CfiOgaJJZ6hIPesm6DcB1/bdi3SmvKRY6lGSDcA0cKCqDvbdj3SmnFORJDXjkYokqRlDRZLUjKEiSWrGUJEkNWOoSJKaMVQkSc38P2E7ocOY2wiuAAAAAElFTkSuQmCC\n",
635 | "text/plain": [
636 | ""
637 | ]
638 | },
639 | "metadata": {
640 | "needs_background": "light"
641 | },
642 | "output_type": "display_data"
643 | }
644 | ],
645 | "source": [
646 | "# 使用countplot对y列的唯一值进行画图显示\n",
647 | "sns.countplot(data_num['y'])"
648 | ]
649 | },
650 | {
651 | "cell_type": "code",
652 | "execution_count": 17,
653 | "metadata": {
654 | "scrolled": true
655 | },
656 | "outputs": [
657 | {
658 | "name": "stdout",
659 | "output_type": "stream",
660 | "text": [
661 | "0有:22356,1有:2961\n"
662 | ]
663 | }
664 | ],
665 | "source": [
666 | "# 标签列y中存在两个唯一值,现在计算每个值的个数.\n",
667 | "data_num['y'].value_counts()\n",
668 | "value_0 = (data_num['y']==0).sum()\n",
669 | "value_1 = (data_num['y']==1).sum()\n",
670 | "print(\"0有:{},1有:{}\".format(value_0,value_1))"
671 | ]
672 | },
673 | {
674 | "cell_type": "markdown",
675 | "metadata": {},
676 | "source": [
677 | "从以上的输出和图的显示来看,y标签列的每类的样本数不一致,因此为保抽取样本的平衡,对y==0的样本进行随机抽样。"
678 | ]
679 | },
680 | {
681 | "cell_type": "code",
682 | "execution_count": 18,
683 | "metadata": {
684 | "scrolled": true
685 | },
686 | "outputs": [],
687 | "source": [
688 | "# 以上可知y中列存在0和1两个值,由于0包含的元素个数远远大于1的个数,现在使用sample从y为0的样本中随机抽取一些样本,\n",
689 | "# 要求0包含的样本数和1的样本数相同,且随机种子设定为22,\n",
690 | "# sample随机抽样方法:sample(n=样本数,random_state=随机种子数)\n",
691 | "data_num_0 = data_num[data_num['y']==0].sample(n=value_1, random_state=22)"
692 | ]
693 | },
694 | {
695 | "cell_type": "code",
696 | "execution_count": 19,
697 | "metadata": {},
698 | "outputs": [],
699 | "source": [
700 | "# 得到数据集中y列中为1的样本\n",
701 | "data_num_1=data_num[data_num['y']!=0]"
702 | ]
703 | },
704 | {
705 | "cell_type": "code",
706 | "execution_count": 20,
707 | "metadata": {},
708 | "outputs": [
709 | {
710 | "data": {
711 | "text/plain": [
712 | "(5922, 11)"
713 | ]
714 | },
715 | "execution_count": 20,
716 | "metadata": {},
717 | "output_type": "execute_result"
718 | }
719 | ],
720 | "source": [
721 | "# 将y为1的样本和随机抽取的y为0的样本进行合并,并且打印合并后数据集的维度\n",
722 | "data_num_sample = pd.concat([data_num_0,data_num_1], axis=0, ignore_index=False)\n",
723 | "data_num_sample.shape"
724 | ]
725 | },
726 | {
727 | "cell_type": "code",
728 | "execution_count": 21,
729 | "metadata": {},
730 | "outputs": [
731 | {
732 | "data": {
733 | "text/html": [
734 | "\n",
735 | "\n",
748 | "
\n",
749 | " \n",
750 | " \n",
751 | " | \n",
752 | " default | \n",
753 | " housing | \n",
754 | " loan | \n",
755 | " age | \n",
756 | " balance | \n",
757 | " campaign | \n",
758 | " day | \n",
759 | " duration | \n",
760 | " pdays | \n",
761 | " previous | \n",
762 | " y | \n",
763 | "
\n",
764 | " \n",
765 | " \n",
766 | " \n",
767 | " count | \n",
768 | " 5922.000000 | \n",
769 | " 5922.000000 | \n",
770 | " 5922.000000 | \n",
771 | " 5922.000000 | \n",
772 | " 5922.000000 | \n",
773 | " 5922.000000 | \n",
774 | " 5922.000000 | \n",
775 | " 5922.000000 | \n",
776 | " 5922.000000 | \n",
777 | " 5922.000000 | \n",
778 | " 5922.000000 | \n",
779 | "
\n",
780 | " \n",
781 | " mean | \n",
782 | " 0.013847 | \n",
783 | " 0.468929 | \n",
784 | " 0.126646 | \n",
785 | " 41.170382 | \n",
786 | " 1616.046099 | \n",
787 | " 2.473320 | \n",
788 | " 15.489868 | \n",
789 | " 378.140324 | \n",
790 | " 52.918102 | \n",
791 | " 0.863560 | \n",
792 | " 0.500000 | \n",
793 | "
\n",
794 | " \n",
795 | " std | \n",
796 | " 0.116864 | \n",
797 | " 0.499076 | \n",
798 | " 0.332605 | \n",
799 | " 12.012457 | \n",
800 | " 3371.601168 | \n",
801 | " 2.745904 | \n",
802 | " 8.419602 | \n",
803 | " 353.409237 | \n",
804 | " 109.976587 | \n",
805 | " 2.284146 | \n",
806 | " 0.500042 | \n",
807 | "
\n",
808 | " \n",
809 | " min | \n",
810 | " 0.000000 | \n",
811 | " 0.000000 | \n",
812 | " 0.000000 | \n",
813 | " 18.000000 | \n",
814 | " -1965.000000 | \n",
815 | " 1.000000 | \n",
816 | " 1.000000 | \n",
817 | " 4.000000 | \n",
818 | " -1.000000 | \n",
819 | " 0.000000 | \n",
820 | " 0.000000 | \n",
821 | "
\n",
822 | " \n",
823 | " 25% | \n",
824 | " 0.000000 | \n",
825 | " 0.000000 | \n",
826 | " 0.000000 | \n",
827 | " 32.000000 | \n",
828 | " 130.000000 | \n",
829 | " 1.000000 | \n",
830 | " 8.000000 | \n",
831 | " 145.000000 | \n",
832 | " -1.000000 | \n",
833 | " 0.000000 | \n",
834 | " 0.000000 | \n",
835 | "
\n",
836 | " \n",
837 | " 50% | \n",
838 | " 0.000000 | \n",
839 | " 0.000000 | \n",
840 | " 0.000000 | \n",
841 | " 39.000000 | \n",
842 | " 574.000000 | \n",
843 | " 2.000000 | \n",
844 | " 15.000000 | \n",
845 | " 259.000000 | \n",
846 | " -1.000000 | \n",
847 | " 0.000000 | \n",
848 | " 0.500000 | \n",
849 | "
\n",
850 | " \n",
851 | " 75% | \n",
852 | " 0.000000 | \n",
853 | " 1.000000 | \n",
854 | " 0.000000 | \n",
855 | " 49.000000 | \n",
856 | " 1854.750000 | \n",
857 | " 3.000000 | \n",
858 | " 21.000000 | \n",
859 | " 492.000000 | \n",
860 | " 77.750000 | \n",
861 | " 1.000000 | \n",
862 | " 1.000000 | \n",
863 | "
\n",
864 | " \n",
865 | " max | \n",
866 | " 1.000000 | \n",
867 | " 1.000000 | \n",
868 | " 1.000000 | \n",
869 | " 95.000000 | \n",
870 | " 102127.000000 | \n",
871 | " 44.000000 | \n",
872 | " 31.000000 | \n",
873 | " 3881.000000 | \n",
874 | " 854.000000 | \n",
875 | " 58.000000 | \n",
876 | " 1.000000 | \n",
877 | "
\n",
878 | " \n",
879 | "
\n",
880 | "
"
881 | ],
882 | "text/plain": [
883 | " default housing loan age balance \\\n",
884 | "count 5922.000000 5922.000000 5922.000000 5922.000000 5922.000000 \n",
885 | "mean 0.013847 0.468929 0.126646 41.170382 1616.046099 \n",
886 | "std 0.116864 0.499076 0.332605 12.012457 3371.601168 \n",
887 | "min 0.000000 0.000000 0.000000 18.000000 -1965.000000 \n",
888 | "25% 0.000000 0.000000 0.000000 32.000000 130.000000 \n",
889 | "50% 0.000000 0.000000 0.000000 39.000000 574.000000 \n",
890 | "75% 0.000000 1.000000 0.000000 49.000000 1854.750000 \n",
891 | "max 1.000000 1.000000 1.000000 95.000000 102127.000000 \n",
892 | "\n",
893 | " campaign day duration pdays previous \\\n",
894 | "count 5922.000000 5922.000000 5922.000000 5922.000000 5922.000000 \n",
895 | "mean 2.473320 15.489868 378.140324 52.918102 0.863560 \n",
896 | "std 2.745904 8.419602 353.409237 109.976587 2.284146 \n",
897 | "min 1.000000 1.000000 4.000000 -1.000000 0.000000 \n",
898 | "25% 1.000000 8.000000 145.000000 -1.000000 0.000000 \n",
899 | "50% 2.000000 15.000000 259.000000 -1.000000 0.000000 \n",
900 | "75% 3.000000 21.000000 492.000000 77.750000 1.000000 \n",
901 | "max 44.000000 31.000000 3881.000000 854.000000 58.000000 \n",
902 | "\n",
903 | " y \n",
904 | "count 5922.000000 \n",
905 | "mean 0.500000 \n",
906 | "std 0.500042 \n",
907 | "min 0.000000 \n",
908 | "25% 0.000000 \n",
909 | "50% 0.500000 \n",
910 | "75% 1.000000 \n",
911 | "max 1.000000 "
912 | ]
913 | },
914 | "execution_count": 21,
915 | "metadata": {},
916 | "output_type": "execute_result"
917 | }
918 | ],
919 | "source": [
920 | "# 对数据集拆分为训练集和测试集之前,先检查下数据是否存在问题,调用describe,打印下数据的统计信息\n",
921 | "data_num_sample.describe()"
922 | ]
923 | },
924 | {
925 | "cell_type": "code",
926 | "execution_count": 22,
927 | "metadata": {},
928 | "outputs": [
929 | {
930 | "data": {
931 | "text/plain": [
932 | "Index(['default', 'housing', 'loan', 'age', 'balance', 'campaign', 'day',\n",
933 | " 'duration', 'pdays', 'previous'],\n",
934 | " dtype='object')"
935 | ]
936 | },
937 | "execution_count": 22,
938 | "metadata": {},
939 | "output_type": "execute_result"
940 | }
941 | ],
942 | "source": [
943 | "# 将数据集中的y标签列数据存到y变量中,将其他的(特征)列存到X变量中,并且打印X的所有列的列名\n",
944 | "y = data_num_sample['y']\n",
945 | "X = data_num_sample.drop(columns='y',axis=1)\n",
946 | "X.columns"
947 | ]
948 | },
949 | {
950 | "cell_type": "code",
951 | "execution_count": 23,
952 | "metadata": {},
953 | "outputs": [],
954 | "source": [
955 | "# 将数据集X和y进行随机切分,得到训练集和测试集数据,随机种子设定为22,测试集占比1/4。\n",
956 | "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=22, test_size=1/4)"
957 | ]
958 | },
959 | {
960 | "cell_type": "code",
961 | "execution_count": 24,
962 | "metadata": {},
963 | "outputs": [
964 | {
965 | "name": "stdout",
966 | "output_type": "stream",
967 | "text": [
968 | "X_train的维度为(4441, 10),X_test的维度为(1481, 10)\n"
969 | ]
970 | }
971 | ],
972 | "source": [
973 | "print(\"X_train的维度为{},X_test的维度为{}\".format(X_train.shape,X_test.shape))"
974 | ]
975 | },
976 | {
977 | "cell_type": "markdown",
978 | "metadata": {},
979 | "source": [
980 | "## 四、选择模型"
981 | ]
982 | },
983 | {
984 | "cell_type": "markdown",
985 | "metadata": {},
986 | "source": [
987 | "通过以上的特征处理和数据切分,我们将抽样后的数据集随机切分为训练集和测试集,这里,将依据训练集和测试集通过交叉验证的方法来选择模型,以下将对kNN模型、逻辑回归模型和决策树模型进行训练,通过评价指标选择一个较有的模型"
988 | ]
989 | },
990 | {
991 | "cell_type": "markdown",
992 | "metadata": {},
993 | "source": [
994 | "train_test_model和predict_auc方法均是被调用方法,在下面的模型选择的方法中会调用这两个方法"
995 | ]
996 | },
997 | {
998 | "cell_type": "code",
999 | "execution_count": 25,
1000 | "metadata": {},
1001 | "outputs": [],
1002 | "source": [
1003 | "def train_test_model(clf, X_train, y_train, cv_scores, param):\n",
1004 | " \"\"\"\n",
1005 | " 功能:\n",
1006 | " 依据训练集,对模型进行训练,得到交叉验证后的评分值\n",
1007 | " 参数:\n",
1008 | " clf:模型\n",
1009 | " param:模型参数\n",
1010 | " X_train:训练集\n",
1011 | " y_train:训练样本对应的标签\n",
1012 | " cv_scores:字`典类型,将得到的最终评分值存到字典中。\n",
1013 | "\n",
1014 | " \"\"\"\n",
1015 | " # 使用10折交叉验证,roc_auc作为评价指标,对clf进行评分计算,并且对得到的评分计算均值,并且将参数和评分进行打印,\n",
1016 | " # 比如:参数为5,评分均值为0.76342,则打印输出为:参数=5,验证集上的AUC=0.76342\n",
1017 | " val_scores = cross_val_score(clf, X_train, y_train, cv=10, scoring='roc_auc')\n",
1018 | " score_mean = val_scores.mean()\n",
1019 | " print(score_mean)\n",
1020 | " print(\"参数={},验证集上的AUC={}\".format(param, score_mean))\n",
1021 | " \n",
1022 | " \n",
1023 | " \n",
1024 | " # 经过以上操作后,还需要将评分均值存到字典cv_scores中,其中,键为模型参数,值为得到的评分均值\n",
1025 | " # 比如:,则在字典中表现为{5:0.76342}\n",
1026 | " cv_scores[param] = score_mean"
1027 | ]
1028 | },
1029 | {
1030 | "cell_type": "code",
1031 | "execution_count": 26,
1032 | "metadata": {},
1033 | "outputs": [],
1034 | "source": [
1035 | "def predict_auc(model,X_train,y_train,X_test,y_test):\n",
1036 | " \"\"\"\n",
1037 | " 功能:\n",
1038 | " 使用训练数据训练模型,使用训练好的模型对测试数据进行预测,进而得到模型的AUC评分\n",
1039 | " 参数:\n",
1040 | " model:设置了最优参数的模型\n",
1041 | " X_train:训练集\n",
1042 | " y_train:训练数据对应的标签\n",
1043 | " X_test:测试集\n",
1044 | " y_test:测试数据对应的标签\n",
1045 | " 返回值\n",
1046 | " 返回模型的AUC值\n",
1047 | " \"\"\"\n",
1048 | " # 设置最优参数后,对整个训练集进行训练,然后通过predict对测试集进行预测,并将结果存入变量y_pred,使用roc_auc_score()评分方法计算模型的AUC值\n",
1049 | " # 将得到的AUC值进行打印,打印输出格式如:模型AUC值:0.75177973\n",
1050 | " model.fit(X_train, y_train)\n",
1051 | " y_pred = model.predict(X_test)\n",
1052 | " model_auc = roc_auc_score(y_pred, y_test)\n",
1053 | " print('模型AUC值:{}'.format(model_auc))\n",
1054 | "\n",
1055 | " return model_auc\n",
1056 | " "
1057 | ]
1058 | },
1059 | {
1060 | "cell_type": "markdown",
1061 | "metadata": {},
1062 | "source": [
1063 | "### 1.kNN模型\n",
1064 | "k近邻法(k-nearest neighbor,k-NN)是一种基本的分类方法,对于给定的数据集,若输入一个新的实例,在训练数据集中找到与该实例最邻近的k个实例,这个k个实例的多数属于某个类,那么就把该输入实例判定为这个类。"
1065 | ]
1066 | },
1067 | {
1068 | "cell_type": "code",
1069 | "execution_count": 27,
1070 | "metadata": {
1071 | "scrolled": true
1072 | },
1073 | "outputs": [
1074 | {
1075 | "name": "stdout",
1076 | "output_type": "stream",
1077 | "text": [
1078 | "0.8077509534923599\n",
1079 | "参数=5,验证集上的AUC=0.8077509534923599\n",
1080 | "0.8175383570005726\n",
1081 | "参数=7,验证集上的AUC=0.8175383570005726\n",
1082 | "0.7528462119421223\n",
1083 | "参数=2,验证集上的AUC=0.7528462119421223\n",
1084 | "0.7822092488933985\n",
1085 | "参数=3,验证集上的AUC=0.7822092488933985\n",
1086 | "0.8142750349747395\n",
1087 | "参数=6,验证集上的AUC=0.8142750349747395\n",
1088 | "最优的参数值:7\n",
1089 | "模型AUC值:0.7202998772646505\n"
1090 | ]
1091 | }
1092 | ],
1093 | "source": [
1094 | "# 这里只对kNN模型的k参数进行选择,为k设定不同的值,进而得到不同的kNN模型,使用不同的kNN模型得到AUC值,进而得到最优模型的评分值\n",
1095 | "knn_parameters = [5,7,2,3,6]\n",
1096 | "knn_cv_scores = {}\n",
1097 | "for param in knn_parameters:\n",
1098 | " knn_clf = KNeighborsClassifier(n_neighbors=param)\n",
1099 | " train_test_model(knn_clf, X_train, y_train,knn_cv_scores,param)\n",
1100 | " \n",
1101 | "knn_best_para=max(knn_cv_scores,key=knn_cv_scores.get)\n",
1102 | "print('最优的参数值:{}'.format(knn_best_para))\n",
1103 | "\n",
1104 | "# 为模型设置最优参数,训练模型,对测试集进行预测,使用roc_auc_score计算模型的AUC值\n",
1105 | "knn_model= KNeighborsClassifier(n_neighbors=knn_best_para)\n",
1106 | "knn_model_auc = predict_auc(knn_model,X_train,y_train,X_test,y_test)"
1107 | ]
1108 | },
1109 | {
1110 | "cell_type": "markdown",
1111 | "metadata": {},
1112 | "source": [
1113 | "### 2.逻辑回归模型\n",
1114 | "逻辑回归模型是一种分类模型,其模型输出的结果处于(0,1)之间,当输出结果大于给定的阈值时,则为A类,小于阈值为B类,由于具体的原理比较复杂,在此只对逻辑回归进行了简单的原理介绍"
1115 | ]
1116 | },
1117 | {
1118 | "cell_type": "code",
1119 | "execution_count": 28,
1120 | "metadata": {},
1121 | "outputs": [
1122 | {
1123 | "name": "stdout",
1124 | "output_type": "stream",
1125 | "text": [
1126 | "0.8623990756280264\n",
1127 | "参数=1,验证集上的AUC=0.8623990756280264\n",
1128 | "0.8623787155370144\n",
1129 | "参数=3,验证集上的AUC=0.8623787155370144\n",
1130 | "0.8624392095715059\n",
1131 | "参数=5,验证集上的AUC=0.8624392095715059\n",
1132 | "0.8617857855030111\n",
1133 | "参数=10,验证集上的AUC=0.8617857855030111\n",
1134 | "0.8622761466118669\n",
1135 | "参数=15,验证集上的AUC=0.8622761466118669\n",
1136 | "最优的参数值:5\n",
1137 | "模型AUC值:0.7704903918371834\n"
1138 | ]
1139 | }
1140 | ],
1141 | "source": [
1142 | "# 这里只对逻辑回归模型的参数C设定不同的值,根据不同的值可以得到不同的模型,依据不同模型的评分,选择最优的参数值,进而得到最终模型的评分值\n",
1143 | "lr_parameters = [1,3,5,10,15]\n",
1144 | "lr_cv_scores = {}\n",
1145 | "for param in lr_parameters:\n",
1146 | " lr_clf = LogisticRegression(C=param)\n",
1147 | " train_test_model(lr_clf, X_train, y_train,lr_cv_scores,param)\n",
1148 | " \n",
1149 | "lr_best_para=max(lr_cv_scores,key=lr_cv_scores.get)\n",
1150 | "print('最优的参数值:{}'.format(lr_best_para))\n",
1151 | "\n",
1152 | "# 为模型设置最优参数,训练模型,对测试集进行预测,使用roc_auc_score计算模型的AUC值\n",
1153 | "lr_model= LogisticRegression(C=lr_best_para)\n",
1154 | "lr_model_auc = predict_auc(lr_model,X_train,y_train,X_test,y_test)"
1155 | ]
1156 | },
1157 | {
1158 | "cell_type": "markdown",
1159 | "metadata": {},
1160 | "source": [
1161 | "### 3.决策树模型\n",
1162 | "决策树也是一种分类模型,显而易见,决策树可以理解为将数据按照某种规则生成一颗形似树的结构,以实现对数据的分类。"
1163 | ]
1164 | },
1165 | {
1166 | "cell_type": "code",
1167 | "execution_count": 29,
1168 | "metadata": {
1169 | "scrolled": true
1170 | },
1171 | "outputs": [
1172 | {
1173 | "name": "stdout",
1174 | "output_type": "stream",
1175 | "text": [
1176 | "0.7183915142959287\n",
1177 | "参数=1,验证集上的AUC=0.7183915142959287\n",
1178 | "0.8312521096369881\n",
1179 | "参数=3,验证集上的AUC=0.8312521096369881\n",
1180 | "0.8561861778610342\n",
1181 | "参数=5,验证集上的AUC=0.8561861778610342\n",
1182 | "0.8021781689000533\n",
1183 | "参数=10,验证集上的AUC=0.8021781689000533\n",
1184 | "0.7493300003167633\n",
1185 | "参数=15,验证集上的AUC=0.7493300003167633\n",
1186 | "最优的参数值:5\n",
1187 | "模型AUC值:0.7760305631026779\n"
1188 | ]
1189 | }
1190 | ],
1191 | "source": [
1192 | "# 以下将对决策树模型的树的深度进行选择,以得到不同的模型,依据模型的评分值,得到最优的参数值,进而得到最终模型的评分值AUC\n",
1193 | "dt_parameters = [1,3,5,10,15]\n",
1194 | "dt_cv_scores = {}\n",
1195 | "for param in dt_parameters:\n",
1196 | " dt_clf = DecisionTreeClassifier(max_depth=param)\n",
1197 | " train_test_model(dt_clf, X_train, y_train,dt_cv_scores,param)\n",
1198 | " \n",
1199 | "dt_best_para=max(dt_cv_scores,key=dt_cv_scores.get)\n",
1200 | "print('最优的参数值:{}'.format(dt_best_para))\n",
1201 | "\n",
1202 | "# 为模型设置最优参数,训练模型,对测试集进行预测,使用roc_auc_score计算模型的AUC值\n",
1203 | "dt_model= DecisionTreeClassifier(max_depth=dt_best_para)\n",
1204 | "dt_model_auc = predict_auc(dt_model,X_train,y_train,X_test,y_test)"
1205 | ]
1206 | },
1207 | {
1208 | "cell_type": "markdown",
1209 | "metadata": {},
1210 | "source": [
1211 | "对以上三个模型进行分析,决策树模型好,因为模型AUC值值最大。"
1212 | ]
1213 | },
1214 | {
1215 | "cell_type": "markdown",
1216 | "metadata": {},
1217 | "source": [
1218 | "### 数据归一化"
1219 | ]
1220 | },
1221 | {
1222 | "cell_type": "code",
1223 | "execution_count": 30,
1224 | "metadata": {
1225 | "scrolled": false
1226 | },
1227 | "outputs": [
1228 | {
1229 | "data": {
1230 | "text/html": [
1231 | "\n",
1232 | "\n",
1245 | "
\n",
1246 | " \n",
1247 | " \n",
1248 | " | \n",
1249 | " default | \n",
1250 | " housing | \n",
1251 | " loan | \n",
1252 | " age | \n",
1253 | " balance | \n",
1254 | " campaign | \n",
1255 | " day | \n",
1256 | " duration | \n",
1257 | " pdays | \n",
1258 | " previous | \n",
1259 | "
\n",
1260 | " \n",
1261 | " \n",
1262 | " \n",
1263 | " count | \n",
1264 | " 4441.000000 | \n",
1265 | " 4441.000000 | \n",
1266 | " 4441.000000 | \n",
1267 | " 4441.000000 | \n",
1268 | " 4441.000000 | \n",
1269 | " 4441.000000 | \n",
1270 | " 4441.000000 | \n",
1271 | " 4441.000000 | \n",
1272 | " 4441.00000 | \n",
1273 | " 4441.000000 | \n",
1274 | "
\n",
1275 | " \n",
1276 | " mean | \n",
1277 | " 0.014411 | \n",
1278 | " 0.471741 | \n",
1279 | " 0.128575 | \n",
1280 | " 41.186895 | \n",
1281 | " 1587.306463 | \n",
1282 | " 2.474218 | \n",
1283 | " 15.548750 | \n",
1284 | " 377.235307 | \n",
1285 | " 53.21324 | \n",
1286 | " 0.833146 | \n",
1287 | "
\n",
1288 | " \n",
1289 | " std | \n",
1290 | " 0.119192 | \n",
1291 | " 0.499257 | \n",
1292 | " 0.334766 | \n",
1293 | " 12.027061 | \n",
1294 | " 3136.478955 | \n",
1295 | " 2.751711 | \n",
1296 | " 8.417989 | \n",
1297 | " 350.516914 | \n",
1298 | " 111.21589 | \n",
1299 | " 2.134833 | \n",
1300 | "
\n",
1301 | " \n",
1302 | " min | \n",
1303 | " 0.000000 | \n",
1304 | " 0.000000 | \n",
1305 | " 0.000000 | \n",
1306 | " 18.000000 | \n",
1307 | " -1965.000000 | \n",
1308 | " 1.000000 | \n",
1309 | " 1.000000 | \n",
1310 | " 4.000000 | \n",
1311 | " -1.00000 | \n",
1312 | " 0.000000 | \n",
1313 | "
\n",
1314 | " \n",
1315 | " 25% | \n",
1316 | " 0.000000 | \n",
1317 | " 0.000000 | \n",
1318 | " 0.000000 | \n",
1319 | " 32.000000 | \n",
1320 | " 119.000000 | \n",
1321 | " 1.000000 | \n",
1322 | " 8.000000 | \n",
1323 | " 144.000000 | \n",
1324 | " -1.00000 | \n",
1325 | " 0.000000 | \n",
1326 | "
\n",
1327 | " \n",
1328 | " 50% | \n",
1329 | " 0.000000 | \n",
1330 | " 0.000000 | \n",
1331 | " 0.000000 | \n",
1332 | " 39.000000 | \n",
1333 | " 577.000000 | \n",
1334 | " 2.000000 | \n",
1335 | " 15.000000 | \n",
1336 | " 260.000000 | \n",
1337 | " -1.00000 | \n",
1338 | " 0.000000 | \n",
1339 | "
\n",
1340 | " \n",
1341 | " 75% | \n",
1342 | " 0.000000 | \n",
1343 | " 1.000000 | \n",
1344 | " 0.000000 | \n",
1345 | " 49.000000 | \n",
1346 | " 1853.000000 | \n",
1347 | " 3.000000 | \n",
1348 | " 21.000000 | \n",
1349 | " 489.000000 | \n",
1350 | " 64.00000 | \n",
1351 | " 1.000000 | \n",
1352 | "
\n",
1353 | " \n",
1354 | " max | \n",
1355 | " 1.000000 | \n",
1356 | " 1.000000 | \n",
1357 | " 1.000000 | \n",
1358 | " 93.000000 | \n",
1359 | " 81204.000000 | \n",
1360 | " 44.000000 | \n",
1361 | " 31.000000 | \n",
1362 | " 3881.000000 | \n",
1363 | " 854.00000 | \n",
1364 | " 37.000000 | \n",
1365 | "
\n",
1366 | " \n",
1367 | "
\n",
1368 | "
"
1369 | ],
1370 | "text/plain": [
1371 | " default housing loan age balance \\\n",
1372 | "count 4441.000000 4441.000000 4441.000000 4441.000000 4441.000000 \n",
1373 | "mean 0.014411 0.471741 0.128575 41.186895 1587.306463 \n",
1374 | "std 0.119192 0.499257 0.334766 12.027061 3136.478955 \n",
1375 | "min 0.000000 0.000000 0.000000 18.000000 -1965.000000 \n",
1376 | "25% 0.000000 0.000000 0.000000 32.000000 119.000000 \n",
1377 | "50% 0.000000 0.000000 0.000000 39.000000 577.000000 \n",
1378 | "75% 0.000000 1.000000 0.000000 49.000000 1853.000000 \n",
1379 | "max 1.000000 1.000000 1.000000 93.000000 81204.000000 \n",
1380 | "\n",
1381 | " campaign day duration pdays previous \n",
1382 | "count 4441.000000 4441.000000 4441.000000 4441.00000 4441.000000 \n",
1383 | "mean 2.474218 15.548750 377.235307 53.21324 0.833146 \n",
1384 | "std 2.751711 8.417989 350.516914 111.21589 2.134833 \n",
1385 | "min 1.000000 1.000000 4.000000 -1.00000 0.000000 \n",
1386 | "25% 1.000000 8.000000 144.000000 -1.00000 0.000000 \n",
1387 | "50% 2.000000 15.000000 260.000000 -1.00000 0.000000 \n",
1388 | "75% 3.000000 21.000000 489.000000 64.00000 1.000000 \n",
1389 | "max 44.000000 31.000000 3881.000000 854.00000 37.000000 "
1390 | ]
1391 | },
1392 | "execution_count": 30,
1393 | "metadata": {},
1394 | "output_type": "execute_result"
1395 | }
1396 | ],
1397 | "source": [
1398 | "# 先观察下训练集的特征,查看特征的量纲具有什么特点\n",
1399 | "X_train.describe()"
1400 | ]
1401 | },
1402 | {
1403 | "cell_type": "markdown",
1404 | "metadata": {},
1405 | "source": [
1406 | "回答:从以上的统计信息中,可以知道,特征与特征之间的量纲不一致,因此需要对特征值进行归一化处理。"
1407 | ]
1408 | },
1409 | {
1410 | "cell_type": "markdown",
1411 | "metadata": {},
1412 | "source": [
1413 | "#### 归一化\n",
1414 | "对训练数据和测试集数据做归一化"
1415 | ]
1416 | },
1417 | {
1418 | "cell_type": "code",
1419 | "execution_count": 31,
1420 | "metadata": {},
1421 | "outputs": [],
1422 | "source": [
1423 | "# 使用MinMaxScaler对训练集X_train和测试集X_test进行归一化\n",
1424 | "scaler = MinMaxScaler()\n",
1425 | "X_train_scaled = scaler.fit_transform(X_train.astype('float64'))\n",
1426 | "X_test_scaled = scaler.transform(X_test.astype('float64'))"
1427 | ]
1428 | },
1429 | {
1430 | "cell_type": "markdown",
1431 | "metadata": {},
1432 | "source": [
1433 | "由于以下训练模型的代码以上的代码类似,若是感兴趣的同学,可以将以下给出的kNN模型,逻辑回归模型和决策树模型的代码删除掉,然后尝试自己写代码,完成模型的训练"
1434 | ]
1435 | },
1436 | {
1437 | "cell_type": "markdown",
1438 | "metadata": {},
1439 | "source": [
1440 | "#### kNN模型"
1441 | ]
1442 | },
1443 | {
1444 | "cell_type": "code",
1445 | "execution_count": 32,
1446 | "metadata": {
1447 | "scrolled": false
1448 | },
1449 | "outputs": [
1450 | {
1451 | "name": "stdout",
1452 | "output_type": "stream",
1453 | "text": [
1454 | "0.8362348007709579\n",
1455 | "参数=5,验证集上的AUC=0.8362348007709579\n",
1456 | "0.844031883023167\n",
1457 | "参数=7,验证集上的AUC=0.844031883023167\n",
1458 | "0.795107142467294\n",
1459 | "参数=2,验证集上的AUC=0.795107142467294\n",
1460 | "0.8216005781567567\n",
1461 | "参数=3,验证集上的AUC=0.8216005781567567\n",
1462 | "0.8418551395164939\n",
1463 | "参数=6,验证集上的AUC=0.8418551395164939\n",
1464 | "最优的参数值:7\n",
1465 | "模型AUC值:0.7633043955039752\n"
1466 | ]
1467 | }
1468 | ],
1469 | "source": [
1470 | "knn_scaler_parameters = [5,7,2,3,6]\n",
1471 | "knn_scaler_cv_scores = {}\n",
1472 | "for param in knn_scaler_parameters:\n",
1473 | " knn_scaler_clf = KNeighborsClassifier(n_neighbors=param)\n",
1474 | " train_test_model(knn_scaler_clf, X_train_scaled, y_train,knn_scaler_cv_scores,param)\n",
1475 | " \n",
1476 | "knn_scaler_best_para=max(knn_scaler_cv_scores,key=knn_scaler_cv_scores.get)\n",
1477 | "print('最优的参数值:{}'.format(knn_scaler_best_para))\n",
1478 | "\n",
1479 | "# 为模型设置最优参数,训练模型,对测试集进行预测,使用roc_auc_score计算模型的AUC值\n",
1480 | "knn_scaler_model= KNeighborsClassifier(n_neighbors=knn_scaler_best_para)\n",
1481 | "knn_scaler_model_auc = predict_auc(knn_scaler_model,X_train_scaled,y_train,X_test_scaled,y_test)"
1482 | ]
1483 | },
1484 | {
1485 | "cell_type": "markdown",
1486 | "metadata": {},
1487 | "source": [
1488 | "#### 逻辑回归模型"
1489 | ]
1490 | },
1491 | {
1492 | "cell_type": "code",
1493 | "execution_count": 33,
1494 | "metadata": {
1495 | "scrolled": true
1496 | },
1497 | "outputs": [
1498 | {
1499 | "name": "stdout",
1500 | "output_type": "stream",
1501 | "text": [
1502 | "0.8576315184282677\n",
1503 | "参数=1,验证集上的AUC=0.8576315184282677\n",
1504 | "0.8610214502261355\n",
1505 | "参数=3,验证集上的AUC=0.8610214502261355\n",
1506 | "0.8617095314058393\n",
1507 | "参数=5,验证集上的AUC=0.8617095314058393\n",
1508 | "0.8620709066888409\n",
1509 | "参数=10,验证集上的AUC=0.8620709066888409\n",
1510 | "0.8622355155684189\n",
1511 | "参数=15,验证集上的AUC=0.8622355155684189\n",
1512 | "最优的参数值:15\n",
1513 | "模型AUC值:0.7691338914433311\n"
1514 | ]
1515 | }
1516 | ],
1517 | "source": [
1518 | "lr_scaler_parameters = [1,3,5,10,15]\n",
1519 | "lr_scaler_cv_scores = {}\n",
1520 | "for param in lr_scaler_parameters:\n",
1521 | " lr_scaler_clf = LogisticRegression(C=param)\n",
1522 | " train_test_model(lr_scaler_clf, X_train_scaled, y_train,lr_scaler_cv_scores,param)\n",
1523 | " \n",
1524 | "lr_scaler_best_para=max(lr_scaler_cv_scores,key=lr_scaler_cv_scores.get)\n",
1525 | "print('最优的参数值:{}'.format(lr_scaler_best_para))\n",
1526 | "\n",
1527 | "# 为模型设置最优参数,训练模型,对测试集进行预测,使用roc_auc_score计算模型的AUC值\n",
1528 | "lr_scaler_model= LogisticRegression(C=lr_scaler_best_para)\n",
1529 | "lr_scaler_model_auc = predict_auc(lr_scaler_model,X_train_scaled,y_train,X_test_scaled,y_test)"
1530 | ]
1531 | },
1532 | {
1533 | "cell_type": "markdown",
1534 | "metadata": {},
1535 | "source": [
1536 | "#### 决策树模型"
1537 | ]
1538 | },
1539 | {
1540 | "cell_type": "code",
1541 | "execution_count": 34,
1542 | "metadata": {
1543 | "scrolled": true
1544 | },
1545 | "outputs": [
1546 | {
1547 | "name": "stdout",
1548 | "output_type": "stream",
1549 | "text": [
1550 | "0.7183915142959287\n",
1551 | "参数=1,验证集上的AUC=0.7183915142959287\n",
1552 | "0.8312521096369881\n",
1553 | "参数=3,验证集上的AUC=0.8312521096369881\n",
1554 | "0.8566254157311546\n",
1555 | "参数=5,验证集上的AUC=0.8566254157311546\n",
1556 | "0.8051230940708141\n",
1557 | "参数=10,验证集上的AUC=0.8051230940708141\n",
1558 | "0.748734443042695\n",
1559 | "参数=15,验证集上的AUC=0.748734443042695\n",
1560 | "最优的参数值:5\n",
1561 | "模型AUC值:0.7767016204516204\n"
1562 | ]
1563 | }
1564 | ],
1565 | "source": [
1566 | "dt_scaler_parameters = [1,3,5,10,15]\n",
1567 | "dt_scaler_cv_scores = {}\n",
1568 | "for param in dt_scaler_parameters:\n",
1569 | " dt_scaler_clf = DecisionTreeClassifier(max_depth=param)\n",
1570 | " train_test_model(dt_scaler_clf, X_train_scaled, y_train,dt_scaler_cv_scores,param)\n",
1571 | " \n",
1572 | "dt_scaler_best_para=max(dt_scaler_cv_scores,key=dt_scaler_cv_scores.get)\n",
1573 | "print('最优的参数值:{}'.format(dt_scaler_best_para))\n",
1574 | "\n",
1575 | "# 为模型设置最优参数,训练模型,对测试集进行预测,使用roc_auc_score计算模型的AUC值\n",
1576 | "dt_scaler_model= DecisionTreeClassifier(max_depth=dt_scaler_best_para)\n",
1577 | "dt_scaler_model_auc = predict_auc(dt_scaler_model,X_train_scaled,y_train,X_test_scaled,y_test)"
1578 | ]
1579 | },
1580 | {
1581 | "cell_type": "markdown",
1582 | "metadata": {},
1583 | "source": [
1584 | "#### 将未归一化和归一化后的数据得到的模型AUC值进行合并"
1585 | ]
1586 | },
1587 | {
1588 | "cell_type": "code",
1589 | "execution_count": 35,
1590 | "metadata": {},
1591 | "outputs": [
1592 | {
1593 | "data": {
1594 | "text/html": [
1595 | "\n",
1596 | "\n",
1609 | "
\n",
1610 | " \n",
1611 | " \n",
1612 | " | \n",
1613 | " Not Scaled (%) | \n",
1614 | " Scaled (%) | \n",
1615 | "
\n",
1616 | " \n",
1617 | " \n",
1618 | " \n",
1619 | " kNN | \n",
1620 | " 0.720300 | \n",
1621 | " 0.763304 | \n",
1622 | "
\n",
1623 | " \n",
1624 | " LR | \n",
1625 | " 0.770490 | \n",
1626 | " 0.769134 | \n",
1627 | "
\n",
1628 | " \n",
1629 | " DT | \n",
1630 | " 0.776031 | \n",
1631 | " 0.776702 | \n",
1632 | "
\n",
1633 | " \n",
1634 | "
\n",
1635 | "
"
1636 | ],
1637 | "text/plain": [
1638 | " Not Scaled (%) Scaled (%)\n",
1639 | "kNN 0.720300 0.763304\n",
1640 | "LR 0.770490 0.769134\n",
1641 | "DT 0.776031 0.776702"
1642 | ]
1643 | },
1644 | "execution_count": 35,
1645 | "metadata": {},
1646 | "output_type": "execute_result"
1647 | }
1648 | ],
1649 | "source": [
1650 | "col_name = ['Not Scaled (%)', 'Scaled (%)']\n",
1651 | "row_name = ['kNN','LR','DT']\n",
1652 | "# 创建dataframe结构的变量models_auc_df,列索引设置为col_name,行索引设置为row_name,将未归一化和归一化的AUC值按照索引存放到对应的位置,\n",
1653 | "# 其中未归一化模型的AUC值分别为knn_model_auc,lr_model_auc,dt_model_auc,归一化后的模型AUC值分别为knn_scaler_model_auc,lr_scaler_model_auc,dt_scaler_model_auc\n",
1654 | "# 然后将数据models_auc_df的数据进行打印\n",
1655 | "\n",
1656 | "models_auc_df = pd.DataFrame([[knn_model_auc,knn_scaler_model_auc],[lr_model_auc,lr_scaler_model_auc],[dt_model_auc,dt_scaler_model_auc]],\n",
1657 | " columns=col_name,index=row_name)\n",
1658 | "models_auc_df"
1659 | ]
1660 | },
1661 | {
1662 | "cell_type": "code",
1663 | "execution_count": 36,
1664 | "metadata": {
1665 | "scrolled": true
1666 | },
1667 | "outputs": [
1668 | {
1669 | "data": {
1670 | "text/plain": [
1671 | "(array([0, 1, 2]), )"
1672 | ]
1673 | },
1674 | "execution_count": 36,
1675 | "metadata": {},
1676 | "output_type": "execute_result"
1677 | },
1678 | {
1679 | "data": {
1680 | "text/plain": [
1681 | ""
1682 | ]
1683 | },
1684 | "metadata": {},
1685 | "output_type": "display_data"
1686 | },
1687 | {
1688 | "data": {
1689 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXYAAAD6CAYAAAC1W2xyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAFxVJREFUeJzt3X94lXX9x/HnOyC3AENxgYqGICqKYLRMCvwejKnIpJr8UBDSBPyBadmVWlJRUqkVfS9/O90XFRVDwHUJqVRKWmE5EAgEUWTIZoM5pmvKiPT9/eMcGc4zzr2zsx98eD2ui8v7nPM+9/0ex+u1m8/53J/b3B0REQnHJ9q6ARERySwFu4hIYBTsIiKBUbCLiARGwS4iEhgFu4hIYBTsIiKBUbCLiARGwS4iEpiObXHQww47zHv37t0WhxYR2W+tWLHiLXfPSVXXJsHeu3dvSkpK2uLQIiL7LTPbEqUu0lCMmRWZ2XIzm9HI64eY2e/NrMTM7mlKoyIiklkpg93MCoAO7j4E6GNm/ZKUTQIedvdcoKuZ5Wa4TxERiSjKGXsMmJ/YXgoMTVJTBQwws27AUcDWjHQnIiJNFmWMvTNQntjeAQxOUvMXYBRwFbA+UfcRZjYNmAZw9NFHp9OriGTA7t27KSsro66urq1bkUZkZWXRq1cvOnXqlNb7owR7LZCd2O5C8rP8HwOXuXuNmV0DXAwU7l3g7oUfPpebm6tF4EXaSFlZGV27dqV3796YWVu3Iw24O1VVVZSVlXHMMcektY8oQzErqB9+GQSUJqk5BDjZzDoAXwQU3CLtVF1dHd27d1eot1NmRvfu3Zv1L6oowV4MTDKz2cA4YJ2ZzWpQ8wviZ+PvAIcC89LuSERanEK9fWvu55My2N29hvgXqC8Aw919tbvPaFDzD3c/yd27uHueu9c2qysRkQCUlpYmff71119v0eNGmsfu7tXuPt/dK1q0GxGRQNx8882sXLnyI89t27aN3bt388QTT/DII4+02LHb5MpTEWk/el+/JKP7K71p1J7tNWVvf+z1H37nCnJ69OSq63/EXbNvAuDya65Puq8N6/4JwAknndzo8d57711+cNWl1LxdTf8ju/HgrTc2eSgjNmYqyxbcG6l25q/vJjYkl9iXcuGIzyWtKS0tZevWrVx33XXcfvvtPPDAAyxbtoynn36ayZMnc/XVVzNx4kTOPfdcunbt2qReo1CwS7uX6eBJZe9gkpaxaN6DXPrta1PWvRIh2Bcv/C2DPv8FLr78am699mJKVr/MF045KWO9pmPu3LlMnz4dgFWrVjFlyhRefPFFOnfuvKfmwgsvpLi4mEmTJmX8+Ap2kYZmfrqVj/dO6x6vHeh7fH+WFD+25/F/du3ih9dcQeW2CnocfgQ//fUd3Dn7FzzzVPyX+uJFv+XeR3+XdF+f6Xk4Rbc/yhln53Pfr34EQF3dLi76zo8p+9d2uh3chfn33MwHHzhjpn2Pd9/bybG9j2LOb36SdH/v7dzJ5Kt+xPaqHZx8wrHc8fPvU/12DWMvvZb33/8Ax4kN2ffF9Zs2baJ///5AfPri7t27Wbp0KTNm1H89edppp3HjjTcq2PcbCgaRfTr/G1OYc9etDDsjD4CF8x7g2OP7c/MdRdw1+yYe/+1DXH39j+ndJ76CyVfHTWh0X7G8keyqq+OaqZP4w5cH8ZuZ36Xw4UUMOvE4Hr3rJub89nes3bCJQz59MN+6+HxGDPsiZ0+8km2VVfTI6f6x/RU+tIgBJ/Rl5nd/ScGU77Lm5Y08+ezfyB8xjG9PnUje+Zc36Wc988wzmTNnDqNHj2b06NHccMMNDB8+nOzsbHbu3NmkfUWl9dhFpNV1z/kMxxzbj5LlfwHg9Y2vcPLn4mfBAz+Xy+bXNkbe15bNm/hy7CvMf/p5KquqeWjh79nwWimnJoZjLho3mi+cchKdOnXkvnnFTLzyBna8/Q4763Yl3d8rm7bw+JPPEhszlde3lFNeUcnmN8oZdOJxAOQOOjFlT9nZ2dTWxicHjh8/npkzZ9KtWzdGjRrFwoULAdi8eTNHHXVU5J+zKRTsItImLpxyBSUv/BWAvsedwJqV8aW817xUQt/jTgDgoKws6na+B8SHNJJZNO9BnnlqCR06dGDA8X2p27WLE47tzYurXwbg57cWcd8jj1M0r5gxo77CvDt/TudPZSfdF8DxfT/Lt6dOYNmCe5l17RUcfWRPjj6yJ+s2bgJg1bpXUv5s55xzzp4AB3j11Vfp27cvBx10EB988AEA8+fPJz8/P+W+0qFgF5E20X/AQHJP+zIABRdMZtPG9Vx83jm8sXkTXx0bH3oZMmw4f3pqMd/4+lms/Pvfku5n4jcv43ePPcIlY/P5x6p1TBoziqkTvs7Kf24gNmYqK9duYNJ5o8g7/Yv84vY5nDHuUgDKK7Yn3d/UiQU8+czfOL3gEu6eu4CjjujBtAsLWLjkGWJjplLz73dT/mz5+fksWbKE7du3U1NTQ8+ePTnxxBMpLCxkxIgRbNy4kfLycgYOHJjOX11K1thvwZaUm5vrQd9oQ2PsGdXqs2KyGh/PbRGt/PmtX79+zxd7LS3ZdMeWNPATm1v1eI1Nd4T4mjzPPfccEyZ8/P+nu+++mwkTJnDwwQc3+v5kn5OZrUgsj75P+vJURKQF9OrVK2moA1x22WUtemwNxYiIBEbBLiISGAW7iEhgFOwiIgmlW99M+vzrW8pauZPmUbCLiAA333E/K/+5/iPPbausiq/G+IfneOTxJ9uos6bTrBiRA12mp+emmL6592qMPY/sxc/+9+4mr8Z4ydh8ih5bHK2dvVdjbETp1jfZ+mYF102/iNvnPMoDjy1m2YJ7eXrZciaPzefqKROYeOUNnJt3Ol27dG50P+3FARHsrT8PulUPJ7Jf2Xs1xpnfu4p1q19iwCmD27SnuQuWMP2icQCsWreRKRd8jRdXrfvIFaoXFpxD8VPPMmlMy1wtmkkaihGRVvWZnofzzFNL2LJ5EzN/eSsDThnMrro6rr3im3yj4GyuvGg8O3e+x3vv1nL5hWO4qGAkP7xmeqP727nzPb576Te4+LxzmP6DXwBQ/XYNI8ZfxvAx01i2fEXKnjZtKaN/vz5AYjXG//6XpX9ezsgzvrSn5rTBJ/PS2tTLCbQHCnYRaVWxvJFcOOVyrpk6iZt+dB3vv/8+Cx55gONPHMADi55ixMjRvLZhPZXbt3HBxVMpnFfMm2VvUFWZfAmAhQ/HV4acs/D3/Gv7W6x5eSOFDy8if8Qwnl1QSKeOTRuYOPP001j8x+fpdXgPRl/0HZ7964sAZGcd1OjCYe1NpGA3syIzW25mMxp5/XIzW5b4s8rM7slsmyISir1XY6yueosli+ZT+tpGBpzyeSC+RO+AUwbTsWMnFs2by/evmkbN29XU1dUl3V/ppld55qnFXDI2P/3VGLMOovbd+GJj4796FjOvuZRun+7KqK8MZeHv/wTA5q3lHHVEj0z8FbS4lMFuZgVAB3cfAvQxs34Na9z9LnePuXsMeB6Ido8pETng7L0a47HH92fXrjp6H3sca1fH7w96322/ZtG8B3n80bnkjRrNTbffR/anPtXo/nr37cfESy6n6LHF6a/GeMaXWbjkT3sev7r5Dfp+thcHffKTfPBBfD2t+U/8gfwRw5rzo7eaKGfsMWB+YnspMLSxQjM7Eujh7h9b4cvMpplZiZmVVFZWptOriARg79UY165aSf554znvgslsWLuGS8bms37tGvILxjNkWIyi23/D1PGjAdhekXyOecGEyfx12R+5+Lxz0l+NMe90lvzpL2x/awc1/66l52cO48Tj+lD48CJGDDuVjZu2UF6xnYGJfwW0dylXdzSzIuBWd19tZmcCg939pkZqfw78wd2f3dc+W3t1R60OuH/T55dZWt0xubI3t/Hc31cy4esjP/ba3Q8uYMLXz+bgrl0++sI+Vndsruas7hjljL0W+HDOT5fG3mNmnwCGA8si7FNEpF3pdUSPpKEOcNnkMR8P9XYsSrCvoH74ZRBQ2kjdMODv3hYLvIuIyB5Rgr0YmGRms4FxwDozm5Wk7izguUw2JyItQ+df7VtzP5+UEzzdvcbMYkAecIu7VwCrk9T9oFmdiEiryMrKoqqqiu7duzf5Un5pee5OVVUVWVnpX8Ieaea+u1dTPzNGRPZjvXr1oqysjNaYnbatemeLH2Nv662VZ9y9sz51TRqysrLo1atX2u8/INaKEZF6nTp14phjjmmVY43UjKY2oSUFREQCo2AXEQmMgl1EJDAKdhGRwCjYRUQCo2AXEQmMgl1EJDAKdhGRwCjYRUQCo2AXEQmMgl1EJDAKdhGRwCjYRUQCo2AXEQmMgl1EJDAKdhGRwEQKdjMrMrPlZjYjRd2dZnZuZloTEZF0pAx2MysAOrj7EKCPmfVrpG4Y0NPdn8hwjyIi0gRRzthj1N/vdCkwtGGBmXUC7gVKzeyrGetORESaLEqwdwbKE9s7gB5JaiYDLwO3AKea2bcaFpjZNDMrMbOS1riJrojIgSpKsNcC2YntLo2853NAobtXAA8BwxsWuHuhu+e6e25OTk66/YqISApRgn0F9cMvg4DSJDWvAX0S27nAlmZ3JiIiaekYoaYYeN7MjgBGAueb2Sx333uGTBHwf2Z2PtAJGJP5VkVEJIqUwe7uNWYWA/KAWxLDLasb1PwbGNsiHYqISJNEOWPH3aupnxkjIiLtmK48FREJjIJdRCQwCnYRkcAo2EVEAqNgFxEJjIJdRCQwCnYRkcAo2EVEAqNgFxEJjIJdRCQwCnYRkcAo2EVEAqNgFxEJjIJdRCQwCnYRkcAo2EVEAqNgFxEJjIJdRCQwkYLdzIrMbLmZzWjk9Y5m9oaZLUv8OTmzbYqISFQpg93MCoAO7j4E6GNm/ZKUDQTmuXss8eefmW5URESiiXLGHqP+RtZLgaFJak4D8s3sH4mz+4/dJNvMpplZiZmVVFZWpt2wiIjsW5Rg7wyUJ7Z3AD2S1LwIjHD3U4FOwDkNC9y90N1z3T03Jycn3X5FRCSFj51ZJ1ELZCe2u5D8l8Ead9+V2C4Bkg3XiIhIK4hyxr6C+uGXQUBpkpq5ZjbIzDoAXwNWZ6Y9ERFpqijBXgxMMrPZwDhgnZnNalDzU2AusApY7u5/zGybIiISVcqhGHevMbMYkAfc4u4VNDgjd/e1xGfGiIhIG4syxo67V1M/M0ZERNoxXXkqIhIYBbuISGAU7CIigVGwi4gERsEuIhIYBbuISGAU7CIigVGwi4gERsEuIhIYBbuISGAU7CIigVGwi4gERsEuIhIYBbuISGAU7CIigVGwi4gERsEuIhIYBbuISGAiBbuZFZnZcjObkaKuh5m9lJnWREQkHSmD3cwKgA7uPgToY2b99lH+KyA7U82JiEjTRTljj1F/I+ulwNBkRWZ2BvAuUNHI69PMrMTMSiorK9NoVUREoogS7J2B8sT2DqBHwwIz+yTwQ+D6xnbi7oXunuvuuTk5Oen0KiIiEUQJ9lrqh1e6NPKe64E73f3tTDUmIiLpiRLsK6gffhkElCapGQFMN7NlwClmdl9GuhMRkSbrGKGmGHjezI4ARgLnm9ksd98zQ8bdT/9w28yWufuUzLcqIiJRpAx2d68xsxiQB9zi7hXA6n3UxzLWnYiINFmUM3bcvZr6mTEiItKO6cpTEZHAKNhFRAKjYBcRCYyCXUQkMAp2EZHAKNhFRAKjYBcRCYyCXUQkMAp2EZHAKNhFRAKjYBcRCYyCXUQkMAp2EZHAKNhFRAKjYBcRCYyCXUQkMBkLdjM71MzyzOywTO1TRESaLlKwm1mRmS03sxmNvH4IsBg4FXjWzHIy2KOIiDRBymA3swKgg7sPAfqYWb8kZQOBa9z9Z8DTwODMtikiIlFFOWOPUX+/06XA0IYF7v5nd3/BzE4nfta+PGMdiohIk0QJ9s5AeWJ7B9AjWZGZGTAeqAZ2J3l9mpmVmFlJZWVlmu2KiEgqUYK9FshObHdp7D0eNx1YA4xO8nqhu+e6e25OjobgRURaSpRgX0H98MsgoLRhgZldZ2aTEw+7AW9npDsREWmyKMFeDEwys9nAOGCdmc1qUFOYqHkO6EB8LF5ERNpAx1QF7l5jZjEgD7jF3SuA1Q1qqhOvi4hIG0sZ7LAnuOenLBQRkTanJQVERAKjYBcRCYyCXUQkMAp2EZHAKNhFRAKjYBcRCYyCXUQkMAp2EZHAKNhFRAKjYBcRCYyCXUQkMAp2EZHAKNhFRAKjYBcRCYyCXUQkMAp2EZHAKNhFRAKjYBcRCUykYDezIjNbbmYzGnn902b2pJktNbPHzeyTmW1TRESiShnsZlYAdHD3IUAfM+uXpGwiMNvdzwQqgLMz26aIiEQV5WbWMepvZL0UGAq8uneBu9+518McYHvDnZjZNGAawNFHH51GqyIiEkWUoZjOQHliewfQo7FCMxsCHOLuLzR8zd0L3T3X3XNzcnLSalZERFKLcsZeC2QntrvQyC8DMzsUuA04LzOtiYhIOqKcsa8gPvwCMAgobViQ+LL0MeD77r4lY92JiEiTRQn2YmCSmc0GxgHrzGxWg5pLgMHADWa2zMzGZ7hPERGJKOVQjLvXmFkMyANucfcKYHWDmruAu1qkQxERaZIoY+y4ezX1M2NERKQd05WnIiKBUbCLiARGwS4iEhgFu4hIYBTsIiKBUbCLiARGwS4iEhgFu4hIYBTsIiKBUbCLiARGwS4iEhgFu4hIYBTsIiKBUbCLiARGwS4iEhgFu4hIYBTsIiKBUbCLiAQmUrCbWZGZLTezGfuo6WFmz2euNRERSUfKYDezAqCDuw8B+phZvyQ1hwAPAJ0z36KIiDRFlDP2GPU3sl4KDE1S8z4wHqhpbCdmNs3MSsyspLKysql9iohIRFGCvTNQntjeAfRoWODuNe7+zr524u6F7p7r7rk5OTlN71RERCKJEuy1QHZiu0vE94iISBuJEtIrqB9+GQSUtlg3IiLSbFGCvRiYZGazgXHAOjOb1bJtiYhIujqmKnD3GjOLAXnALe5eAaxupDaW0e5ERKTJUgY7gLtXUz8zRkRE2jF9ESoiEhgFu4hIYBTsIiKBUbCLiARGwS4iEhgFu4hIYBTsIiKBUbCLiARGwS4iEhgFu4hIYBTsIiKBUbCLiARGwS4iEhgFu4hIYBTsIiKBUbCLiARGwS4iEphIwW5mRWa23MxmNKdGRERaXspgN7MCoIO7DwH6mFm/dGpERKR1RDljj1F/v9OlwNA0a0REpBVEuZl1Z6A8sb0DGJxOjZlNA6YlHtaa2StNa3X/YXAY8FarHfAn1mqHOhDo89t/HQCf3WejFEUJ9logO7HdheRn+Slr3L0QKIzS1P7OzErcPbet+5D06PPbf+mzi4syFLOC+qGVQUBpmjUiItIKopyxFwPPm9kRwEjgfDOb5e4z9lFzWuZbFRGRKFKesbt7DfEvR18Ahrv76gahnqzmncy3ul85IIacAqbPb/+lzw4wd2/rHkREJIN05amISGCijLFLEmZ2EYC73594fD/wprv/wMxmJsp6N3zO3Wci7Ubic3nN3R9KPL6f+ASAOqAMmODuu9usQdmnxOd1CvBf4B7i0wFHAMcDq4Hvu/vyNmuwjeiMPbOmmllWhOekfftW4irqWuIhIe3blcBZwI+JXyh5PrDC3WMHYqiDgr3ZzOwkM3sW6AqsBSY2KEn2nLRzZmbEr8n4T1v3Iqm5exWwBDi9rXtpDxTszXM48DBwAfBv4A7g0gY1yZ6T9u024tdibAOeadtWpAmqgG5t3UR7oGBvniuJj8N+eJlvBbCB+NRP9vGctG/fAu4CNrmmje1PDiW+pMkBT8HePDcClyf++6HfAP/ToC7Zc9K+3QNcYmYd2roRSc3MuhG/OFL/wkLB3lx17r6V+Bn5aAB3fwn4895FyZ6TduWnZlZiZiVAPoC7VxMPifPatDOJ4jbgKeA6d9/Q1s20B7pASUQkMDpjFxEJjIJdRCQwCnYRkcAo2EVEAqNgFxEJjIJdRCQwCnYRkcD8P9xwrhKhMJvhAAAAAElFTkSuQmCC\n",
1690 | "text/plain": [
1691 | ""
1692 | ]
1693 | },
1694 | "metadata": {
1695 | "needs_background": "light"
1696 | },
1697 | "output_type": "display_data"
1698 | }
1699 | ],
1700 | "source": [
1701 | "# 为未归一化和归一化的数据绘制分组柱状图,可视化的图和下图相似即可\n",
1702 | "# 对models_auc_df数据进行可视化,要求设置图例到右下角,标题为未归一化和归一化数据的模型AUC值比较\n",
1703 | "\n",
1704 | "#解决图例中文显示问题,设置字体样式\n",
1705 | "plt.rcParams['font.sans-serif']=['SimHei'] \n",
1706 | "plt.rcParams['axes.unicode_minus']=False \n",
1707 | "\n",
1708 | "#创建画布\n",
1709 | "plt.figure(figsize=(20,12), dpi=120)\n",
1710 | "\n",
1711 | "models_auc_df.plot(kind='bar')\n",
1712 | "\n",
1713 | "plt.xticks(rotation=360)"
1714 | ]
1715 | },
1716 | {
1717 | "cell_type": "markdown",
1718 | "metadata": {},
1719 | "source": [
1720 | "依据对未归一化和归一化后的数据训练模型,通过对模型的AUC值进行比较,可以发现:归一化能有效提高KNN模型的AUC值,但是对逻辑回归和决策树的影响不大。"
1721 | ]
1722 | }
1723 | ],
1724 | "metadata": {
1725 | "kernelspec": {
1726 | "display_name": "Python 3",
1727 | "language": "python",
1728 | "name": "python3"
1729 | },
1730 | "language_info": {
1731 | "codemirror_mode": {
1732 | "name": "ipython",
1733 | "version": 3
1734 | },
1735 | "file_extension": ".py",
1736 | "mimetype": "text/x-python",
1737 | "name": "python",
1738 | "nbconvert_exporter": "python",
1739 | "pygments_lexer": "ipython3",
1740 | "version": "3.6.8"
1741 | }
1742 | },
1743 | "nbformat": 4,
1744 | "nbformat_minor": 2
1745 | }
1746 |
--------------------------------------------------------------------------------
/images/Random Forest.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teamowu/Machine-Learning/adc76bda05eb8a1265e3e733f8e4c4d89b43893e/images/Random Forest.png
--------------------------------------------------------------------------------
/images/jaccard系数.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teamowu/Machine-Learning/adc76bda05eb8a1265e3e733f8e4c4d89b43893e/images/jaccard系数.png
--------------------------------------------------------------------------------
/images/余弦相似性.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teamowu/Machine-Learning/adc76bda05eb8a1265e3e733f8e4c4d89b43893e/images/余弦相似性.png
--------------------------------------------------------------------------------
/images/曼哈顿距离.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teamowu/Machine-Learning/adc76bda05eb8a1265e3e733f8e4c4d89b43893e/images/曼哈顿距离.png
--------------------------------------------------------------------------------
/images/欧式距离.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teamowu/Machine-Learning/adc76bda05eb8a1265e3e733f8e4c4d89b43893e/images/欧式距离.png
--------------------------------------------------------------------------------
/关联分析/README.md:
--------------------------------------------------------------------------------
1 | # 关联分析
2 | 寻找最终能够**解释数据变量之间关系的规则**,来找出大量多源数据集中有用的关联规则,它是从大量数据中发现多种数据之间关系的一种方法。
3 | 另外,它也可以基于时间序列对多种数据间的关系进行挖掘。
4 |
5 | ## 常见的关联算法
6 | - Apriori
7 | - FP-Growth
8 | - PrefixSpan
9 | - SPADE
10 | - AprioriAll
11 | - AprioriSome
12 |
13 | ## 典型的销售应用场景
14 | - 购物篮分析
15 | - 优化商品布局,e.g.超市可以把关联度高的商品摆放在一起,便于顾客一起挑选。
16 | - 设计促销方案,e.g.两种关联度高的商品一起搭配购买可以享受价格优惠。
17 | - 快速商品推荐,通常在电商业务中使用。e.g.顾客浏览某一商品,页面上会推荐“经常一起购买的产品”或者90%的顾客也看了如下商品“等规则进行推荐。
18 |
19 | ## 关联分析中的关键指标
20 | - 支持度(support):
21 | - 置信度(confidence)
22 | - 提升度(Lift):当Lift>1, 应用关联规则比不应用关联规则能产生更好的结果;反之,规则具有负相关的作用,该规则为无效规则。
23 | - 做关联规则评估时,需要综合考虑支持度、置信度和提升度3个指标,支持度和置信度的值越大越好。
24 | 。
25 | **频繁规则 & 有效规则**:
26 | - 频繁规则:关联结果中支持度和置信度都比较高的规则
27 | - 有效规则:关联规则真正能促进规则中的前/后项的提升。
28 | - 频繁规则 != 有效规则
29 |
30 | ## 关联分析的更多应用场景
31 | **相同维度下的关联分析**:
32 | - 网站页面浏览关联分析
33 | - 广告流量关联分析
34 | - 用户关键字搜索关联分析
35 |
36 | **跨维度的关联分析**:
37 | - 不同场景下关联分析
38 | - 相同场景下的事件分析
39 |
--------------------------------------------------------------------------------
/分类算法/README.md:
--------------------------------------------------------------------------------
1 | # 2.2 如何选择分类分析算法
2 | - 文本分类: **朴素贝叶斯**,如电子邮件中的垃圾邮件识别。
3 | - 若训练集较小,选择高偏差且低方差的分类算法效果更好,如**朴素贝叶斯、支持向量机,因为这类算法不容易过拟合**。
4 | - 如果关注的是算法模型的计算时间和模型易用性,那么支持向量机、人工神经网络不是好的选择。
5 | - 如果重视算法的准确率,应选择精度较高的方法,如**支持向量机或GBDT、XGBOOST等基于Boosting的集成方法**。
6 | - 如果注重效果的稳定性或模型鲁棒性,那么应选择**随机森林、组合投票模型等基于Bagging的集成方法**。
7 | - 如果想得到有关预测结果的概率信息,然后**基于预测概率做进一步的应用,那么使用逻辑回归是比较好的选择**。
8 | - 如果**担心离群点或数据不可分并且需要清晰的决策规则,那么选择决策树**。
9 |
--------------------------------------------------------------------------------
/回归分析/README.md:
--------------------------------------------------------------------------------
1 | # 回归分析
2 | 如何选择回归分析算法?
3 | - 简单线性回归。适合数据集本身结构简单、分布规律有明显线性关系的场景。
4 | - 自变量数量少或降维后得到了可以使用的二维变量(包括预测变量)可以直接通过散点图发现自变量和因变量的相互关系,然后选择最佳回归方法。
5 | - 如果经过基本判断发现自变量间有较强的共线性关系,那么可以使用对多重共线性(自变量高度相关)能灵活处理的算法,例如岭回归。
6 | - 如果**数据集噪音较多,推荐使用主成分回归**,因为各主成分回归通过对参与回归的主成分的合理选择,可以去掉噪音。另外,对各个主成分间相互正交,能解决多元线性回归中的
7 | 共线性问题。这些都能有效地提高模型的抗干扰能力。
8 | - 如果高维度变量下,使用正则化回归方法效果最好,例如Lasso,Ridge和ElasticNet;或者使用逐步回归从中挑选出影响显著的自变量来建立回归模型。
9 | - 如果要同时验证多个算法,并想从中选择一个来做好的拟合,可以使用交叉检验做多个模型的效果对比,并通过R-square,Adjusted R-squre,AIC,BIC以及各种残差、误差
10 | 项指标做综合评估。
11 | - 如果注重模型的可解释性,那么容易理解的线性回归,指数回归,对数回归,二项或多项式回归要比核回归,支持向量机等更适合。
12 | - 集成或组合回归方法。一旦确认了几个方法,但又不确定该如何取舍,可以将多个回归模型做集成或组合方法使用,即同时将多个模型的结果通过加权、均值等方式确定最终输出结果值。
13 |
--------------------------------------------------------------------------------
/聚类分析/README.md:
--------------------------------------------------------------------------------
1 | # 如何选择聚类分析算法
2 | 聚类算法有几十种之多,聚类算法的选择主要参考一下因素:
3 | - 如果数据量是高维的,那么选择**谱聚类**,它是子空间划分的一种。
4 | - 如果数据量为中小规模,例如**100万条以内**,k均值将是比较好的选择;如果**超过100万条**,可以考虑使用**MiniBatchKmeans**;
5 | - 数据集中有**噪点**(离群点),那么使用基于密度的**DBSCAN**可以有效应对这个问题。
6 | - 如果追求更高的**分类准确度**,那么选择**谱聚类**将比K均值准确度更好。
7 |
--------------------------------------------------------------------------------
/聚类分析/客户特征的聚类与探索性分析.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# 项目背景\n",
8 | "\n",
9 | "某天,业务部门拿到了一些关于客户的数据找到数据部门,苦于没有分析入手点,希望数据部门通过对这些数据的分析,给业务部门一些启示,或者提供后续分析或业务思考的建议。\n",
10 | "\n",
11 | "基于上述场景和需求,本次分析的交付需求如下:\n",
12 | "- 这是一次EDA任务,且业务方没有任何先验经验提供给数据部门。\n",
13 | "- 分析结果用于业务的知识启发或后续分析的深入应用。\n",
14 | "- 除数据统计和基本展示类的探索性分析以外的数据挖掘。\n",
15 | "\n",
16 | "#### 数据源特征:\n",
17 | "- USER_ID:用户ID列,整数型。该列作为用户唯一ID标志,这意味着该列不能作为聚类的特征,而只能作为用户聚类后的所属类的标记。\n",
18 | "- AVG_ORDERS:平均用户订单数量,浮点型。\n",
19 | "- AVG_MONEY:平均订单价值,即每单的订单价格,浮点型。\n",
20 | "- IS_ACTIVE:是否活跃,通过其他模型得到的结果,字符串型。\n",
21 | "- SEX:性别,以0,1,2来表示性别未知、男和女3个值。\n",
22 | "\n",
23 | "#### 分析思路:\n",
24 | "- 字符串型特征不能直接作训练,因为sklearn的对象一般都是数值型的向量矩阵或稀疏矩阵,而不能是原生字符串。\n",
25 | "- SEX本质是分类型变量,不能直接参与距离计算。\n",
26 | "- AVG_ORDERS和AVG_MONEY具有明显的量纲差异,需要作无量纲化处理。\n",
27 | "- 分割ID列。"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 57,
33 | "metadata": {},
34 | "outputs": [],
35 | "source": [
36 | "#导入包\n",
37 | "import pandas as pd\n",
38 | "import numpy as np\n",
39 | "import matplotlib.pyplot as plt\n",
40 | "%matplotlib inline\n",
41 | "from sklearn.preprocessing import MinMaxScaler\n",
42 | "from sklearn.cluster import KMeans\n",
43 | "from sklearn.metrics import calinski_harabaz_score,silhouette_score"
44 | ]
45 | },
46 | {
47 | "cell_type": "code",
48 | "execution_count": 15,
49 | "metadata": {},
50 | "outputs": [],
51 | "source": [
52 | "raw_data = pd.read_csv('cluster.txt')\n",
53 | "#数值型特征\n",
54 | "numeric_feature = raw_data.iloc[:,1:3]"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 18,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "name": "stdout",
64 | "output_type": "stream",
65 | "text": [
66 | "[[0.64200477 0.62591687]\n",
67 | " [0.91169451 0.80440098]\n",
68 | " [0.69451074 0.39608802]\n",
69 | " ...\n",
70 | " [0.3221957 0.17359413]\n",
71 | " [0.42004773 0.31295844]\n",
72 | " [0.64916468 0.40831296]]\n"
73 | ]
74 | }
75 | ],
76 | "source": [
77 | "#标准化\n",
78 | "scaler = MinMaxScaler()\n",
79 | "scaled_numeric_feature = scaler.fit_transform(numeric_feature)\n",
80 | "print(scaled_numeric_feature[:,:2])"
81 | ]
82 | },
83 | {
84 | "cell_type": "code",
85 | "execution_count": 25,
86 | "metadata": {},
87 | "outputs": [
88 | {
89 | "data": {
90 | "text/plain": [
91 | "KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,\n",
92 | " n_clusters=3, n_init=10, n_jobs=None, precompute_distances='auto',\n",
93 | " random_state=0, tol=0.0001, verbose=0)"
94 | ]
95 | },
96 | "execution_count": 25,
97 | "metadata": {},
98 | "output_type": "execute_result"
99 | }
100 | ],
101 | "source": [
102 | "#训练模型\n",
103 | "n_cluster = 3\n",
104 | "model_kmeans = KMeans(n_clusters = n_cluster, random_state=0)\n",
105 | "model_kmeans.fit(scaled_numeric_feature)"
106 | ]
107 | },
108 | {
109 | "cell_type": "code",
110 | "execution_count": 27,
111 | "metadata": {},
112 | "outputs": [
113 | {
114 | "name": "stdout",
115 | "output_type": "stream",
116 | "text": [
117 | "sample: 1000 \t features: 4\n"
118 | ]
119 | }
120 | ],
121 | "source": [
122 | "#模型效果评估\n",
123 | "n_samples,n_features = raw_data.iloc[:,1:].shape #总样本数,总特征数\n",
124 | "print('sample: %d \\t features: %d' % (n_samples,n_features))"
125 | ]
126 | },
127 | {
128 | "cell_type": "code",
129 | "execution_count": 31,
130 | "metadata": {},
131 | "outputs": [
132 | {
133 | "name": "stdout",
134 | "output_type": "stream",
135 | "text": [
136 | "\n",
137 | " unspuervised_score: \n",
138 | " ------------------------------------------------------------\n",
139 | " silh c&h\n",
140 | "0 0.634086 2860.821834\n"
141 | ]
142 | }
143 | ],
144 | "source": [
145 | "#非监督式评估方法\n",
146 | "silhouette_s = silhouette_score(scaled_numeric_feature, model_kmeans.labels_, metric='euclidean')\n",
147 | "calinski_harabaz_s = calinski_harabaz_score(scaled_numeric_feature, model_kmeans.labels_) # Calinski和harabaz得分\n",
148 | "unspuervised_data = {'silh':[silhouette_s], 'c&h':[calinski_harabaz_s]}\n",
149 | "unspuervised_score = pd.DataFrame.from_dict(unspuervised_data)\n",
150 | "print(\"\\n\",'unspuervised_score:', '\\n', '-'*60)\n",
151 | "print(unspuervised_score)"
152 | ]
153 | },
154 | {
155 | "cell_type": "markdown",
156 | "metadata": {},
157 | "source": [
158 | "上述结果中,显示了聚类的效果还不错。以silh为例,当其值>0.5时,说明聚类质量较优。优秀与否的基本原则是不同类别间是否具有显著的区分效应。"
159 | ]
160 | },
161 | {
162 | "cell_type": "code",
163 | "execution_count": 35,
164 | "metadata": {},
165 | "outputs": [
166 | {
167 | "name": "stdout",
168 | "output_type": "stream",
169 | "text": [
170 | " USER_ID AVG_ORDERS AVG_MONEY IS_ACTIVE SEX labels\n",
171 | "0 1 3.58 40.43 活跃 1 2\n",
172 | "1 2 4.71 41.16 不活跃 1 2\n",
173 | "2 3 3.80 39.49 不活跃 2 1\n",
174 | "3 4 2.85 38.36 不活跃 1 0\n",
175 | "4 5 3.71 38.34 活跃 1 1\n"
176 | ]
177 | }
178 | ],
179 | "source": [
180 | "#合并数据和特征\n",
181 | "kmeans_labels = pd.DataFrame(model_kmeans.labels_, columns = ['labels'])\n",
182 | "#组合原始数据和标签\n",
183 | "kmeans_data = pd.concat([raw_data, kmeans_labels], axis=1)\n",
184 | "print(kmeans_data.head())"
185 | ]
186 | },
187 | {
188 | "cell_type": "code",
189 | "execution_count": 41,
190 | "metadata": {},
191 | "outputs": [
192 | {
193 | "name": "stdout",
194 | "output_type": "stream",
195 | "text": [
196 | " record_count record_rate\n",
197 | "labels \n",
198 | "0 332 0.332\n",
199 | "1 337 0.337\n",
200 | "2 331 0.331\n"
201 | ]
202 | }
203 | ],
204 | "source": [
205 | "#计算不同聚类类别的样本量和占比\n",
206 | "label_count = kmeans_data.groupby(['labels'])['SEX'].count()\n",
207 | "label_count_ratio = label_count / kmeans_data.shape[0]\n",
208 | "kmeans_record_count = pd.concat([label_count,label_count_ratio], axis=1)\n",
209 | "kmeans_record_count.columns = ['record_count', 'record_rate']\n",
210 | "print(kmeans_record_count.head())"
211 | ]
212 | },
213 | {
214 | "cell_type": "code",
215 | "execution_count": 44,
216 | "metadata": {},
217 | "outputs": [
218 | {
219 | "name": "stdout",
220 | "output_type": "stream",
221 | "text": [
222 | " AVG_ORDERS AVG_MONEY\n",
223 | "labels \n",
224 | "0 2.022349 38.980602\n",
225 | "1 3.987389 39.028754\n",
226 | "2 3.958610 40.996254\n"
227 | ]
228 | }
229 | ],
230 | "source": [
231 | "#计算不同聚类类别数值型特征\n",
232 | "kmeans_numeric_features = kmeans_data.groupby(['labels'])['AVG_ORDERS', 'AVG_MONEY'].mean()\n",
233 | "print(kmeans_numeric_features)"
234 | ]
235 | },
236 | {
237 | "cell_type": "code",
238 | "execution_count": 52,
239 | "metadata": {},
240 | "outputs": [],
241 | "source": [
242 | "#计算不同聚类类别分类型特征\n",
243 | "active_list = []\n",
244 | "sex_gb_list = []\n",
245 | "unique_labels = np.unique(model_kmeans.labels_)\n",
246 | "for each_label in unique_labels:\n",
247 | " each_data = kmeans_data[kmeans_data['labels']==each_label]\n",
248 | " active_list.append(each_data.groupby(['IS_ACTIVE'])['USER_ID'].count()/each_data.shape[0])\n",
249 | " sex_gb_list.append(each_data.groupby(['SEX'])['USER_ID'].count()/each_data.shape[0])\n",
250 | "\n",
251 | "kmeans_active_pd = pd.DataFrame(active_list)\n",
252 | "kmeans_sex_gb_pd = pd.DataFrame(sex_gb_list)\n",
253 | "kmeans_string_features = pd.concat((kmeans_active_pd,kmeans_sex_gb_pd), axis=1)\n",
254 | "kmeans_string_features.index = unique_labels"
255 | ]
256 | },
257 | {
258 | "cell_type": "code",
259 | "execution_count": 53,
260 | "metadata": {},
261 | "outputs": [],
262 | "source": [
263 | "#合并所有类别的分析结果"
264 | ]
265 | },
266 | {
267 | "cell_type": "code",
268 | "execution_count": 55,
269 | "metadata": {},
270 | "outputs": [
271 | {
272 | "name": "stdout",
273 | "output_type": "stream",
274 | "text": [
275 | " record_count record_rate AVG_ORDERS AVG_MONEY 不活跃 活跃 \\\n",
276 | "0 332 0.332 2.022349 38.980602 0.487952 0.512048 \n",
277 | "1 337 0.337 3.987389 39.028754 0.495549 0.504451 \n",
278 | "2 331 0.331 3.958610 40.996254 0.504532 0.495468 \n",
279 | "\n",
280 | " 0 1 2 \n",
281 | "0 0.003012 0.990964 0.006024 \n",
282 | "1 0.014837 0.014837 0.970326 \n",
283 | "2 0.984894 0.009063 0.006042 \n"
284 | ]
285 | }
286 | ],
287 | "source": [
288 | "features_all = pd.concat((kmeans_record_count,kmeans_numeric_features,kmeans_string_features), axis=1)\n",
289 | "print(features_all.head())"
290 | ]
291 | },
292 | {
293 | "cell_type": "code",
294 | "execution_count": 59,
295 | "metadata": {},
296 | "outputs": [
297 | {
298 | "data": {
299 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAs0AAAH5CAYAAAB6TAOnAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAgAElEQVR4nOzde3xU1bn/8c/DVe43FVpRUrGiuAuIoICCoSpesN6KoFaKimLBctoi3ioq+aktltZaherxUqucUlH0KOApGJUICBIQuWxRQEFRW1Ek3FECrN8fewdCmCSTZGb2JPN9v17zYmbNXms/e1iZeWbN2mubcw4RERERESldragDEBERERFJd0qaRURERETKoaRZRERERKQcSppFRERERMqhpFlEREREpBxKmkVEREREyqGkWUREREQOYmY3m9l6M1trZhea2TVmtsvMvix2O9rM7jCzv4d1WptZgZm1ijj8pFDSLJIgZjbZzP4Y3l9gZpcXe+73ZvbX8P5AM/vczNaY2ZlxtHuDmf07fIO6KSwrevPaEN7+aIGi8gIzW2RmvcPts81sd4k3u15mNtbMtpvZV2b2hZndXGy/B71hJvr1kkAy+o2Z/d3MtoV94ggzc2Y2NnyutP70tZnVL1Z/bBn95kkzu6XY/l4ys+sS/uLIQcL/l18Ue1zhv9Hi/a1Y2Tlmti5sq6hPjA7/v3eZ2Zbwft8yYonZd0tLtKryOkjymVlXYDDQAbgMeAqoC7zinGtT7PYZ8ChwQZgo3wBMcs59E1XsyVQn6gBEapC+wJfh/TeA3sAL4eMzgIfMrA0wATgdaAC8YmbHOef2xmrQzDoADwA9gD3AYjN7M3z6FefcFWZ2ODAHmFtUDvwM+DnwqpkdF5Yvcc71KNF+P2CCc+52M8sC8s3sdaA2B94wOwCzzKytc66wEq+LlC3h/SbUGDgG+EFRQTn96XCCD8d/lmgnVr/ZDMwwsz8B3wdOA66M+4ilykokNRX5Gy3e3zCzFsD/AD8BPgTeM7M3nHN/BP5owQjiO865x8ppN2bfBZoQvldV5PgkcicBXzvndgFLzewu4LBYGzrnNpvZJGAYMJSgj9VIGmkWSQAzOwlYDzQPv22/SfABQjh6dwqQR/DBNMc5t8Y5txzYCZxYRtMXArOcc6udc2uBmUD/4hs45zYCrxN8cBaV7XXOPQ28S5AIlcs59wnwTtjO/jdM59xSoNQ3TKm8JPYbgI8J/h9PCu9D2f1pK3BjPHE751YCq8P2fgE87pz7Lp66kjAV/huN0d8ALgYWOefynXNbgVwql/SU1nelepoD9DCzR83s+865x4FtZWz/Z+AOYEH4WVIjKWkWSYyzCD4g5hJ84LwN/NDMmgGnAh+GP1edCKwrVu8+YEsZ7R4LfFrs8Xogq/gG4YdfNrAqRv2lwAnxHICZHQN0C9s55A3TOVfWG6ZUTrL6DYDPgaTZD8vK6k+vA8ebWVz9BXgQGAVcDfw1zjqSOJX5Gy3Z3wA8Dn7v+B0wrRLxlNZ3pRpyzn0K9ALaA6vNbEj41MXFptnMLVblC2AjkJ/iUFNKSbNIYvwYeCu8nRWOur1D8KbTm+CnS4BmBKOEADjn/hnOCSvNYUDxEbzdBD/PQ/jmBXxFMGIY64NuO8HP9ABdi73ZfVJsm1+a2VfAGmC8c25ZGW+YkljJ6jcAHxAkzCdyICkqqz8VEsxbHFainZj9xjn3GtAKyHXOfV3+oUoiVfJv9KD+FpY1J3ifKGp3vXPui0rEU1rfhdITLUljzrkVzrl+wFUE85brc/Cc5t7FNr8IKABGmFmNzS1r7IGJpIqZ1Qb6AE8D4zjwYVT0c2XxD5DdQP3w5KwvzWyzmZ1dRvM7Ofgn1/ocSJ5eAb5H8DN5rnPOxajfiAMfiEuKvdllFdtmAsFo43bg1aLCkm+Y4XxYSZAk9xuAjwgSqobArrCsrP4E8DjByHHxbUrrNxBM/3m3nDgkSSryN1pGfysk6AdF211mZudXMqRYfRdKT7QkTZnZfUVfxJxz04DZBPPTS3NbeFsD/DT5EUZDSbNI1Z0CrHPOtXbOHQE0Ds8Of4NgZKc7B07SWwf8wDn3tXOuDUHCUdYJuWs5eDpGO4r9TB8myhOAm0qp3wl4v7wDcM7tBP4GjIBS3zB/VF47UiHJ7DcAe4GmBD+ZFimvP31GMFpY2aRJUqQSf6Ol9bePCKbtFLmAYiePVlCsvivV03rgWjNrYGZHEvStmFPCLFilqalz7nXgLwTJc42kpFmk6n4MLCz2eGFYtphgPvEHzrkd4XMvA+ea2YnhG1GXctqeAfQzsw5mdizQj2KjwaFngN7hnGQAzKyWmV0dtv9SnMcxARhsZo2I/Ya5LM52JD7J7DdFVhFM0ygST396lCDZlvRW0b/R0vrbS8DZZvYjM2sNnEflT+CL1Xelevobwa+YHxP0lRyCXyWKT7X50sx+SpAkPwL7p201jOOXsGpJS86JVN1ZBEs2FVlIMD/1GTN7i+BkPACcc6vNbAQwC/iGA6saxOScW2PBerh5BF9y73TOrTKznsW22W5mzxKsYrCa4Gz4bwhO/jrXOfeNmUE4N7VY8+NK7OtTM5tDsFzd3whOCvyYYA5sjnNuTbwviMQlaf2mmA+Az4C2YTvl9qdwH8VPOjyk3zjnHopz/5I8Ff0bLau/DQamEkzluT9cHaXCnHN7S/bd0MUl+tBNzrkXK7MPSQ3n3B4OPb8B4O8xyg76v3TOdUxGTOnAYk+DFBERERGRIhppFkkDJUZhinzlnOuU8mCk2lC/kZLUJ0SSRyPNIiIiIiLl0ImAIiIiIiLlUNIsIiIiIlIOzWmWGu3www93WVlZUYchFfTuu+9uDNeSTRn1lepJfUXiEUU/AfWV6qisvqKkWWq0rKwsFi9eHHUYUkFm9mmq96m+Uj2pr0g8ougnoL5SHZXVVzQ9Q0RERESkHEqaSzCzsWa23cy+MrMvzOxmM7vGzHaVuArO0eH2V4fbfWxmlxZr5+6wjc/M7LLS2i5RvtnMZpuZV9EYSzw/2cz+WOzx6DDmXWa2Jbzft6zjEhEREZEDlDTHNsE5dyRwOsHlIZsBrzjn2hS7fWZmHYA/An3C2wQza2lm5wDXEVzW9DLgaTNrGattM+tcVA4cAbwJ/J+Z1a9IjMXaAehLcPUnAJxzf3TOtQGmALeF8c8Onz7kuCr8aomIiIjUcEqay+Cc+wR4Bygtgb0SmOqc+9g59wWwAjgNuAj4p3Nug3NuEfA+8ONS2u5QrKzQOXcvsIcg8a1IjB0AzOwkYD3Q3MxaxXWgIiIiIlImJc1lMLNjgG7Ad6Vs4gGrij0eCSwCjgWKTyRfD2SV0nbx+kWWASdUMMaids4C8oC5xJl4i4iIiEjZlDTH9ksz+wpYA4wHtgAXF5v3OzfcrjmwvaiSc26Nc24jcBgHJ9q7gQax2nbOLYux/+1A44rEWKydHwNvhbezSqtcTKzjEhEREZFilDTHNoFgZHg78GpYVnzub++wrJBiUzfM7AYzOxXYSZA4F6kflpXWdkmNKJaMxxujmdUmmFv9NDCO+JLmWMclIiIiIsUoaS6Fc24n8DdgRBmbfUQwFaPIIOBwYC0HT8doB6yrQNs/IpgHXdEYTwHWOedahwtzN9ZqGCIiIiJVp6S5bBOAwQQjv7FMAa4ys6PClTROJjgpbxpwhZm1NrPTgBOB2SXqTgAGm9n+ts2sjpndDhjBvOS4Ywzb+TGwsNhzCylxAqKIiIiIVJyS5jI45z4F5hBMwyg+9/dLM/upc24ecDfBSXf/AoY65zY5594AngR84EXgGudcQSlt/yws+iXwNXAmcK5zrrCCMf6MYDpGyaS5vCkahxxXPPsVERFJhQ0bNlBYGNdHoqShzz6rOSvZKmkuwTk31jl3e7HHlzrnHnfONSixnvGL4fN/c84dG95eLlbvXufcEc65ts65/y2n7bHOucbOuRbOufOdcx9XMsZznHPPFCsf55z7ebHH1zjnHiv2+O+lHZeIiEhF7Nmzh2OOOYbs7Gyys7NZsWIFO3fupEuXLmXW27JlC+effz79+vXj0ksvZffu3UyYMIHu3buzY8cOZs2aRd26dVN0FJlr3bp19O/fn969e3PzzTcf8rgsGzZsoHfvA6dF3XHHHZx77rk455g9u+QP7dWXkuY0VmIEuOi2POq4RERESlq+fDlXXnkleXl55OXl0bFjRwYOHMjmzZvLrPePf/yDUaNG8dprr9GmTRtmzpzJ0qVLuf7661m0aBGNGpU2Q1IS6bbbbuOuu+5i7ty5fP7555x22mkHPc7Ly4tZr6CggCFDhrBjx479ZV9//TVdu3blvffe45hjjknRESSfkuY0VmIEuOjWKeq4RERESnrnnXeYMWMGp556KkOHDmXPnj08/vjjZGVllVlvxIgRnHPOOUCQbB155JE45ygsLOS1117j/PPPT0H0snr1arp27QrAkUceSaNGjQ56vGXLlpj1ateuzZQpU2jatOn+Mucce/bsYc6cOZx55pnJDz5FlDSLiIhIlXXv3p3XX3+d/Px8CgsLmTVrFt///vfjrr9gwQIKCgro0aMH/fr1Y8aMGbRt25aLLrqoRv3En64GDBhATk4O06dPZ+bMmQwdOvSgx2edFfsUqaZNm9KsWbODyjzP49NPP6VWrVr06dOHDz74IBWHkHR1og5AREREqr9OnTpRv35w6YJu3bqxZs2auOtu2rSJkSNH8uKLwWk1gwYNol27dqxdu5b+/fvz4osv0revLnKbTGPGjGHevHmMHz+eIUOGHPK4cePyrrl2wG9+8xuOO+44NmzYwGWXXcarr77KiSeemMToU0MjzSIiNcymTZvIzc1l48aNUYciKZIOK0wMHjyYZcuWsXfvXl5++WU6d+4cV73du3dz+eWX8/vf/5527drtL1+zZg3t27enfv367Nu3L1lhSzFdunRh/fr1jBo1Kubjiti8eTNNmjSpUf9/SppFRGqQgoICLrzwQvLz8+nbty+LFi2K+wx4SZ0NGzZw8sknU1BQwAUXXEC3bt248cYbS90+1nbptsLE3XffzeDBg+nSpQs9e/bk7LPPPmSblStXMmbMmIPKnnrqKZYsWcL9999PdnY2U6ZMYevWrbRp04aOHTvy+OOPx2xLEm/8+PGMGjWKhg0bxnwc6/8vltWrV9O5c2dOPfVUHnnkkRozr9mcc1HHIJI03bp1c4sXL446DKkgM3vXOdctlfusKX3lrbfeon79+vTo0YPRo0fz1FNP8a9//YsePXowaNAghg8fTnZ2dtRhJkx17SuDBw9m0aJFjBgxglatWvGzn/2Mq666ilGjRtGt26GH8/DDDx+y3WOPPUb37t3p0KED33zzDT/9qZbZL00U/QRqzvtKJimrr2hOs4hIDVI0ojNnzhzy8/Np0aJFXGfAS+q8+eabNGrUiDZt2tCqVSt832fz5s189tlnHH300THrxNqu+AoT8Yz+iUjVaHqGiEgN45xjypQptGjRgmuvvTauM+AlNXbv3s29997LuHHjADjjjDP49NNPefjhhznxxBNp2bJlzHqxttMKEyKppaRZRKSGMTMmTpxIp06dOP744zn//PN58sknK3wGvCTeuHHjGDFiBM2bNwcgJyeHxx57jLvvvpsTTjiBp59+Oma9WNsNGjSIsWPH0rx58/0rTIhI8ihpFhGpQR544AGeffZZIDh7vXnz5lU6A14S6/XXX2fixIlkZ2ezdOlS1q1bx4oVK9i7dy8LFy7EzGLWKygoiLmdVpgQSR3NaRYRqUGGDRvGwIEDefLJJ/E8j379+jF27NiDzoCX6MyZM2f//ezsbB544AGuvfZaPv30U3r27MmVV17JypUrmTx5Mvfdd9/+be+4445Dtiu+wsSNN97I3XffnZSYs25/NSntJtMn4/pHHUK1ker/3+r8f6OkWUSkBmnRogW5ubkHleXk5EQUjZQlLy8PgPfff/+g8o4dOx6UMAOceuqph2wH7L/89NKlS5MTpIjsp+kZIiIiIiLlUNIs+5lZXiXrdTGzLgmM4ykzW2BmWkNJRERE0oKmZ6SQ5dgPgU5AVrFbO6At0IDg/8OA3cC3wGZgLfBxePsovK1097g9KQ2+bEUJc5V/HzSzy4DazrmeZvY3M/uhc25NVdsVERERqQolzUlkOXYUcFZ4+zFBchyPBuGtBfCDsH5xWy3H3gReA2a5e9zaCsVldhjw9zCezcBA59zOEtuMBfKcc3lmdk1YPAV4AWgKfANcDtwLXBrWGeycO8vMGgLPAkcCK5xzN4XP5wGLgE7OuXNLCS8beD68/xpwBqCkWURERCKlpDnBLMeOAK4FrgFOTNJumgKXhDcsx9YCrwCPuXvc6jjqDwOWOeeuMLNrAQ/Ij6NeR2Cfc66PmV0ENHbO3WFmqwCcc38v1r7vnBtrZi+ZWSfn3HKgB/Cwc+6WMvbRCPgivL8J6BpHXCIZQWe5p6/qtsKE/m9FKk5Jc4JYjmUDNwKXAfVSvPtjgd8Av7YcywUmAjPcPa60RTtPAIpWwf97HO03AHYBSwDfzF4jGP2dWcr2HYBeZpYNNAeOApYTJNIvlbOv7eH+ABqjefciIiKSBpSQVIHlWC3Lsastxz4AZgNXkPqE+aCQgH4Eo84fW46NthxrEGO7D4Hu4f3fAtfH2GY3cER4/7zw387A2865fgRTR3qH5buAhgAWrLi/CnjIOZcNjAHWh9ttj+MY3iWYklG0v0/iqCMiIiKSVEqaK+naf17bF3gPmEQwcptusoDxwCrLsZ9bzkGXmXoC6BrOMe5KcAwlTQNGmtljBPOXIUhg/8vM5gNtgMVheS5wmZm9TZBIPwGcb2ZzgF8An1Ug7peBwWb2IDAQqF6/eYqIiEiNpOkZFeT7fhbw0Chv1Fmvrn9159fffh11SOU5GngGGG45Ntzd45Y653YRJKQHCUeGi+77QJ8Y7R1yAp9zbhNwdoniMtsvjXNuazit4xzgD865LeXVEREREUk2jTRXgO/7w4CVwMVm1nhCrwmroo6pAnoAiy3H/mA5VjfqYCBYTaPE7RUA51yBc+5559yXUccoIiIiAhppjovv+42A/wZ+Vrz8xOYnnt6pZadVyzct7xBNZBVWG7gF6GU5NtDd4/4dZTDxjDyLiIiIpAONNJfD9/2TCNYW/lnJ58ys1iO9Htl5aK20dzrwnuVY36gDEREREakOlDSXwff9qwjWLy51veWW9Vue/NOsny5MXVQJcySQazl2e9SBiIiIiKQ7Jc2l8H3/euB/CJdSK8tvu/y2dR2rU5j8qBKuNvB7y7FHog5EREREJJ0paY7B9/0bgccJ1j0uV73a9bJu63zb/ORGlVS/tBz7S9RBiIiIiKQrJc0l+L4/AniUOBPmIoOOHdSleb3mm5ITVUr8l+XYn6MOQkRERCQdKWkuxvf94QSXoK5QwgxgZs0e6vnQisRHlVK/thx7MOogRERERNKNkuaQ7/vZQJXm9nZt1fX0Ds06fJyYiCLzG8uxX0YdhIiIiEg6UdIM+L7/feA5ghPjKs3M6kw8fWLypmjsBD4GdpRTVpbtwN5yt/qT5dipFQ1PREREpKbK+KTZ9/06wPNA60S017pB6+7ntT3v3US0dZBdwGTgC4KLYu8opSyWhQSnNe4GPiKerwb1gOctx1pWNWwRERGRmiDjk2ZgPMHFPhLm3m73NqtFrfLHcytiA3Au0AdoD/ynlLJYvgS6EiTX9eLeYztgkuVYhed3i4iIiNQ0GZ00+75/AfDrRLd7WO3DjvuV96u3E9poFnA08AlB8tu2lLLS7CWYxnFchfZ6AXBzxQIVERERqXkyNmn2fb8BMCFZ7V9z/DUdm9RtsiWhjTrgfaABB6ZYxCorqT2wGmgK/BNYV6G95liOHVOJaEVERERqjIxNmoHfAj9IVuO1rNbhfzj1D+8ltFED+hPMvl5VRllJHpANHAYcD6ys0F4bAg9VOFYRERGRGiQjk2bf948BRid7P6e3Pr1Xu8bt1ieksXnA0vD+twQJcKyy0mwCWhKMRrsK7/1Sy7HsCtcSERERqSEyMmkG7qPsFDMhzKzeo6c/+u+ENHYKsBz4G7CPYMpFrLJYvgUaA0cA7wLHViqCP+ikQBEREclUdaIOINV83z8BuDpV+zu68dE9erfpvWzul3M7V6mhBsDP4yiL5TAOJNTDKx1Bd+BS4KVKtyAiIiJSTWXiSPMIKnGZ7KoYf+r4ulRmUkT60ZUCRUREJCNlVNLs+34j4hubTahGdRt1vKHDDYldgi4afS3HTog6CBEREZFUy6ikGfgZ0CyKHd/U8ab2h9U+bGcU+06wyk/wEBEREammMi1pHhHVjmvXqv29+7vdnx/V/hNoiOVYw6iDEBEREUmljEmafd8/DajayXhVdM5R55zapkGb0i52XV00A66IOggRERGRVMqYpBm4MOoAzKzhX0//69qo40iAn0QdgIiIiEgqZVLSfHbUAQAc1/S4Xt0O71axa/Kln2zLsdIu2i0iIiJS42RE0uz7fjOCdYYjZ2b2UM+H9kQdRxU1B7pFHYSIiIhIqmRE0gz0JbiAdFpoVq9Zp6vaX7Ug6jiq6JyoAxARERFJlUxJmtMuwRvdaXTburXqfhd1HFWQFtNdRERERFIhU5Lm06IOoKS6teoefdfJd1Xn0ebTLMdSemVFERERkahkStLcPuoAYrmk3SWntKrf6uuo46ikw4DvRR2EiIiISCrU+KTZ9/3mBCeupR0za/JIr0c+jDqOKjg26gBEREREUqHGJ83AD6IOoCxeC6/XSS1OWhN1HJWU1q+tiIiISKJkQtIc92joli1bmD9/PgUFBZXe2caNGyksLIx7ezOrPbHXxG2V3mG0NNIsIiIiGSETkua4RkO3bNnCTTfdhO/7XHfddWzatGn/cxs3buTyyy8vte7kyZO54oor2LlzJ/Pnz6du3boVCrDVYa26Xtzu4vwKVUoPGmkWERGRjJAJSXNc85lXr17NLbfcwrBhwzj99NP54IMP9j/3pz/9iW+//bbUuqtWreKyyy7j/fffp0GDBpUK8u6T7z6ijtWJf4g6PTSNOgARERGRVMiEpDmuYd/u3bvTuXNnFi9ezIoVK+jcuTMACxcupEGDBhx++OGl1nXOsWfPHubPn88ZZ5xRqSDr1a73g9GdRs+vVOXo1Is6ABEREZFUqBN1ACkQ9zE655g5cyZNmzalTp06FBYW8t///d889NBD/OpXvyq1Xq9evXj55ZfJzs5m5MiRDBs2jFNPPbXCgV7Z/krvPzv/87bDVYv1j7/b+92/o46hJtqyZQtXXHEFe/fupVGjRkyZMoV69Q79fjJ06FBWrlxJ//79GTNmTNz1REREpOIyIWneF++GZsaYMWN45JFHyMvLY926dQwaNIimTcuehXDeeefxve99j88//5w+ffqQm5tbqaS5ltVaObrT6N4VrhidLVEHUBP94x//YNSoUZxzzjkMHz6cmTNnctFFFx20zUsvvcTevXtZsGAB1113HWvWrCE3N7fceiIiIlI5mZA0745no6eeeoojjjiCiy66iG3bttGkSRPeeecd8vPzee6551i1ahX33HMPOTk5MeuvX7+edu3asW3bNpxzlYlzB3B8ZSpGKK7XVipmxIgR++9//fXXHHnkkYdsk5eXx8CBAwHo168f8+bNi6ueiIiIVE4mzGku/Qy+YgYMGMD06dMZMmQI+/bto1evXjzzzDM8/fTTPP3003To0KHUhHn79u20atWK9u3bM3XqVHr06FGZOBcBrStTMUK7og6gJluwYAEFBQUx+9OOHTs46qijAGjZsiUbNmyIq56IiIhUTiaMNH8ez0bNmjXjiSeeKPX5p59+utTnGjduTK9evQCYOnVqBcMD4Eug4vM5orc+6gBqqk2bNjFy5EhefPHFmM83btyYXbuC7yzbt29n3759cdUTERGRysmEkeaPow4gDh8BDaMOohLWRh1ATbR7924uv/xyfv/739OuXbuY25xyyinMmzcPgGXLlpGVlRVXPREREakcJc3RWwX0ijqISloXdQA10VNPPcWSJUu4//77yc7OJicnhzFjxhy0zSWXXMKkSZMYNWoUzz//PP379z+k3pQpUyI6AhERkZonE6Zn/JtgXvNhUQdSih1U3y8vSpqTYPjw4QwfPrzMbZo2bUpeXh65ubnceuutNGvWLK56IiIiUjnVNVmLm+d5jvSdRrAI6Bp1EJW0D/g06iAyWYsWLRg4cCBt2rSJOhQREZEar8YnzaGVUQcQwx6gZdRBVMGHnudpyTkRERHJCJmSNL8ZdQAxLADaRx1EFbwedQAiIiIiqZIpSfOsqAMoYRtwQtRBVJGSZhEREckYmXAiIJ7nrfV9/2PSZ2T3XSA76iCqYA+QF3UQNUXW7a+mdH+fjOuf0v2JiIjUBJky0gzpM9r8BVDdL9WW73netqiDEBEREUkVJc2p9wnpu/xdvGZGHYCIiIhIKmVS0vwa8E3EMayk+l7IpMhe4O9RByEiIiKSShmTNHue9y3wZMRhFAIWcQxV9arneZ9FHYSIiIhIKmVM0hx6lGCkNAoLgc4R7TuRHo06ABEREZFUy6ik2fO8T4EZEey6EGhd1Ua2bNnC/PnzKSgoKLOsLBs3bqSwsLCyIawlfeaGi4iIiKRMRiXNoQkR7HM+kFWVBrZs2cJNN92E7/tcd911bNq0KWZZLJMnT+aKK65g586dzJ8/n7p161Y2jMfCy5KLiIiIZJSMWKe5OM/zXvd9fwnQNUW73AL8qKqNrF69mltuuYXOnTuzdetWPvjgA+rVq3dI2emnn35I3VWrVnHZZZfx/vvv06BBg8qG8BWamiEiIiIZKhNHmgFuTuG+lgItq9pI9+7d6dy5M4sXL2bFihV07tw5Zlkszjn27NnD/PnzOeOMMyobwv/zPG97pQ9AREREpBrLyKTZ87w84JUU7OozEnghE+ccM2fOpGnTptSpU6fUspJ69erFnDlzaN26NSNHjiQ/P7+iu14FPF6l4EVERESqsYxMmkO/AXYleR+fA/UT1ZiZMWbMGI4//njy8vJKLSvpvPPOY/jw4TRp0oQ+ffqQm2BL5toAACAASURBVJtb0V3f5Hlepc8eFBEREanuMjZp9jxvHfD/kriLFUDPRDX21FNPMW3aNAC2bdtGkyZNYpaVZv369Rx99NHUq1cP5yp0Lt8Uz/PeqELoIiIiItVexibNoT8BFZ6rEIUBAwYwffp0hgwZwr59++jVq1fMsli2b99Oq1ataN++PVOnTqVHj7hnjHwKDE/UMYiIiIhUVxm3ekZxnucV+r4/AHgXOCKBTS8ggaPMAM2aNeOJJ54otyyWxo0b70+op06dGu8udwOXe54X3wLQIiIiIjVYpo80E14S+goSd6XA3UDbBLUVpZs9z1sUdRAiIiIi6SDjk2YAz/PeBH6boOYWAEcnqK2oPO95XhQXgRERERFJS0qaQ57n/QGIe+5CKTYBXRIQTpQWA9dHHYSIiIhIOlHSfLCrgRlVqL8CaJagWKLwLnCO53nbog5EREREJJ0oaS7G87zvgMuA/61E9U+A2MtXVA/vESTMm6MORERERCTdKGkuIbyIx0DghQpW3QDUTXxEKbEMOFsrZYiIiIjEpqQ5Bs/z9gBXAv+Is8oy4LTkRZRUbwFneZ63KepARERERNKVkuZSeJ631/O8q4GbgbIuIe2oviPM4wkS5m+iDkREREQknSlpLofneQ8CvQnmLMcyH+iYsoASYxswwPO8Wz3PS9T61CIiIiI1lpLmOHietxA4GXi5xFPfAlkpD6hqfKC753kvRh2IiIiISHWhpDlOnudt9jzvUmAYUDSd4R3gqOiiqpBdwJ1AV8/zVkUdjIiIiEh1oqS5gjzPewL4IfBnoFPE4cTrReAkz/N+F64OIiIiIiIVoKS5EjzPK/A8bxTQA5hCcDJgOpoDnOF53gDP89ZFHYyIiIhIdaWkuQo8z1vjed4VgAf8BUiHdY53AU8CXTzPO9PzvLejDkhERESkuqsTdQA1ged5K4Ff+75/OzCAYN5z7xSH8THwGPCULlIiIiIiklhKmhPI87xvgf8B/sf3/R8C5wJ9gWygZYJ3t4vgwiSzgJme532Y4PZFREREJKSkOUk8z1sDrAEm+L5vBCcN9gVOAdoBxwBtgdpxNPcl8FF4WwMsAuaGSbqIiIiIJJmS5hTwPM8RXGp7WfFy3/drEyxZdwzQgCCBrgV8R7AG9DZgred521MasIiIiIgcRElzhMKr8a0PbyIiIiKSprR6hoiIiIhIOZQ0i4iIiIiUQ0mziIiIiEg5lDSLiIiIiJRDSbOIiIiISDmUNIuIiIiIlENJs4iIiIhIOZQ0i4iIiIiUQ0mziIiIiEg5lDSLiIiIiJRDSbOIiIiISDmUNIuIiIiIlENJs4iIiIhIOZQ0i4iIiFQDmzZtIjc3l40bNx7y3IYNGygsLIwgqvSXqNdGSbOIiIhIxIYOHUrPnj257777Yj5fUFDAhRdeSH5+Pn379uXrr79mwoQJdO/enR07djBr1izq1q2b4qhTo7zXpsiIESOYPn06QFJeGyXNIiIiIhF66aWX2Lt3LwsWLGDt2rWsWbPmkG2WL1/Ogw8+yJ133sm5557LkiVLWLp0Kddffz2LFi2iUaNGEUSefPG8NgBz587lyy+/5Cc/+QlAUl4bJc0iIiIiEcrLy2PgwIEA9OvXj3nz5h2yzZlnnkmPHj2YM2cO+fn59OzZE+cchYWFvPbaa5x//vmpDjsl4nltCgsLueGGG8jKyuKVV14BSMpro6RZREREJEI7duzgqKOOAqBly5Zs2LAh5nbOOaZMmUKLFi2oW7cu/fr1Y8aMGbRt25aLLrqI2bNnpzLslIjntXn22Wfp2LEjt956K/n5+TzyyCNJeW2UNIuIiIhEqHHjxuzatQuA7du3s2/fvpjbmRkTJ06kU6dOTJs2jUGDBjF27FiaN29O//79efHFF1MZdkrE89q89957DBs2jDZt2nD11Vcze/bspLw2SppFREREInTKKafsn3awbNkysrKyDtnmgQce4NlnnwVg8+bNNG/eHIA1a9bQvn176tevX2qyXZ3F89ocd9xxrF27FoDFixfTrl07IPGvjZJmERERkQhdcsklTJo0iVGjRvH8889z0kknMWbMmIO2GTZsGJMmTaJPnz7s3buXfv36sXXrVtq0aUPHjh15/PHHOfvssyM6guSJ57UZOnQos2fPpk+fPvz1r39l9OjRSXlt6lS5BRERERGptKZNm5KXl0dubi633norbdq0oXPnzgdt06JFC3Jzcw+pd8455wDBahE1UTyvTZMmTXjhhRcOqZvo10ZJs4iIiEjEWrRosX+VCDlYurw2mp4hIiIiIlIOJc0iIiIiIuXQ9AwRERERiVzW7a+mdH+fjOtfoe3NOZekUESiZ2ZfA59GHYdUWDvn3BGp3KH6SrWlviLxSHk/AfWVaqrUvqKkWURERESkHJrTLCIikmQWKHVKpJnVMrOEfCabWf1EtBO1mnIcUnNoTrOIiEjyHQW8YGbfhY/rAJ2AJeHj2sA4MzsamOucex/AzFoDxwEnA985554IyxcBW4BjgdFAE+BHQEfgVWBiKg6qLGb2C8o5lupwHCJFlDSLJImZtSSYy3YE8BZwtXNujZldBFwKXAf8BegKfAVc4ZzbHaOdxsCksJ2PgaHAk0Bn4Fvgc+AqIBc4jODD917n3DQzywPqA98BOOeyzWwXkA80BR5yzj1jZpcDvwbqAb92zr2d+FdE4pHAfvN3oKFzbqCZPUfQV35J7L70b+fcb81sbLEmBgEbwvu/Bn4HjHbOrTSzB4F3nHPPJ/TgazDn3OdAz6LHZjYaeNU59/vi25nZ8cCk8DW+CtgMtAPuAj4qtum68P92LLAdWE/QV85J6oFUzJuUfyzV4TiSwswaAf8DtCQ47rXAQA7+u7uc4DV60szmA4Odcx/HaKsu8FLY1lPOub9FGU/Y3onAOOfcxVHGYmbHAM8C+wj63Y2uknOTNT1DJHnOIUhi+wAzw8cAZwGzwn+znHNnAD7BG0AsI4E14Xb1Cd44AEY653oSfNAUXR90QPj8M2Z2eFh2uXMu2zmXHT7+wjl3JtCb4MML4D4gG/gZwZuRRCdR/QaCL1bF/y2tL91gZoeVqHt/Ub9xzi0FxgGjzKwpQd+ZWtkDzHRm1g64B8gN7xeV1wK2AmcALwKXhNt94pybC3xZbApHlpm9Dvwc+No5t5jwy3E6qMCxpPVxJNlgYEH4fvwd0I1D/+4eBK43sz4Ef7sxE1SCv+13nXOnAwPMrEmU8ZhZe2A80KwScSQ0FuBGYLhz7sfA0QS/ZFSKkmaR5DmP4KfF8wiSnaLEti/BqHA2kBeWPQLMLqWd04A54f15QPeiJ8zMgMbA/pFG59w6gpHkHuXE1xDYG97fTjDquM45V1YSJsmXqH4DsNvMWgGF4ePS+pJP8IWpVM65OUAWcDcw0Tm3L56DkYOFXzr+ASwi+AyeZGZFI3HHEnxRuiTc5nWC0bZ+YWL5OjAw/Lu/EvgQ+DFwlpn9FTjZzB4zs8fDpCVK8RzLINL/OJLpC+BSM/uhc+56gj5xEOfcNwR/438D7i+jrWyg6JefOQRJZpTxbAN+WokYEh6Lc+5O59wH4cNWwMbKBqWkWSR5ehKM4J4FLAS6mFlbYGf4x34EsNXMBgPTgctKaacJsCO8v5NgWgUECdMnBD9XvVmizjdA8/D+C2aWZ2aPhI+PMrO5BHMp/yssu4Dg5/7lZtYdiVKi+g3AMoLEZFn4uLS+NJFgNKa4O8N+k2dmtcOyPxOMTv+jsgeXyczsCGAGwQf8SoKfiy8D7jWzC5xzHxH8wlDLOTfIOXcWcDXwmnPubOfcWc655wADJoTNTgL+j2DqTTPgb865YWWMuqVEPMcCTCHNjyOZnHPTCf6mXjKzhwmm1sX6u3uDIF9bU0ZzjQgSTYBNQOso43HOfeWcq/QvBgl+bQAws0HA+865f1c2LiXNIklgZp2Awwl+ws4Cvg+8C9wGvBZutgVo4pybBIzlQJJb0laC0WQI3hi3hvdHAo8CH8eYn9WS4I0TDkzPGBk+/oLgw2wjsCw8Q/0I59wwgnmHkyp6vJIYCe43EHwxuoYDJ5uV1pe+JBjtyy5Wt/hPoUW/SLwPrHbOFSIVYmYnEPxSMNY596+icufcRoLR2Alm1tI5t5XgJ+cOZraaYBTtBDN73cz+FdbZx4FfiQY751YSTINYAlyRuqMqW3nHUl2OI1nM7IcEo/FdCL4MX03sv7tRBO8Dl5bR3HagQXi/MZXI7xIcT5UkOhYzKzrRtErTD5U0iyTHucDvwnnED4ePZwK/CP8FeDsshwNzTmNZyIFkpjfB1Isi/w0MLfatu+ikh67AgtIaDJPsiQSjOnUJRqNrA6sALd4enUT2GwiSj+4cSJrL6kt/Bs6sfOhSjjXAxc65kr8K4ZxbC3Rxzm0yszOA/xB86Z3vnPuxc66Hc+5sghN1i6sDXGhmXYBbCf6eu4YrcESuAseS1seRRNcDl4YJoE9wLsNBzKwHwVSH/wJuC6fmxPIuwRcOCN4XPok4nqpKWCxm1gL4J3Cdc25LVYJS0iySHOdyYMrEmxyYn7qdIHEBmAasM7MFQL8y2poAHBueHbwLeKHoCedcQdh+0dyxqQRvDleGz8GB6Rl5ZtahWLuTCH4a3gc8RpBkvwmMqfjhSoIkst9A8MG5mgNXJCurL71HsFpHkeI/hQ6q9BEJAM65vc654leGq0UwzaLo+a3hCggPAQ8QfHntF47Kvh7OA64FYGY3ECzd9h3BrwTXA0udc8sJks4pxU4EjkQ8x1IdjiPJ/gJcY8EqR6cSvCeX/LsbC/zBOfcfYDlQ2koUzwA5ZvYXguX6FpayXariqapExnI7cAzwSFi30oMDuiKgiIhIipnZU8CTzrkFxcraA6OcczeZWRuC5bquKfb8HOdcHzOrXzRf1MxOB853zo0ptt3FwAbn3DupOp6S4jkW4Jx0P47qxMy+TzDaPKuqI6oSm5JmkTQSfqsubourxBqXklnUb2qe8KfmelU5mSpd1KRjiVr4BeS5EsWrnHMlT+TNuHhSEYuSZhERERGRcmhOs4iIiIhIOZQ0i4iIiIiUQ0mziIiIiEg5lDSLiIiIiJSjTtQBiCTT4Ycf7rKysqIOQyro3Xff3eicOyKV+1RfqZ7UVyQeUfQTUF+pjsrqK0qapUbLyspi8eLFUYchFWRmn5a/VWKpr1RP6isSjyj6CaivVEdl9RVNzxARERERKYeSZhERERGRcihpFhEREREph5JmEREREZFyKGkWERERESmHkmYRERERkXIoaRYRERERKYeSZhERERGRcihpFklDmzZtIjc3l40bN0YdioiIiKCkWSTtFBQUcOGFF5Kfn0/fvn35+uuvD9lm3bp19O/fn969e3PzzTeXWiYiIiKJoaRZJM0sX76cBx98kDvvvJNzzz2XJUuWHLLNbbfdxl133cXcuXP5/PPPycvLi1kmIiIiiaGkWSTNnHnmmfTo0YM5c+aQn59Pz549D9lm9erVdO3aFYAjjzySLVu2xCwTEanutm3bpvczSYgNGzZQWFhY6fpKmkXSkHOOKVOm0KJFC+rWrXvI8wMGDCAnJ4fp06czc+ZMzjrrrJhlIiKpsmfPHo455hiys7PJzs5mxYoV3HPPPXTv3p2bbrqp3Pq7du3i2GOPBeCVV17hpJNO4rPPPuP//u//aNCgQbLDlzRScrrho48+ur9fdenShRtvvLHUukOHDqVnz57cd999AEyYMIHu3buzY8cOZs2aFfMzNV5KmkXSkJkxceJEOnXqxLRp0w55fsyYMZx//vk8+eSTDBkyhMaNG8csE5HMUNURtERYvnw5V155JXl5eeTl5bF7927mzZtHfn4+Rx55JK+//nqZ9e+77z7+85//ADBr1iweeOAB5s+fT2FhIfXq1UvFIUiaKDnd8MQTT9zfr3r37s0NN9wQs95LL73E3r17WbBgAWvXrmXNmjUsXbqU66+/nkWLFtGoUaMqxaWkWSTNPPDAAzz77LMAbN68mebNm8fcrkuXLqxfv55Ro0aVWSaZR6uvpL8NGzZw8sknV/gE3hEjRjB9+nQgsSNoifDOO+8wY8YMTj31VIYOHcobb7zBT3/6U8yMc889l7lz55Za98MPP2T58uWcdtppANSqVYtdu3Yxb948zjzzzFQdgqSJ0qYbfvHFF2zYsIFu3brFrJeXl8fAgQMB6NevH/PmzcM5R2FhIa+99hrnn39+leJS0iySZoYNG8akSZPo06cPe/fupW3btowZM+aQ7caPH8+oUaNo2LBhmWWSWUquvrJo0SKtqpKGRo8eza5duyp0Au/cuXP58ssv+clPfgKQ0BG0ROjevTuvv/46+fn5FBYWsmvXLo466igAWrZsyYYNG0qtO3r0aB5++OH9jwcOHMjDDz/Msccey80338zkyZOTHr+kj9KmG06cOJHhw4eXWm/Hjh2H9Ll+/foxY8YM2rZty0UXXcTs2bMrHVedStcUkaRo0aIFubm5B5UVzc0qLicnJ64yySxFq6/06NGDgoIC+vXrx7/+9S969OjBoEGDyMvLIzs7O+owM9qbb75Jo0aNaNOmTdwn8BYWFnLDDTdwwQUX8Morr3DxxRcfNIIW64t1qnXq1In69esD0K1bt/2JM8D27dvZt29fzHrPPvssZ555Jj/4wQ/2l/Xp04fJkyezcOFCNmzYwBtvvMFVV12V/IOQtDBmzBjmzZvH+PHj90833LdvH7Nnz+b+++8vtV7jxo0P6XODBg2iXbt2rF27lv79+/Piiy/St2/fSsWlkWYRkRqk5OorLVq00KoqaWT37t3ce++9jBs3Dih9RK2kZ599lo4dO3LrrbeSn5/PI488ktARtEQYPHgwy5YtY+/evbz88svs2LGDefPmAbBs2TKysrJi1ps5cybTpk0jOzubpUuXcuGFFwLw1ltvccYZZ1CnTh3MLFWHIWmi5HTDuXPnctppp5XZF0455ZSYfW7NmjW0b9+e+vXrl/rlLR5KmkVEapjiq69ce+21WlUljYwbN44RI0bsP1ch3hN433vvPYYNG0abNm24+uqrmT17NoMGDWLs2LE0b958/whalO6++24GDx5Mly5d6NmzJ2PGjOG9997jV7/6FePGjePKK69k06ZNXH/99QfVmzx5MnPnziUvL48uXbowY8YM9u3bR8OGDWndujXz58/nRz/6UURHJVEpOd1w1qxZ9OnTZ//zK1euPOQXlksuuYRJkyYxatQonn/+efr378/WrVtp06YNHTt25PHHH+fss8+udEzmnKt0ZZF0161bN7d48eKow5AKMrN3nXOxz/RIkprYV+666y48z+Ooo45i/PjxdO/ePS1+xk+k6tZX+vTpQ61awXjV0qVLGTBgAA899BC9e/fm7bffLvV8hIceeoh69eoxYsQIJk2axJIlS/jzn//MpEmTOP7443n33XfxfZ+//vWvlT6uZNi1axevvvoqXbt23b+cXBSi6CdQM99X0l1BQQG5ubn06dOHNm3aVLh+WX1Fc5pFRGqQBx54gO9973v8/Oc/37/6StHPnP/85z+jDi/jzZkzZ//97OxsnnzySe65556DRtRWrlzJ5MmTDzqXYejQoVx33XU899xzFBYWMnXq1ING0G688UbuvvvulB9PeRo0aMCAAQOiDkMySIsWLfavoJFoGmmWGq06fMvPuv3VlO7vk3H9U7q/yqhuo4fppKCggIEDB/Ldd9/heR4TJ05k7NixHHfccQwePDjq8BJOfUXioZFmiZdGmkVEMkSs1Ve0qoqISNUpaRYREZFSpfrXsESoDr+oSfWjpFlEREREIpfu0xWVNKeY7/u1gLZAVnhrFz5uQPD/YcBu4FtgM7AW+Aj4GPjM87y9KQ9aRETKVN1GYzUSK1JxSpqTzPf9JsCZwFlAX6AjULeSze32fX8d8A7wGpDred7XCQlUREREREqlpDkJfN8/FhgC9AO6kbjXuR7QIbwNAZzv+0sJEuhXPM9bkKD9iEgaSvefLkVEajIlzQni+34d4CLgF8DZBNMsks2Ak8PbbWECPRGY7HnezhTsX0RERCQjKGmuIt/3Dwf+C7ge+F7E4XQBngDG+77/NDDR87yPI45JREREpNpT0lxJvu83Am4GRgNNIg6npObAb4CRvu8/Cdzjed5XEcckIiIiUm3VijqA6shy7Koer/TI3bNvz12kX8JcXB2C6SIf+b4/OpxCIiIiIiIVpKS5AizHfmA5lgf8Y8eeHT3/7P95ftQxxakJMB5Y4vt+z6iDEREREalulDTHyXLsYmAJwfJxAExaM8nbsnvLluiiqrAfAXN8378l6kBEREREqhMlzeWwHKtjOfYn4GWCucL7OVzL0QtHL40mskqrA/zB9/0Xfd9vGnUwIiIiItWBkuYyWI61Bd4CRpW2zTtfvdNz7da1n6YuqoS5DFjs+/6Pog5EREREJN0paS6F5djxwEKgVzmb1hv+9vAvUxBSMvwQeMf3/X5RByIiIiKSzpQ0x2A5dgKQB3w/nu3/vfPfp7357zer2zSNIg2BV3zfPyfqQERERETSlZLmEizHTgRmU8ELldyWf1uDfW7fvuRElXSHESTOZ0cdiIiIiEg6UtJcjOXYSQQjzG0qWvfbvd92ePSDR99OeFCp0wCY5vv+WVEHIiIiIpJulDSHLMeOAGYCR1a2jcc/eLzDjsId2xMXVco1AKb7vt816kBERERE0omSZsByrDbwHNC2Ku3sY9+Rdyy6Y3FiojrUli1bmD9/PgUFBWWWlWXjxo0UFhaWtUkDYKrv+83L2khEREQkkyhpDtwH/DgRDc3+z+wen+/4/ItEtFXcli1buOmmm/B9n+uuu45NmzbFLItl8uTJXHHFFezcuZP58+dTt27d8nb3A+AZ3/ct0cchIiIiUh3ViTqAqFmOXQTclsAmDxvx9ohPp/WbdlQC22T16tXccsstdO7cma1bt/LBBx9Qr169Q8pOP/30Q+quWrWKyy67jPfff58GDRrEu8uLgFuAPyTwMERERESqpYweabYcywKeBRI6orpu27pe73z1jp/INrt3707nzp1ZvHgxK1asoHPnzjHLYnHOsWfPHubPn88ZZ5xRkd3e7/t+74QcgIiIiEg1ltFJMzABaJaMhn+z4Dc451wi23TOMXPmTJo2bUqdOnVKLSupV69ezJkzh9atWzNy5Ejy8/Pj3WUd4Enf9+sl5ABEREREqqmMTZotxy4B+ier/e17tnvPrHlmQSLbNDPGjBnD8ccfT15eXqllJZ133nkMHz6cJk2a0KdPH3Jzcyuy2+OB0VUMXURERKRay8ik2XKsPvBgsvfzkP9Qu2/3fLsrEW099dRTTJs2DYBt27bRpEmTmGWlWb9+PUcffTT16tWjEgPgd/q+n9A52iIiIiLVSUYmzcAvCVaISKq9bu9RY5eMXZiItgYMGMD06dMZMmQI+/bto1evXjHLYtm+fTutWrWiffv2TJ06lR49elR09w2Be6t6DCIiIiLVVcatnmE51gy4M1X7e/WzV7v95ke/2dC6QevWVWmnWbNmPPHEE+WWxdK4ceP9CfXUqVMrG8IQ3/f/5Hne+5VtQERERKS6ysSR5iFAixTur/HI+SNXp3B/yVIL+HXUQYiIiIhEIROT5hGp3uEHmz84fek3Sz9M9X6T4Crf95Oy2oiIiIhIOsuopNly7CygQwS7rjVy/shvI9hvojUkGKkXERERySgZlTQTwShzkc27N3d5Ye0LCTkpMGK/iDoAERERkVTLmKTZcuwogktDR+Z3S3/XZve+3bujjCEBTvR9v2/UQYiIiIikUsYkzcClRLxayB63p924peMSesGTiAyKOgARERGRVMqkpPnsqAMAeGHdC102fbvpm6jjqKK0eC1FREREUiUjkmbLsdpAukwpaPbrd35d3dc6bu/7ftIvDiMiIiKSLjIiaQZOBZpGHUSR9755r9eHmz/8OOo4qkijzSIiIpIxMiVpPifqAEqoc9PbNxVEHUQVKWkWERGRjJEpSfOZUQdQ0lffftXtX5/9692o46iC7KgDEBEREUmVTEmao7igSbnuWnxXs71u796o46ikI3V1QBEREckUNT5pthyrD3w/6jhi+W7fd8f9xf/L21HHUQXHRh2AiIiISCrU+KQZaAdY1EGU5u+r/95x6+6tW6KOo5K0goaIiIhkhExImuMfDd0JfAzsqMLetgMVmHDhcIffmn/re1XYY5Q00iwiIiIZIROS5vhGQ3cBk4EvgGc4OHHeDjxWRt2FwOPAbuAjoHbFAnx7w9u9Ptn2yfqK1UoLGmkWERGRjFDppNnM8ipZr4uZdansfmO019rM5paxSZu4GtoAnAv0AdoD/yn23GtAYRl1vwS6EiTc9eLaW0n1hr89/N+Vqhmt1lEHICIiIpIKUYw0dwlvVWZmLQjGhRuVsVl8aWwWcDTwCUHy2zYsXwvUBRqXU38vwdSO4+La2yE+3/F5j7f+89ayytWOTOW+IoiIiIhUM+UmzWZ2mJk9Z2bzzGyGmTWMsc1YM8sO718T3hqE288xs/81szpm9nvgduB2M3sj3L6hmU0Nt5tYrM08MxtvZrPKCG8vMAjYWsY2dcs7xv0c8D7QgGCKxR5gDuVfxqM9sJrgmoP/BNbFvceD3LLwlnrOuT3Avmpyq+BEFBEREZHqKZ6R5mHAMufcGcCLgBdn2x2Bfc65PsDTQGPn3B3AOGCcc+6sYu374XbfM7NOYXkPYIFz7tzSduCc2+qcS9zKEwb0J5h0sAqYB3QnSKLL4hFc6uMw4HhgZeV2f1KLkxxBIlqrGt1ERET+f3v3Hh9Fdf9//HW4yi0oqGAVoWAV7UgQxQIKRgURUUSL4p0W8UppLd6qoJFqW63f1gvQWsVq5SdeAFtFWxSECBggoCKsKKCgKCqCYLgKgXx+f8wEYthkN8nuzoZ9Px+PPLI7O3PmZaERNgAAIABJREFUc5LJ5r1nz8yK7PfiCT3tgYLg9tPAghjrl0TMd4GIc+4N/NnC28pZ/xjggmCOdFvg8GB5xMxeiqO+WCqajbzXHGBRcPt7/AC8Er/nT+HPW365gu03AM3wI69VrdCHuz5c5JxL28vjRRHfz1ZERESkhosnNH+EP94KcCcwJMo6O4FDgttnB9+zgbfN7CzgIKB7sHw70BAgCIjLgIfNLAcYCZRcRWJL3L2o2M641joRWAz8E3/iQTtgMPDL4KslcH45236PP+f5EOAdqnQhtot+fNG8pvWaZld+y1DF97MVERERqeHqxLHOE8C/gpHgb4HLo6zzCvA359yZwTrgn1L3gHNuBH6sXBgsnwa86Jy7HLgjaP8p59wv8ecmX1a1rpRrXVxrNQCuquDxX1bw2AH4IRvghviKKq2Oq1P0u+zfxXeVj/TyddgFiIiIiKRCzNBsZtuBi6Mszyl1O4J/sbay9pmPbGYb2PfUugrbj6PGitb9JN52wnJrh1vn1qtdL9rPL91V8ZRHERERkZolnpHmtBDlutCFZlbehInS0jo0Z9XNKryk3SXxnlyZblaGXYCIiIhIKtSY0FyZkecyPsWfpZyWV3p48GcPLqrlap0Wdh1VpJFmERERyQhpGSQTyXJtJ/B52HVE07px69VdD+3aNew6qkEjzSIiIpIR9vvQHPg47AKiGXvK2DXOuZr6qXprPc+r6ENlRERERPYbmRKa54RdQFndDu22pHXj1jV5lDkv7AJEREREUiVTQvMbYRdQ1v91+b+a9CEm0UwPuwARERGRVMmU0Dwf+C7sIkpcftTlc5vUbVJTr5hRYlrYBYiIiIikSkaEZsu13cCMsOsAqFur7o6bj7/5iLDrqKaPPc/7LOwiRERERFIlI0Jz4PWwCwC4s+Od8+rWqtsq7DqqSVMzREREJKNkUmh+FdgVZgEH1jtww8/b/LxjmDUkyHNhFyAiIiKSShkTmi3XvgReDrOGh7o8tMQ51zTMGhIg4nnerLCLEBEREUmljAnNgTFh7bhdVrtVJx58Yrew9p9Aj4VdgIiIiEiqZVRotlzLAyJh7Htst7HfOOfqhrHvBNoCjA+7CBEREZFUy6jQHBib6h32aNnj/cMbHf6z6rZTWFhIfn4+GzdurHBZRdavX09RUVFVS3hWnwIoIiIimSgTQ/N4YEMK92d/PvnP1f6o7MLCQoYOHUokEmHw4MFs2LAh6rJoJkyYwCWXXMK2bdvIz8+nbt0qDXjvBh6uTh9EREREaqo6YReQapZrW90ody/wUCr2N/jowfmN6jY6pbrtLF++nFtvvZXs7Gw2bdrEhx9+SL169fZZdsop++5q2bJlXHjhhXzwwQc0aNCgqiU86XneR9XqhIiIiEgNlYkjzeBP0ViR7J3Ur11/+7CfDvtxItrq3Lkz2dnZLFy4kCVLlpCdnR11WTRmxq5du8jPz+fUU0+tyu63AvdUo3wRERGRGi0jQ7PlWhHw22Tv5+4T7i6oU6vOjxLVnpkxdepUsrKyqFOnTrnLyurWrRuzZs2iRYsWDBs2jIKCgsru+l7P876qXvUiIiIiNVdGhmYAy7XXgH8nq/3m9ZuvO+/I8zolsk3nHCNHjuToo48mLy+v3GVlnX322dxwww00adKEHj16MG3atMrs9iPgr9UsXURERKRGy9jQHPg18F0yGn6k6yMfOueaJKq9J598kldeeQWAzZs306RJk6jLyrN69WpatWpFvXr1MLN4d1sE/NLzvCpfbkNERERkf5DRodly7QvgKiDuFBmP9k3bf9KhWYdqn/xX2oABA5gyZQqDBg2iuLiYbt26RV0WzZYtW2jevDnt2rVj0qRJdOnSJd7d3uZ53ryEdUJERESkhsq4q2eUZbk2xY1y9wF3JarNMaeM2eCca5eo9gCaNm3KE088EXNZNI0bN94TqCdNmhTvLid7nqdLzImIiIiQ4SPNpdwDTE1EQ70O7/VuiwYtOieirRB9DAwOuwgRERGRdKHQDFiuFQOXAauq047DFf/hpD80SkxVodkCXKRP/hMRERHZS6E5YLm2EegLfF3VNq479rr8BnUaHJO4qlJuK3CO53mLwi5EREREJJ0oNJdiufYhkAN8WdltG9RusPX6Y68/KuFFpc42oK/nebPDLkREREQk3Sg0l2G5tgw4Dfi8Mtv9/qTfL6jtardMTlVJtx04z/O8t8IuRERERCQdKTRHYbn2MX5w/iye9Q894NC1vQ/vXVNP/tsM9PM8b0bYhYiIiIikK4XmcliurQJOAd6Ote7obqOXO+dq4gmAS4HOnudND7sQERERkXSm0FwBy7U1+HOc76ecD0DxDvKWH3vgsQn9IJMUeR442fO8ZWEXIiIiIpLuMv7DTWKxXNsF3OFGubeAZ4BDSj8+utvozc65mvTiowi4xfO8R8MuRERERKSmqElhL1SWa1OBjsCeqQx9W/VdePABB58YXlWV9g7QTYFZREREpHIUmivBcu1Ly7VewIBartbKUSeOOijsmuK0Efg1/nSMhWEXIyIiIlLTKDRXgeXa5Ce7P3lc/dr1Hwc2hF1PBYqAR4CjPM8b7XlecdgFiYiIiNREmtNcRb84/Rc7gD9HIpF/4I/iXgO0CreqPdYBjwOPeZ73RdjFiIiIiNR0Cs3V5HleIXBvJBL5A9AHuBb/47hrh1BOATAGeNHzvB0h7F9ERERkv6TQnCDB1IfXgNcikcjhwCCgN9AFqJek3e4G5gOvA1M8z3svSfsRERERyWgKzUnged4a4I/AHyORSAP8D0k5PfjqBNSvYtNbgI+BBfhB+U3P876rfsUiIiIiUhGF5iTzPG87/mXqpgNEIhEHtACOBFoHX0cCDfCndDhgB/A9/kdcr8QPyis8z/s61fWLiIiIiEJzynmeZ8DXwVdByOWIiIiISBx0yTkRERERkRgUmkVEREREYlBoFhERERGJQaFZRERERCQGhWYRERERkRgUmkVEREREYlBoFhERERGJQaFZRERERCQGhWYRERERkRgUmkVEREREYlBoFhERERGJQaFZRERERCQGhWYRERGRGm7t2rUUFRWFXcYe6VZPIig0i4iIiIRs7dq1dO/evdzH16xZwxFHHEFOTg45OTmsW7eOMWPG0LlzZ7Zu3crrr79O3bp199t60kGdsAsQERERyWQbN25k0KBBbN26tdx15s+fz4gRI7jhhhv2LFu0aBFDhgxhwYIFNGrUaL+tJ11opFlEREQkRLVr1+aFF14gKyur3HXmzZvHuHHj6NSpE3feeScAZkZRURFvvPEGffr02W/rSRcKzSIiIiIhysrKomnTphWu06dPH/Ly8liwYAFz585l8eLFnHXWWbz66qscccQR9OvXj5kzZ+6X9aQLTc8QERERSXPdunWjfv36AJxwwgmsWLGCgQMH0rp1a1auXEnfvn2ZPHkyp59+ekbWkwoaaRYRERFJc7179+arr75i27ZtvPHGG3ieB8CKFSto164d9evXp7i4OGPrSQWFZhEREZE0MmPGDMaMGfODZbm5uZx++ul06dKF66+/nmOOOYZNmzbRsmVLjjvuOB5//HF69uyZEfWERdMzRERERNJAXl4eAGeccQZnnHHGDx47/fTT+eijj36wLCsri169egH+lSv293rCppFmEREREZEYFJpFRERERGJQaBYRERERiUGhWUREREQkBp0IKCIiIpIibX73Wsr3+en9fct9LNX1VFRLunNmFnYNIknjnFsHfBZ2HVJprc3skFTuUMdKjaVjReKR8uMEdKzUUOUeKwrNIiIiIiIxaE6ziIhIkjlfuVMinXO1nHMJ+Z/snKufiHbCtr/0Q/YfmtMsIiKSfIcDE51zO4L7dYAOwLvB/drA/c65VsBsM/sAwDnXAjgKOAHYYWZPBMsXAIVAW+AWoAlwPHAc8BowNhWdqohz7npi9KUm9EOkhEKzSJI455rhz2U7BHgLuMLMVjjn+gEXAIOBR4BOwDfAJWa2M0o7jYHxQTufAFcD44Bs4HvgC+AyYBpwAP4/33vN7BXnXB5QH9gBYGY5zrntQAGQBTxsZv9yzl0E3ATUA24ys7cT/xOReCTwuHkaaGhmFzvnnsc/Vn5F9GPpSzO70zl3T6kmBgJrg9s3AX8EbjGzpc65vwLzzOzFhHZ+P2ZmXwBdS+47524BXjOzP5Vezzl3NDA++BlfBnwHtAbuAj4uteqq4Hd7D7AFWI1/rPRKakcqZwax+1IT+pEUzrlGwP8DmuH3eyVwMT/8u7sI/2c0zjmXD1xpZp9Eaasu8FLQ1pNm9s8w6wnaOxa438zOD7MW59yRwDNAMf5xd51VcW6ypmeIJE8v/BDbA5ga3Ac4E3g9+N7GzE4FIvhPANEMA1YE69XHf+IAGGZmXfH/0fQMlg0IHv+Xc+7gYNlFZpZjZjnB/TVmdhrQHf+fF8B9QA5wOf6TkYQnUccN+C+sSn8v71i6xjl3QJlt/1By3JjZIuB+YLhzLgv/2JlU1Q5mOudcayAXmBbcLlleC9gEnApMBvoH631qZrOBr0tN4WjjnJsOXAWsM7OFBC+O00El+pLW/UiyK4G5wfPxDuAk9v27+yswxDnXA/9vN2pAxf/bfsfMTgEGOOeahFmPc64d8CDQtAp1JLQW4DrgBjM7A2iF/05GlSg0iyTP2fhvLZ6NH3ZKgu3p+KPCOUBesGw0MLOcdn4GzApuzwE6lzzgnHNAY2DPSKOZrcIfSe4So76GwO7g9hb8UcdVZlZRCJPkS9RxA7DTOdccKArul3csRfBfMJXLzGYBbYC7gbFmVhxPZ+SHghcdzwIL8P8Hj3fOlYzEtcV/odQ/WGc6/mjbWUGwnA5cHPzdXwp8BJwBnOmc+xtwgnPuMefc40FoCVM8fRlI+vcjmdYAFzjnfmJmQ/CPiR8ws2/x/8b/CfyhgrZygJJ3fmbhh8ww69kM/LwKNSS8FjMbYWYfBnebA+urWpRCs0jydMUfwT0TmA90dM4dAWwL/tgPATY5564EpgAXltNOE2BrcHsb/rQK8APTp/hvV80os823wIHB7YnOuTzn3Ojg/uHOudn4cyl/HSw7B//t/sXOuc5ImBJ13AC8jx9M3g/ul3csjcUfjSltRHDc5DnnagfLHsIfnX62qp3LZM65Q4BX8f/BL8V/u/hC4F7n3Dlm9jH+Owy1zGygmZ0JXAG8YWY9zexMM3secMCYoNnxwH/xp940Bf5pZtdWMOqWEvH0BXiBNO9HMpnZFPy/qZecc4/iT62L9nf3Jn5eW1FBc43wgybABqBFmPWY2TdmVuV3DBL8swHAOTcQ+MDMvqxqXQrNIkngnOsAHIz/FnYb4EfAO8DtwBvBaoVAEzMbD9zD3pBb1ib80WTwnxg3BbeHAX8HPokyP6sZ/hMn7J2eMSy4vwb/n9l64P3gDPVDzOxa/HmH4yvbX0mMBB834L8w+gV7TzYr71j6Gn+0L6fUtqXfCi15R+IDYLmZFSGV4pxrj/9OwT1m9r+S5Wa2Hn80doxzrpmZbcJ/y/kY59xy/FG09s656c65/wXbFLP3XaIrzWwp/jSId4FLUterisXqS03pR7I4536CPxrfEf/F8BVE/7sbjv88cEEFzW0BGgS3G1OFfJfgeqol0bU450pONK3W9EOFZpHk6A38MZhH/GhwfypwffAd4O1gOeydcxrNfPaGme74Uy9K/AO4utSr7pKTHjoBc8trMAjZY/FHderij0bXBpYBunh7eBJ53IAfPjqzNzRXdCw9BJxW9dIlhhXA+WZW9l0hzGwl0NHMNjjnTgW+wn/Rm29mZ5hZFzPriX+ibml1gHOdcx2B2/D/njsFV+AIXSX6ktb9SKIhwAVBAIzgn8vwA865LvhTHX4N3B5MzYnmHfwXHOA/L3wacj3VlbBanHMHAc8Bg82ssDpFKTSLJEdv9k6ZmMHe+alb8IMLwCvAKufcXOCsCtoaA7QNzg7eDkwsecDMNgbtl8wdm4T/5HBp8BjsnZ6R55w7plS74/HfGi4GHsMP2TOAkZXvriRIIo8b8P9xLmfvJ5JVdCy9h3+1jhKl3wodWOUeCQBmttvMSn8yXC38aRYlj28KroDwMPAA/ovXs4JR2enBPOBaAM65a/Av3bYD/12CIcAiM1uMHzpfKHUicCji6UtN6EeSPQL8wvlXOToZ/zm57N/dPcCfzewrYDFQ3pUo/gWMcs49gn+5vvnlrJeqeqorkbX8DjgSGB1sW+XBAX0ioIiISIo5554ExpnZ3FLL2gHDzWyoc64l/uW6flHq8Vlm1sM5V79kvqhz7hSgj5mNLLXe+cBaM5uXqv6UFU9fgF7p3o+axDn3I/zR5terO6Iq0Sk0i6SR4FV1aYVWhWtcSmbRcbP/Cd5qrledk6nSxf7Ul7AFL0CeL7N4mZmVPZE34+pJRS0KzSIiIiIiMWhOs4iIiIhIDArNIiIiIiIxKDSLiIiIiMSg0CwiIiIiEkOdsAsQSaaDDz7Y2rRpE3YZUknvvPPOejM7JJX71LFSM+lYkXiEcZyAjpWaqKJjRaFZ9mtt2rRh4cKFYZchleSc+yz2WomlY6Vm0rEi8QjjOAEdKzVRRceKpmeIiIiIiMSg0CwiIiIiEoNCs4iIiIhIDArNIiIiIiIxKDSLiIiIiMSg0CwiIiIiEoNCs4iIiIhIDArNIiIiIiIxKDSLiIiIiMSg0CwiIiKyn9i8eTOFhYVhl7FfUmgWSUMbNmxg2rRprF+/PuxSpIZau3YtJ5xwwp7b3bt3D7kiESlt1apV9O3bl+7du3PzzTezceNGzjnnHE466SSuu+66crfbtWsXRx55JDk5OeTk5LBkyRJefvllfvrTn/L555/z3//+lwYNGqSwJ5lDoVkkzWzcuJFzzz2XgoICTj/9dNatW7fPOmWfbEu78cYbmTJlSqrKlTR1yy23sH37djZu3MigQYPYunVr2CVJEhUXF/Pll1+GXYZUwu23385dd93F7Nmz+eKLLxg/fjyXX345CxcuZPPmzSxcuDDqdosXL+bSSy8lLy+PvLw8jj/+eF5//XUeeOAB8vPzKSoqol69einuTWZQaBZJM4sXL+avf/0rI0aMoHfv3rz77rv7rFP2yTYvLw+A2bNn8/XXX3PeeeeluGpJJzNmzKBRo0a0bNmS2rVr88ILL5CVlRV2WUL0UcLc3Fw6d+7M0KFDK9y2Y8eOe7abNm0aCxYsoH379ixYsIBp06ZRp06dFPVCEmH58uV06tQJgEMPPZSmTZsSiUT47rvv+Pzzz2nVqlXU7ebNm8err77KySefzNVXX82uXbuoVasW27dvZ86cOZx22mmp7EZGUWgWSTOnnXYaXbp0YdasWRQUFNC1a9d91in7ZFtYWEhRURHXXHMNbdq04eWXX0512ZImdu7cyb333sv9998PQFZWFk2bNg25KilRdpRw586dzJkzh4KCAg499FCmT58edbtvv/2W9u3b79muV69ezJw5k//7v/9j5syZfPPNNxx66KEp7k1q7K9zdAcMGMCoUaOYMmUKU6dOJScnh88++4xHH32UY489lmbNmkXdrnPnzkyfPp2CggKKior473//y8UXX8yjjz5K27Ztufnmm5kwYUKKe5MZFJpF0pCZ8cILL3DQQQdRt27dfR4v+2R75pln8swzz3Dcccdx2223UVBQwOjRo0OoXMJ2//33c+ONN3LggQeGXYpEUXaU8M033+TnP/85zjl69+7N7Nmzo243f/58CgoK6NatG/3792fz5s17RhdXrlzJ0UcfneKelK9kPn1F08hKy9Q5uiNHjqRPnz6MGzeOQYMGMWrUKB577DHuvvtu2rdvz1NPPRV1uw4dOnDYYYcBcNJJJ7FixQp69OjBhAkTaNWqFW3btuXNN99MZVcyhkKzSBpyzjF27Fg6dOjAK6+8ss/jZZ9sGzduzHvvvce1115Ly5YtueKKK5g5c2YIlUvYpk+fztixY8nJyWHRokUMGTIk7JKklLKjhNu3b+fwww8HoFmzZqxduzbqdm3btuX1118nPz+fDh068NRTT3Huuefy0EMP0bx5c5577jkefPDBVHalXCXz6cubRlZWJs/R7dixI6tXr2b48OFs3LiRJUuWsHv3bubPn49zLuo2V155Je+//z67d+/mP//5D9nZ2QC89dZbnHrqqdSpU6fcbaV6FJpF0swDDzzAM888A8B3331X7ohh6SdbgKOOOoqVK1cCsHDhQlq3bp2agiWtzJo1a0/46NixI+PGjQu7JCml7Chh48aN2b59OwBbtmyhuLg46nZt27blqKOO2rPdihUraN++PdOnT+fEE09k8+bNfPTRR6npRAVKz6ePNo0smkyeo/vggw8yfPhwGjZsyB133MG1115L06ZN2bBhA5deeilLly5l5MiRP9jm7rvv5sorr6Rjx4507dqVnj17UlxcTMOGDWnRogX5+fkcf/zxIfVo/+bMLOwaRJLmpJNOsvLOQE5XGzdu5OKLL2bHjh14nsfQoUN57rnnuO+++36wXm5uLkcddRRXXnkl4M/7Gzx4MGvXrqWoqIhJkybtGcGqaZxz75jZSancZ008VqTmHSsXX3wxI0aMwPM8evXqRU5ODt988w1jxozhqaee4quvvuLOO+/cZ7tbb72V7t27069fP6666ip69OjBkCFDmDx5Mn369OHXv/41QKgvknbu3Env3r3597//Tf/+/enZsyfbt2+nS5cuDB8+nPfee4/GjRvvs92CBQs44ogjOOyww7jqqqsYMGAABx54ICNGjODCCy9k7ty59O/fn8suu6zKtYVxnICeV2qiio4VnWorkmYOOuggpk2b9oNlZQMzwKhRo35wv0mTJkycODGptYlI9dx9991cdtllmBn9+vVj5MiRdO/end/85jdMnTqVqVOnsmHDBm677bYfBODhw4fTv39/7rzzTrp27cqgQYMA//yHhg0b/mCaR1jKzqcfOXIkc+bM4cEHH9wzjSyaDh06UL9+fWDvKHrJyWzz589n7dq1vPnmm9UKzSKJoNAsIiKSIp7nsXjx4h8smz59Oq+99hq/+c1v+PGPfwzsO2J82GGHMX/+/H3aGzBgAADPPvtskiqO3/Tp05kxYwZjx47dM5/+4YcfZvXq1Tz33HPlbnfllVfuGX3/z3/+s2ek/a233qJnz54sWrRIc3QlLSg0i4iIhKhBgwZ7wm9NNmvWrD23c3JyGDduHLm5uXvm7AIsXbqUCRMm/ODds7Kj79Hm6J5//vkp749IWZrTLPs1zSermWraPFUJj44ViYfmNEu8NKdZRGQ/0OZ3r6V0f5/e3zel+xPJBKn+Owb9LSeKQrNIyBSERGq+MIJQdeh5QKTydJ1mEREREZEYNNIcokgkcgDQBjgCOAD/9+GAncAO4DvgE8/zNoZVo4iIiIgoNKdEJBKpA3QGTgc64AflNsCh+CE51vYbgU+CrxXAPCDP87ytyalYRETEV9OmnoCmn0hyKDQnSSQS+SlwFnAG0APIqkZzBwEnBV8ldkYikXzgjeDrXc/zdCkUERERkSRQaE6gSCTSCLgUuB44Mcm7qwfkBF9/BD6PRCL/AB73PG9dkvctIiIiklEUmhMgEokcjx+Ur6B6I8rV0Qq4D7grEolMBMZ6njcvpFpERERE9isKzdUQTMH4I9Av7FpKqY8f3q+IRCLzgTs8z5sZck0iIiIiNZpCcxVEIpFDgN8CtwG1Qy6nIj8DZkQikVeBmz3PWx52QSIiIiI1ka7TXAmRSKRWJBL5FbAcuAP4IOSS4nUusDgSifw+uMydiIiIiFSCQnOcgtHl/wGjgQODxXWAmnLFivrAXcB7kUjkuLCLEREREalJFJrjEIlETgXew7+EXGnHAfmpr6ha2gMFkUjk0rALEREREakpNKe5ApFIxOHPW76P8n9WbYFtQMNU1ZUAjYAJkUikGzDc87yisAsSERERSWcaaS5HJBKpC0wE7qfiFxeHAQUpKSrxfgXMCqaeiIiIiEg5FJqjiEQi9YDJwM/j3ORk4KvkVZRUXfCvsHFw2IWIiIiIpCuF5jIikUh94CXgvEps1hD4JDkVpYSHgrOIiIhIuRSaSwkux/ZvoG8VNj8FWJrYilLqeODNSCTSPOxCRERERNKNQnMgOOnvRaBPFZtwwK7EVRSKDvjBOayPAhcRERFJSwrNe91N5aZkRNMBmJuAWsKUDTwVdhEiIiIi6UShGYhEIr3xQ3MiHAHsSFBbP1BYWEh+fj4bN26scFlF1q9fT1FRzCvMXRiJRH5b9UpFRERE9i8ZH5ojkUhr4FkS97NoRRJGmwsLCxk6dCiRSITBgwezYcOGqMuimTBhApdccgnbtm0jPz+funXrxrPLB4LrOIuIiIhkvIz+cJPg0nITgUSf/NYJWAck7PrHy5cv59ZbbyU7O5tNmzbx4YcfUq9evX2WnXLKKftsu2zZMi688EI++OADGjRoEO8u6wIvRCKREzzPW5+ofoiIiIjURJk+0nwn0DkJ7WYBHyaywc6dO5Odnc3ChQtZsmQJ2dnZUZdFY2bs2rWL/Px8Tj311Mrs9gjgsUTULyIiIlKTZWxojkQiRwG3J3EXpwArEtmgmTF16lSysrKoU6dOucvK6tatG7NmzaJFixYMGzaMgoJKfYDhz4M53yIiIiIZK2NDM/AIcEAS268NbEpkg845Ro4cydFHH01eXl65y8o6++yzueGGG2jSpAk9evRg2rRpld316GAqi4iIiEhGysjQHIlEegLnpGBXJwILEtHQk08+ySuvvALA5s2badKkSdRl5Vm9ejWtWrWiXr16mFlld/8T4IaqVS4iIiJS82VcaA4+xOTBFO6yOQn40JMBAwYwZcoUBg0aRHFxMd26dYu6LJotW7bQvHlz2rVrx6RJk+jSpUtVShipDz0RERGRTJWJV884B+iYwv21Bd4CTqup8x+8AAAYB0lEQVROI02bNuWJJ56IuSyaxo0b7wnUkyZNqmoJBwM3AvdXtQERERGRmirjRprxg1+qdQC+C2G/iXZdJBLJxGNGREREMlxGBaBIJPJj4OwQdn0Q8H4I+020NkCfsIsQERERSbWMCs34J7OF1eduwKqQ9p1IOiFQREREMk7GhOZIJHIAMDjEEurif0pgTdcnEom0CbsIERERkVTKmNAM9CXxH5ddWScD74VcQ3XVAgaFXYSIiIhIKmVSaE6XT7VrCBSHXUQ1nRV2ASIiIiKplEmhuWfYBQSOAd4Ou4hqOlnXbBYREZFMkhGhORKJtAN+HHYdpRwDbAm7iGqoA+SEXYSIiIhIqmREaCZ9RplLHAosDLuIauoVdgEiIiIiqaLQHJ4uwBdhF1ENZ4ZdgIiIiEiqZEpoPj7sAqI4AFgddhHVcEwkEqkXdhEiIiIiqbDfh+ZIJOLwP8kuHXUDloRdRBXVAlqHXYSIiIhIKuz3oRn4EVA/7CIqUAuwsIuoorZhFyAiIiKSCpkQmuMOdoWFheTn57Nx48Yq72z9+vUUFRVVZpOfAvlV3mG40umKJCIiIiJJkwmhOa5gV1hYyNChQ4lEIgwePJgNGzbseWz9+vVcdNFF5W47YcIELrnkErZt20Z+fj5169atSo3bK7tRGtBIs4iIiGSEOmEXkAJHxrPS8uXLufXWW8nOzmbTpk18+OGHnHLKKQD85S9/4fvvvy9322XLlnHhhRfywQcf0KBBg6rU+CMgj5p37eNWYRcgIiIikgqZMNLcMJ6VOnfuTHZ2NgsXLmTJkiVkZ2cDMH/+fBo0aMDBBx9c7rZmxq5du8jPz+fUU0+tap2dgbVV3TgkVXqFICIiIlLTZEJojns03cyYOnUqWVlZ1KlTh6KiIv7xj39w0003Vbhdt27dmDVrFi1atGDYsGEUFBRUpc5GwIqqbBgiXXJOREREMkImTM+oHe+KzjlGjhzJ6NGjycvLY9WqVQwcOJCsrKwKtzv77LM57LDD+OKLL+jRowfTpk3j5JNPrnShG77fcMzleZfPNbMa8WImq17WmqXe0rDLEBEREUm6TAjNu+JZ6cknn+SQQw6hX79+bN68mSZNmjBv3jwKCgp4/vnnWbZsGbm5uYwaNSrq9qtXr6Z169Zs3rwZs6pdQe6383774Rdbv+hRpY1DsGbbmvVh1yAiIiKSCpkQmnfGs9KAAQO45ZZbmDx5Mj/5yU/o1q3bnhMBAX75y1+WG5i3bNlC8+bNadeuHb///e+5/vrrK13kx4Ufr3r323e7VnrDcMX1sxURERGp6TIhNG+OZ6WmTZvyxBNPlPv4U089Ve5jjRs3plu3bgBMmjSpkuX5bsy/8Rtq3nWPC8MuQERERCQVasTc2WpaFXYBseR9mbfoq21f/SzsOqog7X+2IiIiIomQCaH5k7ALqIiZ2e0Lbk/nj/muyMqwCxARERFJBYXmkI1bNi5/265tx4ZdRxVppFlEREQywn4fmj3PKwS+DbuOaL7f9f32sUvH1rR5zKVppFlEREQywn4fmgNpOdo86r1R83fb7h+FXUcVbQe+DrsIERERkVTIlND8btgFlLVu+7p1r65+9cSw66iGeZZbxQtSi4iIiNQwmRKa3wi7gLJ+M/c3HwJNwq6jGqaHXYCIiIhIqmRKaJ5BnJ8MmApLNy79eMnGJafEXjOtKTSLiIhIxsiI0BycDDg/7DpK/Cr/V98BtcOuoxq+AxaGXYSIiIhIqmREaA6kxRSNqV9MfWfd9+tOCruOapppuVYcdhEiIiIiqZJJofm/YRdQbMXFdy28qybPYy7xctgFiIiIiKRSxoRmz/MWAovCrOHvH/797e93f390mDUkwEbgxbCLEBEREUmljAnNgbFh7Xjbrm1bH//w8ZoemAGetlzbHnYRIiIiIqmUaaH5WfyR0pQbuXDkgmKKW4Sx7wQy4LGwixARERFJtYwKzZ7nbQf+mer9fr3t66+nrZl2crUb2ob/2YZbYyyryBZgd5UreNNybXmVtxYRERGpoTIqNAfGAim98sOw/GEfAw2r1ch2YAKwBvgXfkiOtiya+cDjwE7gY6pzsbvRVd5SREREpAbLuNDsed4qYHyq9rd4w+JlHxV+1K3aDa0FegM9gHbAV+Usi+ZroBN+uK5X5QrmWq69UuWtRURERGqwjAvNgRH4ExuSblj+sK0k4ufcBmgFfIoffo8oZ1l5duNP4ziqyhXcVuUtRURERGq4jAzNnuetAf6c7P288tkrCzbs2NApYQ0a8AHQgL1TLKItK6sdsBzIAp4DVlV6zxMt1+ZUeisRERGR/URGhubA/cCKZDW+23bvHvXuqGYJbdQBfYEWwLIKlpXlATnAAcDRwNJK7XUL8NtK1yoiIiKyH8nY0Ox53g7gxmS1/0jkkbd3Fu9sl7AG57D3o1m+xw/A0ZaVZwPQDH802iq155GWa2sqtYWIiIjIfiZjQzOA53nTgYcS3e6Woi2bn17+9LEJbfREYDH+BfOK8adcRFsWzfdAY+AQ4B2gbdx7/Y/l2iNVrllERERkP1En7ALSwG3415Y4LVEN/m7B7941LGHtAf6c5aviWBbNAewN1DfEvcdVwC/jXltERERkP5bRI80AnuftAgbiX3+i2tZsXfPlW1+9Vf0PMgnXDmCA5dp3YRciIiIikg4yPjQDeJ63FrgIKKpuW0Pzh67CHwOuyW6yXHs37CJERERE0oVCc8DzvLn4kxcqd5pcKQvXLVz6yaZPqv9BJuH6k+XaY2EXISIiIpJOFJpL8TzvSeA6qhicb5p3UxH+ReBqqj9brt0ZdhEiIiIi6UahuQzP854ABuNfjyJuE1dOnF+4szA7OVWlxF8s124PuwgRERGRdKTQHIXneU/jX5didzzrFxUXFf3p/T+1SGpRyfWw5dotYRchIiIikq4Umsvhed6zwCXA1ljr/mXJX/KLiovaJL2oxCsCfmO5pk/8ExEREamAQnMFPM+bBHQGIuWtU7izsHDCxxOOT11VCfMlcLrl2qNhFyIiIiKS7hSaY/A870PgZGBctMdvm3/bIsOapbaqassDOlmuvR12ISIiIiI1gUJzHDzP2+553jXA5cCWkuWfbf7s8/xv8ruEV1mlFQH3AT0t19aGXYyIiIhITaHQXAme500AjgNeABiaP/QLoH6oRcVvFtDRcu0uy7W4TnAUEREREZ9CcyV5nve553mXfLr50+6fbfmsdtj1xGEVMNBy7TTLtaUVreicy6vKDpxzHZ1zHauybZS2mjrn/uece8M592/nXL1EtCsiIiJSHQrNVXRu13PnWK79DBhABScKhmg5MAw41nLtxSTvq2PwlQiXA381s7OAr4GzE9SuiIiISJXVCbuAms5ybTIw2Y1ypwDXAhcBDUIqpxj4LzAamGa5FvWTDZ1zBwBPA0cA3wEXm9m2MuvcA+SZWZ5z7hfB4heAiUAW8C1+X+8FLgi2udLMznTONQSeAQ4FlpjZ0ODxPGAB0MHMekerzcz+VuruIcA38XdfREREJDkUmhMkuBLF226U+w1wBXA1iRt9jSUCvAo8Ybm2Mo71rwXeN7NLnHO/BDygII7tjgOKzayHc64f0NjM7nDOLQMws6dLtR8xs3uccy855zqY2WKgC/Comd0aa0fOua7AQWY2L466RERERJJKoTnBLNe+A8YAY9wo1wI4vdTXTxK0mw3ANOB14A3LtTWV3L49MDm4/XQc6zcAtgPvAhHn3BvACmBqOesfA3RzzuUABwKHA4vxg/RLsXbmnGuGP1r+8zhqExEREUk6heYkCi7r9nzwhRvlDgdOBFoDRwbfS243AGrjzzPfAXwPbAZW4gfUj4PvK4CVlmvF1SjtI/wPbXkTuBN/CsQTZdbZiT89Avx5xf8GsoG3zexO59wEoHvQxnagOYBzzgHLgAIze8o5dy6wOmhnCzEEJ/5NBO4ws8+q3EMRERGRBFJoTqFgRLiyo8LJ8ATwr2CO8bf4J9+V9QrwN+fcmcE6AJ8CDzjnRuCH+oXB8mnAi865y4E7gvafCqZ+bAIuq0RtVwOdgBHBfv5uZi9UYnsRERGRhFNozkBmth24OMrynFK3I0CPKJvvcwKfmW0AepZZXGH7FdT2d+DvsdYTERERSSWFZglNlOtCF5rZ+WHUIiIiIlIRhWYJTTwjzyIiIiLpQB9uIiIiIiISg0KziIiIiEgMCs0iIiIiIjEoNIuIiIiIxKDQLCIiIiISg0KziIiIiEgMCs0iIiIiIjEoNIuIiIiIxKDQLCIiIiISg0KziIiIiEgMCs0iIiIikjBr166lqKgo7DISTqFZRERERPa4+uqr6dq1K/fdd1/Ux1etWkXfvn3p3r07N998MwBjxoyhc+fObN26lddff526deumsuQKJSrEKzSLiIiICAAvvfQSu3fvZu7cuaxcuZIVK1bss87tt9/OXXfdxezZs/niiy/Iy8tj0aJFDBkyhAULFtCoUaOE1hQrxJe48cYbmTJlCpCcEK/QLCIiIiIA5OXlcfHFFwNw1llnMWfOnH3WWb58OZ06dQLg0EMPpbCwEDOjqKiIN954gz59+iSsnnhCPMDs2bP5+uuvOe+88wCSEuIVmkVEREQEgK1bt3L44YcD0KxZM9auXbvPOgMGDGDUqFFMmTKFqVOncuaZZ3LWWWfx6quvcsQRR9CvXz9mzpyZkHriCfFFRUVcc801tGnThpdffhkgKSFeoVlEREREAGjcuDHbt28HYMuWLRQXF++zzsiRI+nTpw/jxo1j0KBBNG7cmIEDB3LPPfdw4IEH0rdvXyZPnpyQeuIJ8c888wzHHXcct912GwUFBYwePTopIV6hWUREREQAOPHEE/eM5r7//vu0adMm6nodO3Zk9erVDB8+fM+yFStW0K5dO+rXrx81bFdFPCH+vffe49prr6Vly5ZcccUVzJw5MykhXqFZRERERADo378/48ePZ/jw4bz44ov89Kc/ZeTIkfus9+CDDzJ8+HAaNmwIwKZNm2jZsiXHHXccjz/+OD179kxIPfGE+KOOOoqVK1cCsHDhQlq3bg0kPsTXqXYLIiIiIrJfyMrKIi8vj2nTpnHbbbfRsmVLsrOz91lv1KhR+2zXq1cvwD8JL1H69+9P9+7d+fLLL/nf//7H888/z8iRI39wJY2rr76awYMH8/zzz1NUVMSkSZN+EOKvu+467r777mrXotAsIiIiInscdNBBe06+C1s8Ib5JkyZMnDhxn20THeIVmkVEREQkbaVLiNecZhERERGRGBSaRURERERi0PQMERERkQzV5nevpXR/n97ft9zH0qmWaJyZJakUkfA559YBn4Vdh1RaazM7JJU71LFSY+lYkXik/DgBHSs1VLnHikKziIhIkjnnHFDbzHaV83gtADOr9sVknXP1zWxHddsJ2/7SD9l/aHqGiIhI8h0OTHTOlYTAOkAH4N3gfm3gfudcK2C2mX0A4JxrARwFnADsMLMnguULgEKgLXAL0AQ4HjgOeA0Ym4pOVcQ5dz0x+lIT+iFSQqFZRCSNOOea4b+dewjwFnCFma1wzvUDLgAGA48AnYBvgEvMbGeUdp4GGprZxc6554HvgV8B44O2PwGuBsYBX5rZnc65e0o1MRBYG9y+CfgjcIuZLXXO/RWYZ2YvJrTz+zEz+wLoWnLfOXcL8JqZ/an0es65o4Hxwc/4MuA7oDVwF/BxqVVXBb/be4AtwGr8Y6VXUjtSOTOI3Zea0A8RQFfPEEka51wz59xm59wBzrn5zrmfBMv7Oeeecr5HnXNznHMvOefqldNOY+fcv4P1/uWcq+Oce9o5955zbq5zbqJzrq5zLs85N885tyAIWATL5gbf84Jl251zbwXbDwqWXeScezvY9pQU/Ygkul7AAUAPYGpwH+BM4PXgexszOxWIABdV0FZ2me/DgBXBtvWBkgufXuOcO6DMtn8ws5zgaxFwPzDcOZcFdAcmVbWDmc451xrIBaYFt0uW1wI2AacCk4H+wXqfmtls4OuSaRxAG+fcdOAqYJ2ZLQTSZipDJfqS1v1IJudco+C5/S3n3Hjn3Cjn3Iclz9fOuY7OuT8454YE6+c759qV01Zd59yU4Hl8cNj1BI8f65x7OexanHNHBtvMcM497pxzVakJFJpFkilR4ae8oDPMzLrij870DJYNCB7/l3Pu4GDZRSXhJ7i/xsxOww8+dwXL7gNygMvxRxUlPGfjvyV9Nv5xUvK7PR2Yhv97yguWjQZmVtDWTudcc6AouP8zYFZwew7QObgdwf/dl8vMZgFtgLuBsYmYe5uJghcdzwIL8P8Hj3fOnR883Bb/uaJ/sM504P8BZwXBcjpwcfBP/1LgI+AM4Ezn3N+AE5xzjwXBoNwwkyLx9GUg6d+PZLoSmBs8H+8ATmLfF6t/BYY453rg/x/4pJy2hgHvmNkpwADnXJMw6wl+bw8CTatQR0JrAa4DbjCzM4BW+NN/qkShWSR5EhV+ygs6JScXNQb2vD1vZquAAqBLjPoaAruD21vw36pfZWYVjVxK8nXFfxFzJjAf6OicOwLYZmbf4k+t2OScuxKYAlxYQVvv4weT94P7TYCtwe1tQFZweyz+P5bSRpQa1akdLHsI/0XZs1XtXCZzzh0CvAr8AVgKFOP//u51zp1jZh/jv8iuZWYDzexM4ArgDTPraWZnmtnzgAPGBM2OB/6LP/WmKfBPM7u2ggCREvH0BXiBNO9Hkq0BLnDO/cTMhuC/kPqB4G9+JvBP/OOmPDlAyXSpWfghM8x6NgM/r0INCa/FzEaY2YfB3ebA+qoWpdAskjyJCj/lBZ3RwKf4805nlNnmW+DA4PbEIPiMDu4f7pybjX8C0q+DZefgz5Fd7JzrjITCOdcBOBh/6kMb4EfAO8DtwBvBaoVAEzMbD9zD3t9zNO8Cv2DvyWab8F9kATQK7gN8jT/al1Nq29KjOiUvrj4AlptZEVIpzrn2+C+W7zGz/5UsN7P1+KOxY5xzzcxsE/7o2THOueX4gaC9c266c+5/wTbF7H3Be6WZLcWfBvEucEnqelWxWH2pKf1IFjObgv9C9CXn3KP4J4NGe7H6Jn5eW1FBc43wgybABqBFmPWY2TfVufJJgn82ADjnBgIfmNmXVa1LoVkkCRIcfsoLOsOAvwOf2L7XjmyG/8QJe6dnDAvur8EfAVoPvO+cqw8cYmbX4p+sM76y/ZWE6Q38MZhK82hwfypwffAd4O1gOeydq1yed/HfmSgJzfPZG4y7478jUeIh4LSqly4xrADON7OyL3Axs5VARzPb4Jw7FfgK/+8338zOMLMuZtYTKHveQx3gXOdcR+A2/FHaTs6/AkfoKtGXtO5Hsjj/PJepQEf8QZQriP5idTj+/48LKmhuC9AguN2YKuS7BNdTLYmuxTlXcnWWak0/VGgWSY5Ehp+Kgs4/gKtLverGOXck/qjx3PIaDEL2WPx/TnXxR6NrA8sAXbw9PL3Z+67BDPZO7dmCfxwAvAKscs7NBc6K0d6nwHL2frjCGKCtcy4f2A5MLFnRzN7Dv1pHidKjOgOr3CMBwMx2m1npD7mohT/NouTxTc65usDDwAP4f4dnBaOy04N5wLUAnHPX4F+6bQf+uwRDgEVmthg/dL5Q6pyGUMTTl5rQjyQbAlwQBMAI/jkwP+Cc64I/1eHXwO0VnMT2Dv4oPfj/Tz4NuZ7qSlgtzrmDgOeAwWZWWJ2i9OEmIkkQ/FO4xcwWOedOB4biv8JdAhxsZruDP/C/4b+S3gbMNLP7orTVGH/0twX+aFXJZcLGmdmc4K2rOcCN+E8su4G7zexN518xoz57z0a/Dv8yV0c5/2oJ7+LPfbsG/0SwYuBBM5uc8B+KiOzhnHsS/294bqll7YDhZjbUOdcSuN/MflHq8Vlm1sOV+tAP51/tpo+ZjSy13vnAWjObl6r+lBVPX4Be6d6PZHLO/Qj//ACH/87jMuA89l7q8e/AL4ERZvaOc+4J/Ofv/0RpqzX+nPDpQDegS6nR2JTXU6rNPNt7EnootTjnHsC/OsuyYFGumb1Vdr246lJoFhGp2YIXR6UVmtn50daVmiF4UV2vOvNC08X+1Jd0FgTNU4HXqzuiKtEpNIukEYUfEREpTzBq/3yZxcvMrOzVbzKunlTUotAsIiIiIhKDTgQUEREREYlBoVlEREREJAaFZhERERGRGBSaRURERERiUGgWEREREYnh/wPLUAx2pqMsdgAAAABJRU5ErkJggg==\n",
300 | "text/plain": [
301 | ""
302 | ]
303 | },
304 | "metadata": {},
305 | "output_type": "display_data"
306 | }
307 | ],
308 | "source": [
309 | "# 可视化图形展示\n",
310 | "# part 1 全局配置\n",
311 | "fig = plt.figure(figsize=(10, 7))\n",
312 | "titles = ['RECORD_RATE','AVG_ORDERS','AVG_MONEY','IS_ACTIVE','SEX'] # 共用标题\n",
313 | "line_index,col_index = 3,5 # 定义网格数\n",
314 | "ax_ids = np.arange(1,16).reshape(line_index,col_index) # 生成子网格索引值\n",
315 | "plt.rcParams['font.sans-serif']=['SimHei'] #用来正常显示中文标签\n",
316 | " \n",
317 | "# part 2 画出三个类别的占比\n",
318 | "pie_fracs = features_all['record_rate'].tolist()\n",
319 | "for ind in range(len(pie_fracs)):\n",
320 | " ax = fig.add_subplot(line_index, col_index, ax_ids[:,0][ind])\n",
321 | " init_labels = ['','',''] # 初始化空label标签\n",
322 | " init_labels[ind] = 'cluster_{0}'.format(ind) # 设置标签\n",
323 | " init_colors = ['lightgray', 'lightgray', 'lightgray']\n",
324 | " init_colors[ind] = 'g' # 设置目标面积区别颜色\n",
325 | " ax.pie(x=pie_fracs, autopct='%3.0f %%',labels=init_labels,colors=init_colors)\n",
326 | " ax.set_aspect('equal') # 设置饼图为圆形\n",
327 | " if ind == 0:\n",
328 | " ax.set_title(titles[0])\n",
329 | " \n",
330 | "# part 3 画出AVG_ORDERS均值\n",
331 | "avg_orders_label = 'AVG_ORDERS'\n",
332 | "avg_orders_fraces = features_all[avg_orders_label]\n",
333 | "for ind, frace in enumerate(avg_orders_fraces):\n",
334 | " ax = fig.add_subplot(line_index, col_index, ax_ids[:,1][ind])\n",
335 | " ax.bar(x=unique_labels,height=[0,avg_orders_fraces[ind],0])# 画出柱形图\n",
336 | " ax.set_ylim((0, max(avg_orders_fraces)*1.2))\n",
337 | " ax.set_xticks([])\n",
338 | " ax.set_yticks([])\n",
339 | " if ind == 0:# 设置总标题\n",
340 | " ax.set_title(titles[1])\n",
341 | " # 设置每个柱形图的数值标签和x轴label\n",
342 | " ax.text(unique_labels[1],frace+0.4,s='{:.2f}'.format(frace),ha='center',va='top')\n",
343 | " ax.text(unique_labels[1],-0.4,s=avg_orders_label,ha='center',va='bottom')\n",
344 | " \n",
345 | "# part 4 画出AVG_MONEY均值\n",
346 | "avg_money_label = 'AVG_MONEY'\n",
347 | "avg_money_fraces = features_all[avg_money_label]\n",
348 | "for ind, frace in enumerate(avg_money_fraces):\n",
349 | " ax = fig.add_subplot(line_index, col_index, ax_ids[:,2][ind])\n",
350 | " ax.bar(x=unique_labels,height=[0,avg_money_fraces[ind],0])# 画出柱形图\n",
351 | " ax.set_ylim((0, max(avg_money_fraces)*1.2))\n",
352 | " ax.set_xticks([])\n",
353 | " ax.set_yticks([])\n",
354 | " if ind == 0:# 设置总标题\n",
355 | " ax.set_title(titles[2])\n",
356 | " # 设置每个柱形图的数值标签和x轴label\n",
357 | " ax.text(unique_labels[1],frace+4,s='{:.0f}'.format(frace),ha='center',va='top')\n",
358 | " ax.text(unique_labels[1],-4,s=avg_money_label,ha='center',va='bottom')\n",
359 | " \n",
360 | "# part 5 画出是否活跃\n",
361 | "axtivity_labels = ['不活跃','活跃']\n",
362 | "x_ticket = [i for i in range(len(axtivity_labels))]\n",
363 | "activity_data = features_all[axtivity_labels]\n",
364 | "ylim_max = np.max(np.max(activity_data))\n",
365 | "for ind,each_data in enumerate(activity_data.values):\n",
366 | " ax = fig.add_subplot(line_index, col_index, ax_ids[:,3][ind])\n",
367 | " ax.bar(x=x_ticket,height=each_data) # 画出柱形图\n",
368 | " ax.set_ylim((0, ylim_max*1.2))\n",
369 | " ax.set_xticks([])\n",
370 | " ax.set_yticks([]) \n",
371 | " if ind == 0:# 设置总标题\n",
372 | " ax.set_title(titles[3])\n",
373 | " # 设置每个柱形图的数值标签和x轴label\n",
374 | " activity_values = ['{:.1%}'.format(i) for i in each_data]\n",
375 | " for i in range(len(x_ticket)):\n",
376 | " ax.text(x_ticket[i],each_data[i]+0.05,s=activity_values[i],ha='center',va='top')\n",
377 | " ax.text(x_ticket[i],-0.05,s=axtivity_labels[i],ha='center',va='bottom')\n",
378 | " \n",
379 | "# part 6 画出性别分布\n",
380 | "sex_data = features_all.iloc[:,-3:]\n",
381 | "x_ticket = [i for i in range(len(sex_data))]\n",
382 | "sex_labels = ['SEX_{}'.format(i) for i in range(3)]\n",
383 | "ylim_max = np.max(np.max(sex_data))\n",
384 | "for ind,each_data in enumerate(sex_data.values):\n",
385 | " ax = fig.add_subplot(line_index, col_index, ax_ids[:,4][ind])\n",
386 | " ax.bar(x=x_ticket,height=each_data) # 画柱形图\n",
387 | " ax.set_ylim((0, ylim_max*1.2))\n",
388 | " ax.set_xticks([])\n",
389 | " ax.set_yticks([])\n",
390 | " if ind == 0: # 设置标题\n",
391 | " ax.set_title(titles[4]) \n",
392 | " # 设置每个柱形图的数值标签和x轴label\n",
393 | " sex_values = ['{:.1%}'.format(i) for i in each_data]\n",
394 | " for i in range(len(x_ticket)):\n",
395 | " ax.text(x_ticket[i],each_data[i]+0.1,s=sex_values[i],ha='center',va='top')\n",
396 | " ax.text(x_ticket[i],-0.1,s=sex_labels[i],ha='center',va='bottom')\n",
397 | " \n",
398 | "plt.tight_layout(pad=0.8) #设置默认的间距"
399 | ]
400 | },
401 | {
402 | "cell_type": "markdown",
403 | "metadata": {},
404 | "source": [
405 | "# 结论\n",
406 | "\n",
407 | "聚类后,群体划分为3类:\n",
408 | "- cluster_0:显著和区分性特征是平均订单量少(仅为2.02),男性为主的客户群体;\n",
409 | "- cluster_1:平均订单量多(3.99),女性为主的客户\n",
410 | "- cluster_2:与cluster_1类似,但群体属于未知性别。\n",
411 | "\n",
412 | "鉴于平均订单价值和活跃程度在所有类别中的分布相对意志和均匀,无法达到区分的特性,也不具有表示该群体的显著性特征。因此忽略。\n",
413 | "\n",
414 | "最后,我们得到3类群体:**低价值的男性客户群体、高价值的女性客户群体以及高价值的未知性别客户群体。**\n",
415 | "\n",
416 | "**衍生的分析方向**:\n",
417 | "- 未知性别群体不应该有如此高的平均订单价值,更重要的是其样本量并不少。那么不太可能是随机发生的事件,很可能在某些方面,例如数据采集、客户体验、客户注册等方面存在某些问题,或者这类客户群体就是不愿意透露性别。可作为另一个EDA课题的开始\n",
418 | "- 第二类高价值的女性客户群体,可做用户喜欢和特征分析,例如看一下她们都是什么事件购买、客单价平均多少、集中品类、折扣力度喜欢、来源渠道、促销方式等是否有明显的集中化倾向。"
419 | ]
420 | }
421 | ],
422 | "metadata": {
423 | "kernelspec": {
424 | "display_name": "Python 3",
425 | "language": "python",
426 | "name": "python3"
427 | },
428 | "language_info": {
429 | "codemirror_mode": {
430 | "name": "ipython",
431 | "version": 3
432 | },
433 | "file_extension": ".py",
434 | "mimetype": "text/x-python",
435 | "name": "python",
436 | "nbconvert_exporter": "python",
437 | "pygments_lexer": "ipython3",
438 | "version": "3.6.8"
439 | }
440 | },
441 | "nbformat": 4,
442 | "nbformat_minor": 2
443 | }
444 |
--------------------------------------------------------------------------------