├── O2O优惠券预测
    ├── O2O优惠券预测-数据探索.ipynb
    ├── O2O优惠券预测-模型训练.ipynb
    ├── O2O优惠券预测-模型验证及优化.ipynb
    ├── O2O优惠券预测-特征工程.ipynb
    └── O2O优惠券预测-赛题实践.ipynb
├── README.md
├── 天猫重复购买预测
    ├── 天猫重复购买预测 02 数据探索.ipynb
    ├── 天猫重复购买预测 03 特征工程.ipynb
    ├── 天猫重复购买预测 04 模型训练、验证和评测.ipynb
    └── 天猫重复购买预测 05 特征优化和特征选择.ipynb
├── 工业蒸汽
    ├── zhengqi_test.txt
    ├── zhengqi_train.txt
    ├── 工业蒸汽 02数据探索.ipynb
    ├── 工业蒸汽 03 特征工程.ipynb
    ├── 工业蒸汽 04 模型训练.ipynb
    ├── 工业蒸汽 05 模型验证.ipynb
    ├── 工业蒸汽 06 特征优化.ipynb
    └── 工业蒸汽 07 模型融合.ipynb
└── 阿里云安全恶意程序检测
    ├── 阿里云安全恶意程序检测-优化技巧与解决方案升级.ipynb
    ├── 阿里云安全恶意程序检测-数据探索.ipynb
    ├── 阿里云安全恶意程序检测-特征工程与基线模型.ipynb
    ├── 阿里云安全恶意程序检测-特征工程进阶与方案优化.ipynb
    └── 阿里云安全恶意程序检测-高阶数据探索.ipynb

/README.md:
--------------------------------------------------------------------------------
1 | # alibaba_tianchi_book
2 | Code for the book 阿里云天池大赛赛题解析 (Alibaba Cloud Tianchi competition problem analysis). Original book code: https://tianchi.aliyun.com/specials/promotion/bookcode?spm=5176.14154004.J_3941670930.15.31fe5699wu5Gtm
3 |
4 | For easier browsing I have reorganized the code; it contains four projects in total.
5 |
6 | The code quality is high and goes into real depth; in my opinion this book is sufficient for moving on to intermediate machine learning.
7 |
8 |

--------------------------------------------------------------------------------
/天猫重复购买预测/天猫重复购买预测 05 特征优化和特征选择.ipynb:
--------------------------------------------------------------------------------
1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Import packages" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import pandas as pd\n", 17 | "import numpy as np\n", 18 | "\n", 19 | "import warnings\n", 20 | "warnings.filterwarnings(\"ignore\") " 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "## Load data (first 10,000 training rows, first 100 test rows)" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 2, 33 | "metadata": {}, 34 | "outputs": [], 35 | "source": [ 36 | "train_data = pd.read_csv('train_all.csv',nrows=10000)\n", 37 | "test_data = pd.read_csv('test_all.csv',nrows=100)" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | 
"metadata": {}, 43 | "source": [ 44 | "## Load the full dataset" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 3, 50 | "metadata": {}, 51 | "outputs": [], 52 | "source": [ 53 | "# train_data = pd.read_csv('train_all.csv',nrows=None)\n", 54 | "# test_data = pd.read_csv('test_all.csv',nrows=None)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "## Build the training and test sets" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 4, 67 | "metadata": {}, 68 | "outputs": [], 69 | "source": [ 70 | "features_columns = [col for col in train_data.columns if col not in ['user_id','label']]\n", 71 | "train = train_data[features_columns].values\n", 72 | "test = test_data[features_columns].values\n", 73 | "target =train_data['label'].values" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "## Missing-value imputation" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "There are many ways to handle missing values; the most common are:\n", 88 | "1. Deletion. Suitable when the dataset is large or when only a small fraction of the values is missing.\n", 89 | "2. Imputation. The usual approach is to fill with the mean or the median; interpolation or model-based prediction can also be used.\n", 90 | "3. No handling. Tree-based models are insensitive to missing values." 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "#### Impute with the median" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 5, 103 | "metadata": {}, 104 | "outputs": [], 105 | "source": [ 106 | "# from sklearn.preprocessing import Imputer\n", 107 | "# imputer = Imputer(strategy=\"median\")\n", 108 | "\n", 109 | "from sklearn.impute import SimpleImputer\n", 110 | "\n", 111 | "imputer = SimpleImputer(missing_values=np.nan, strategy='median')\n", 112 | "imputer = imputer.fit(train)\n", 113 | "train_imputer = imputer.transform(train)\n", 114 | "test_imputer = imputer.transform(test)" 115 | ] 116 | }, 117 | { 118 | "cell_type": "markdown", 119 | "metadata": {}, 120 | "source": [ 121 | "## Feature selection concepts" 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": {}, 127 | "source": [ 128 | "In machine learning and statistics, feature selection (also called variable selection, attribute selection, or variable subset selection) is the process of selecting a subset of relevant features (attributes, predictors) for use in model construction. There are three reasons to use feature selection techniques:\n", 129 | "\n", 130 | " to simplify models so that researchers and users can interpret them more easily,\n", 131 | " to shorten training time,\n", 132 | " to improve generalization by reducing overfitting (that is, reducing variance).\n", 133 | "\n", 134 | "The key assumption behind feature selection is that the training data contains many redundant or irrelevant features, so removing them loses little information. Redundant and irrelevant are two distinct notions: a feature that is useful on its own can still be redundant if it is strongly correlated with another useful feature that is also present in the data.\n", 135 | "Feature selection differs from feature extraction. Feature extraction creates new features from functions of the original features, whereas feature selection returns only a subset of the original features. Feature selection is often used in domains with many features but comparatively few samples (data points). Typical use cases include the analysis of written text and of DNA microarray data, where there are thousands of features but only tens to hundreds of samples." 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 6, 141 | "metadata": {}, 142 | "outputs": [], 143 | "source": [ 144 | "from sklearn.model_selection import cross_val_score\n", 145 | "from sklearn.ensemble import RandomForestClassifier\n", 146 | "\n", 147 | "def feature_selection(train, train_sel, target):\n", 148 | " clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0, n_jobs=-1)\n", 149 | " \n", 150 | " scores = cross_val_score(clf, train, target, cv=5)\n", 151 | " scores_sel = cross_val_score(clf, train_sel, target, cv=5)\n", 152 | " \n", 153 | " 
print(\"No Select Accuracy: %0.2f (+/- %0.2f)\" % (scores.mean(), scores.std() * 2)) \n", 154 | " print(\"Features Select Accuracy: %0.2f (+/- %0.2f)\" % (scores_sel.mean(), scores_sel.std() * 2))" 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "metadata": {}, 160 | "source": [ 161 | "### Removing low-variance features (method 1)\n", 162 | "VarianceThreshold is a simple baseline approach to feature selection. It removes every feature whose variance does not meet a given threshold. By default it removes all zero-variance features, that is, features that have the same value in every sample." 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 7, 168 | "metadata": {}, 169 | "outputs": [ 170 | { 171 | "name": "stdout", 172 | "output_type": "stream", 173 | "text": [ 174 | "Training data shape before feature selection (2000, 229)\n", 175 | "Training data shape after feature selection (2000, 29)\n" 176 | ] 177 | } 178 | ], 179 | "source": [ 180 | "from sklearn.feature_selection import VarianceThreshold\n", 181 | "\n", 182 | "sel = VarianceThreshold(threshold=(.8 * (1 - .8)))\n", 183 | "sel = sel.fit(train)\n", 184 | "train_sel = sel.transform(train)\n", 185 | "test_sel = sel.transform(test)\n", 186 | "print('Training data shape before feature selection', train.shape)\n", 187 | "print('Training data shape after feature selection', train_sel.shape)" 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "metadata": {}, 193 | "source": [ 194 | "### Comparison before and after feature selection" 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": 8, 200 | "metadata": {}, 201 | "outputs": [ 202 | { 203 | "name": "stdout", 204 | "output_type": "stream", 205 | "text": [ 206 | "No Select Accuracy: 0.93 (+/- 0.00)\n", 207 | "Features Select Accuracy: 0.93 (+/- 0.00)\n" 208 | ] 209 | } 210 | ], 211 | "source": [ 212 | "feature_selection(train, train_sel, target)" 213 | ] 214 | }, 215 | { 216 | "cell_type": "markdown", 217 | "metadata": {}, 218 | "source": [ 219 | "### Univariate feature selection (method 2)\n", 220 | "Selects the best features based on univariate statistical tests." 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": 9, 226 | "metadata": {}, 227 | "outputs": [ 228 | { 229 | "name": "stdout", 230 | "output_type": "stream", 231 | "text": [ 232 | "Training data shape before feature selection (2000, 229)\n", 233 | "Training data shape after feature selection (2000, 2)\n" 
234 | ] 235 | } 236 | ], 237 | "source": [ 238 | "from sklearn.feature_selection import SelectKBest\n", 239 | "# from sklearn.feature_selection import chi2\n", 240 | "from sklearn.feature_selection import mutual_info_classif\n", 241 | "\n", 242 | "sel = SelectKBest(mutual_info_classif, k=2)\n", 243 | "sel = sel.fit(train, target)\n", 244 | "train_sel = sel.transform(train)\n", 245 | "test_sel = sel.transform(test)\n", 246 | "print('Training data shape before feature selection', train.shape)\n", 247 | "print('Training data shape after feature selection', train_sel.shape)" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": 10, 253 | "metadata": {}, 254 | "outputs": [ 255 | { 256 | "name": "stdout", 257 | "output_type": "stream", 258 | "text": [ 259 | "Training data shape before feature selection (2000, 229)\n", 260 | "Training data shape after feature selection (2000, 10)\n" 261 | ] 262 | } 263 | ], 264 | "source": [ 265 | "sel = SelectKBest(mutual_info_classif, k=10)\n", 266 | "sel = sel.fit(train, target)\n", 267 | "train_sel = sel.transform(train)\n", 268 | "test_sel = sel.transform(test)\n", 269 | "print('Training data shape before feature selection', train.shape)\n", 270 | "print('Training data shape after feature selection', train_sel.shape)" 271 | ] 272 | }, 273 | { 274 | "cell_type": "markdown", 275 | "metadata": {}, 276 | "source": [ 277 | "### Comparison before and after feature selection" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": 11, 283 | "metadata": {}, 284 | "outputs": [ 285 | { 286 | "name": "stdout", 287 | "output_type": "stream", 288 | "text": [ 289 | "No Select Accuracy: 0.93 (+/- 0.00)\n", 290 | "Features Select Accuracy: 0.93 (+/- 0.00)\n" 291 | ] 292 | } 293 | ], 294 | "source": [ 295 | "feature_selection(train, train_sel, target)" 296 | ] 297 | }, 298 | { 299 | "cell_type": "markdown", 300 | "metadata": {}, 301 | "source": [ 302 | "### Recursive feature elimination (method 3)\n", 303 | "Fit a chosen model, then refit it recursively, each time removing the lowest-ranked features and repeating the loop." 304 | ] 305 | }, 306 | { 307 | "cell_type": "code", 308 | "execution_count": 12, 309 | "metadata": {}, 310 | "outputs": [ 311 | { 312 | "name": "stdout", 313 | "output_type": "stream", 314 | "text": [ 315 | "[False False 
False False False False False False False False False False\n", 316 | " False False False False False False False False False False False False\n", 317 | " False False False False False False False False False False False False\n", 318 | " False False False False False False False False False False False False\n", 319 | " False False False False False False False False False False False False\n", 320 | " False False False False False False False False False False False False\n", 321 | " False False False False False False False False False False False False\n", 322 | " False False False False False False False False False False False False\n", 323 | " False False False False False False False False False False False False\n", 324 | " False False False False False False False False False False False False\n", 325 | " False False False False False False False False False False False False\n", 326 | " False False False False False False False False False False False False\n", 327 | " False False False False False False False False False False False False\n", 328 | " False False False False False False False False False False False False\n", 329 | " False False False False False False False False False False False True\n", 330 | " True False False False False False False False False False False True\n", 331 | " False True False False False False True False False True False False\n", 332 | " False True False True False True False True False False False False\n", 333 | " False False False False False False False False False False False False\n", 334 | " False]\n", 335 | "[220 219 218 217 216 215 213 212 211 210 209 208 207 206 205 204 203 202\n", 336 | " 201 200 197 195 192 187 186 185 184 183 182 181 180 179 178 177 176 175\n", 337 | " 174 173 172 171 170 169 168 167 166 165 164 163 162 161 160 158 157 155\n", 338 | " 154 153 152 151 150 149 148 147 146 145 144 143 142 141 140 139 137 136\n", 339 | " 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 117 116 
115\n", 340 | " 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97\n", 341 | " 95 94 93 92 91 90 89 88 87 214 86 85 84 83 189 193 199 198\n", 342 | " 196 194 191 190 188 81 80 79 77 74 73 72 71 69 68 67 66 65\n", 343 | " 64 156 63 61 60 59 58 57 55 54 138 135 50 48 46 45 44 42\n", 344 | " 39 119 118 38 35 31 28 25 24 22 17 15 14 96 13 12 43 1\n", 345 | " 1 4 82 75 78 76 26 30 70 20 7 1 62 1 51 53 49 47\n", 346 | " 1 27 41 1 23 21 18 1 11 1 19 1 6 1 8 29 2 5\n", 347 | " 10 3 9 16 32 33 34 36 37 40 52 56 159]\n" 348 | ] 349 | } 350 | ], 351 | "source": [ 352 | "from sklearn.feature_selection import RFECV\n", 353 | "from sklearn.ensemble import RandomForestClassifier\n", 354 | "\n", 355 | "clf = RandomForestClassifier(n_estimators=10, max_depth=2, random_state=0, n_jobs=-1)\n", 356 | "selector = RFECV(clf, step=1, cv=2)\n", 357 | "selector = selector.fit(train, target)\n", 358 | "print(selector.support_)\n", 359 | "print(selector.ranking_)" 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "metadata": {}, 365 | "source": [ 366 | "### Model-based feature selection (method 4)" 367 | ] 368 | }, 369 | { 370 | "cell_type": "markdown", 371 | "metadata": {}, 372 | "source": [ 373 | "#### Variable selection using fitted LR coefficients (L2 penalty)\n", 374 | "The fitted coefficients of a logistic regression (LR) model are used for variable selection, keeping the features that have the largest influence on the target." 375 | ] 376 | }, 377 | { 378 | "cell_type": "code", 379 | "execution_count": 13, 380 | "metadata": {}, 381 | "outputs": [ 382 | { 383 | "name": "stdout", 384 | "output_type": "stream", 385 | "text": [ 386 | "Training data shape before feature selection (2000, 229)\n", 387 | "Training data shape after feature selection (2000, 19)\n" 388 | ] 389 | } 390 | ], 391 | "source": [ 392 | "from sklearn.feature_selection import SelectFromModel\n", 393 | "from sklearn.linear_model import LogisticRegression\n", 394 | "from sklearn.preprocessing import Normalizer\n", 395 | "\n", 396 | "normalizer = Normalizer()\n", 397 | "normalizer = normalizer.fit(train) \n", 398 | "\n", 399 | "train_norm = normalizer.transform(train) \n", 400 | "test_norm = normalizer.transform(test)\n", 401 | "\n", 402 | "LR 
= LogisticRegression(penalty='l2',C=5)\n", 403 | "LR = LR.fit(train_norm, target)\n", 404 | "model = SelectFromModel(LR, prefit=True)\n", 405 | "train_sel = model.transform(train)\n", 406 | "test_sel = model.transform(test)\n", 407 | "print('Training data shape before feature selection', train.shape)\n", 408 | "print('Training data shape after feature selection', train_sel.shape)" 409 | ] 410 | }, 411 | { 412 | "cell_type": "markdown", 413 | "metadata": {}, 414 | "source": [ 415 | "##### Coefficients fitted with the L2 penalty" 416 | ] 417 | }, 418 | { 419 | "cell_type": "code", 420 | "execution_count": 14, 421 | "metadata": {}, 422 | "outputs": [ 423 | { 424 | "data": { 425 | "text/plain": [ 426 | "array([ 0.27519508, -0.02736226, -0.00522652, 0.90644126, -0.4310027 ,\n", 427 | " -0.25110925, -0.4058899 , 0.29059019, 0.10568508, -0.02731211])" 428 | ] 429 | }, 430 | "execution_count": 14, 431 | "metadata": {}, 432 | "output_type": "execute_result" 433 | } 434 | ], 435 | "source": [ 436 | "LR.coef_[0][:10]" 437 | ] 438 | }, 439 | { 440 | "cell_type": "markdown", 441 | "metadata": {}, 442 | "source": [ 443 | "### Comparison before and after feature selection" 444 | ] 445 | }, 446 | { 447 | "cell_type": "code", 448 | "execution_count": 15, 449 | "metadata": {}, 450 | "outputs": [ 451 | { 452 | "name": "stdout", 453 | "output_type": "stream", 454 | "text": [ 455 | "No Select Accuracy: 0.93 (+/- 0.00)\n", 456 | "Features Select Accuracy: 0.93 (+/- 0.00)\n" 457 | ] 458 | } 459 | ], 460 | "source": [ 461 | "feature_selection(train, train_sel, target)" 462 | ] 463 | }, 464 | { 465 | "cell_type": "markdown", 466 | "metadata": {}, 467 | "source": [ 468 | "#### Variable selection using fitted LR coefficients (L1 penalty)\n", 469 | "The fitted coefficients of a logistic regression (LR) model are used for variable selection, keeping the features that have the largest influence on the target." 470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": 16, 475 | "metadata": {}, 476 | "outputs": [], 477 | "source": [ 478 | "# from sklearn.feature_selection import SelectFromModel\n", 479 | "# from sklearn.linear_model import LogisticRegression\n", 480 | "# from sklearn.preprocessing import Normalizer\n", 481 | "\n", 482 | "# normalizer = 
Normalizer()\n", 483 | "# normalizer = normalizer.fit(train) \n", 484 | "\n", 485 | "# train_norm = normalizer.transform(train) \n", 486 | "# test_norm = normalizer.transform(test)\n", 487 | "\n", 488 | "# LR = LogisticRegression(penalty='l1',C=5,solver='liblinear')\n", 489 | "# LR = LR.fit(train_norm, target)\n", 490 | "# model = SelectFromModel(LR, prefit=True)\n", 491 | "# train_sel = model.transform(train)\n", 492 | "# test_sel = model.transform(test)\n", 493 | "# print('Training data shape before feature selection', train.shape)\n", 494 | "# print('Training data shape after feature selection', train_sel.shape)" 495 | ] 496 | }, 497 | { 498 | "cell_type": "markdown", 499 | "metadata": {}, 500 | "source": [ 501 | "##### Coefficients fitted with the L1 penalty\n", 502 | "For a good choice of α, and provided certain specific conditions are met, the Lasso can fully recover the exact set of non-zero variables using only a few observations." 503 | ] 504 | }, 505 | { 506 | "cell_type": "code", 507 | "execution_count": 17, 508 | "metadata": {}, 509 | "outputs": [], 510 | "source": [ 511 | "# LR.coef_[0][:10]" 512 | ] 513 | }, 514 | { 515 | "cell_type": "markdown", 516 | "metadata": {}, 517 | "source": [ 518 | "### Comparison before and after feature selection" 519 | ] 520 | }, 521 | { 522 | "cell_type": "code", 523 | "execution_count": 18, 524 | "metadata": {}, 525 | "outputs": [ 526 | { 527 | "name": "stdout", 528 | "output_type": "stream", 529 | "text": [ 530 | "No Select Accuracy: 0.93 (+/- 0.00)\n", 531 | "Features Select Accuracy: 0.93 (+/- 0.00)\n" 532 | ] 533 | } 534 | ], 535 | "source": [ 536 | "feature_selection(train, train_sel, target)" 537 | ] 538 | }, 539 | { 540 | "cell_type": "markdown", 541 | "metadata": {}, 542 | "source": [ 543 | "### Tree-based feature selection\n", 544 | "Tree models rank features by the total score accumulated from the split criterion, and features are then filtered according to that ranking." 545 | ] 546 | }, 547 | { 548 | "cell_type": "code", 549 | "execution_count": 19, 550 | "metadata": {}, 551 | "outputs": [ 552 | { 553 | "name": "stdout", 554 | "output_type": "stream", 555 | "text": [ 556 | "Training data shape before feature selection (2000, 229)\n", 557 | "Training data shape after feature selection (2000, 71)\n" 558 | ] 559 | } 560 | ], 561 | "source": [ 562 | "from sklearn.ensemble import ExtraTreesClassifier\n", 563 | "from sklearn.feature_selection import 
SelectFromModel\n", 564 | "\n", 565 | "clf = ExtraTreesClassifier(n_estimators=50)\n", 566 | "clf = clf.fit(train, target)\n", 567 | "\n", 568 | "model = SelectFromModel(clf, prefit=True)\n", 569 | "train_sel = model.transform(train)\n", 570 | "test_sel = model.transform(test)\n", 571 | "print('Training data shape before feature selection', train.shape)\n", 572 | "print('Training data shape after feature selection', train_sel.shape)" 573 | ] 574 | }, 575 | { 576 | "cell_type": "markdown", 577 | "metadata": {}, 578 | "source": [ 579 | "#### Tree feature importances" 580 | ] 581 | }, 582 | { 583 | "cell_type": "code", 584 | "execution_count": 20, 585 | "metadata": {}, 586 | "outputs": [ 587 | { 588 | "data": { 589 | "text/plain": [ 590 | "array([0.09210871, 0.00578114, 0.00388741, 0.0047027 , 0.00324662,\n", 591 | " 0.00409547, 0.00560588, 0.00399393, 0.00499705, 0.00233944])" 592 | ] 593 | }, 594 | "execution_count": 20, 595 | "metadata": {}, 596 | "output_type": "execute_result" 597 | } 598 | ], 599 | "source": [ 600 | "clf.feature_importances_[:10]" 601 | ] 602 | }, 603 | { 604 | "cell_type": "code", 605 | "execution_count": 21, 606 | "metadata": {}, 607 | "outputs": [], 608 | "source": [ 609 | "df_features_import = pd.DataFrame()\n", 610 | "df_features_import['features_import'] = clf.feature_importances_\n", 611 | "df_features_import['features_name'] = features_columns" 612 | ] 613 | }, 614 | { 615 | "cell_type": "code", 616 | "execution_count": 22, 617 | "metadata": {}, 618 | "outputs": [ 619 | { 620 | "data": { 621 | "text/html": [ 622 | "
\n", 797 | "
" 798 | ], 799 | "text/plain": [ 800 | " features_import features_name\n", 801 | "0 0.092109 merchant_id\n", 802 | "228 0.085244 xgb_clf\n", 803 | "227 0.056583 lgb_clf\n", 804 | "199 0.007003 embeeding_72\n", 805 | "179 0.006930 embeeding_52\n", 806 | "18 0.006444 seller_most_1_cnt\n", 807 | "207 0.006367 embeeding_80\n", 808 | "193 0.006110 embeeding_66\n", 809 | "190 0.006107 embeeding_63\n", 810 | "132 0.006077 embeeding_5\n", 811 | "144 0.005996 embeeding_17\n", 812 | "146 0.005913 embeeding_19\n", 813 | "1 0.005781 age_range\n", 814 | "158 0.005715 embeeding_31\n", 815 | "191 0.005701 embeeding_64\n", 816 | "165 0.005673 embeeding_38\n", 817 | "15 0.005648 cat_most_1\n", 818 | "6 0.005606 brand_nunique\n", 819 | "22 0.005488 user_cnt_0\n", 820 | "220 0.005485 embeeding_93\n", 821 | "166 0.005473 embeeding_39\n", 822 | "87 0.005472 tfidf_60\n", 823 | "127 0.005463 embeeding_0\n", 824 | "196 0.005427 embeeding_69\n", 825 | "205 0.005407 embeeding_78\n", 826 | "147 0.005402 embeeding_20\n", 827 | "163 0.005347 embeeding_36\n", 828 | "192 0.005328 embeeding_65\n", 829 | "169 0.005280 embeeding_42\n", 830 | "50 0.005277 tfidf_23" 831 | ] 832 | }, 833 | "execution_count": 22, 834 | "metadata": {}, 835 | "output_type": "execute_result" 836 | } 837 | ], 838 | "source": [ 839 | "df_features_import.sort_values(['features_import'],ascending=False).head(30)" 840 | ] 841 | }, 842 | { 843 | "cell_type": "code", 844 | "execution_count": 23, 845 | "metadata": {}, 846 | "outputs": [], 847 | "source": [ 848 | "# features_columns" 849 | ] 850 | }, 851 | { 852 | "cell_type": "markdown", 853 | "metadata": {}, 854 | "source": [ 855 | "### Comparison before and after feature selection" 856 | ] 857 | }, 858 | { 859 | "cell_type": "code", 860 | "execution_count": 24, 861 | "metadata": {}, 862 | "outputs": [ 863 | { 864 | "name": "stdout", 865 | "output_type": "stream", 866 | "text": [ 867 | "No Select Accuracy: 0.93 (+/- 0.00)\n", 868 | "Features Select Accuracy: 0.93 (+/- 0.00)\n" 869 | ] 870 | } 871 | ], 872 | "source": [ 
873 | "feature_selection(train, train_sel, target)" 874 | ] 875 | }, 876 | { 877 | "cell_type": "markdown", 878 | "metadata": {}, 879 | "source": [ 880 | "### LightGBM feature importance" 881 | ] 882 | }, 883 | { 884 | "cell_type": "code", 885 | "execution_count": 25, 886 | "metadata": {}, 887 | "outputs": [ 888 | { 889 | "name": "stdout", 890 | "output_type": "stream", 891 | "text": [ 892 | "[LightGBM] [Warning] Unknown parameter: colsample_bylevel\n", 893 | "[LightGBM] [Warning] Unknown parameter: tree_method\n", 894 | "[LightGBM] [Warning] Unknown parameter: silent\n", 895 | "[LightGBM] [Warning] Unknown parameter: colsample_bylevel\n", 896 | "[LightGBM] [Warning] Unknown parameter: tree_method\n", 897 | "[LightGBM] [Warning] Unknown parameter: silent\n", 898 | "[LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.006242 seconds.\n", 899 | "You can set `force_row_wise=true` to remove the overhead.\n", 900 | "And if memory is not enough, you can set `force_col_wise=true`.\n", 901 | "[LightGBM] [Info] Total Bins 32114\n", 902 | "[LightGBM] [Info] Number of data points in the train set: 1200, number of used features: 224\n", 903 | "[LightGBM] [Warning] Unknown parameter: colsample_bylevel\n", 904 | "[LightGBM] [Warning] Unknown parameter: tree_method\n", 905 | "[LightGBM] [Warning] Unknown parameter: silent\n", 906 | "[LightGBM] [Info] Start training from score -0.068100\n", 907 | "[LightGBM] [Info] Start training from score -2.720629\n", 908 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 909 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 910 | "[1]\tvalid_0's multi_logloss: 0.256738\n", 911 | "Training until validation scores don't improve for 100 rounds\n", 912 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 913 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 914 | "[2]\tvalid_0's multi_logloss: 0.256574\n", 
915 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 916 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 917 | "[3]\tvalid_0's multi_logloss: 0.256518\n", 918 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 919 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 920 | "[4]\tvalid_0's multi_logloss: 0.25657\n", 921 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 922 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 923 | "[5]\tvalid_0's multi_logloss: 0.256756\n", 924 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 925 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 926 | "[6]\tvalid_0's multi_logloss: 0.25682\n", 927 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 928 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 929 | "[7]\tvalid_0's multi_logloss: 0.256989\n", 930 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 931 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 932 | "[8]\tvalid_0's multi_logloss: 0.257236\n", 933 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 934 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 935 | "[9]\tvalid_0's multi_logloss: 0.25712\n", 936 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 937 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 938 | "[10]\tvalid_0's multi_logloss: 0.257011\n", 939 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 940 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 941 | "[11]\tvalid_0's multi_logloss: 0.257042\n", 
942 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 943 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 944 | "[12]\tvalid_0's multi_logloss: 0.257402\n", 945 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 946 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 947 | "[13]\tvalid_0's multi_logloss: 0.257419\n", 948 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 949 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 950 | "[14]\tvalid_0's multi_logloss: 0.257646\n", 951 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 952 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 953 | "[15]\tvalid_0's multi_logloss: 0.257552\n", 954 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 955 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 956 | "[16]\tvalid_0's multi_logloss: 0.257604\n", 957 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 958 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 959 | "[17]\tvalid_0's multi_logloss: 0.257797\n", 960 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 961 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 962 | "[18]\tvalid_0's multi_logloss: 0.257928\n", 963 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 964 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 965 | "[19]\tvalid_0's multi_logloss: 0.258142\n", 966 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 967 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 968 | "[20]\tvalid_0's multi_logloss: 
0.25847\n", 969 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 970 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 971 | "[21]\tvalid_0's multi_logloss: 0.258654\n", 972 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 973 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 974 | "[22]\tvalid_0's multi_logloss: 0.258846\n", 975 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 976 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 977 | "[23]\tvalid_0's multi_logloss: 0.258962\n", 978 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 979 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 980 | "[24]\tvalid_0's multi_logloss: 0.258991\n", 981 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 982 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 983 | "[25]\tvalid_0's multi_logloss: 0.259334\n", 984 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 985 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 986 | "[26]\tvalid_0's multi_logloss: 0.259433\n", 987 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 988 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 989 | "[27]\tvalid_0's multi_logloss: 0.259912\n", 990 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 991 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 992 | "[28]\tvalid_0's multi_logloss: 0.260153\n", 993 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 994 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 995 | "[29]\tvalid_0's 
multi_logloss: 0.260576\n", 996 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 997 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 998 | "[30]\tvalid_0's multi_logloss: 0.26094\n", 999 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1000 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1001 | "[31]\tvalid_0's multi_logloss: 0.261198\n", 1002 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1003 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1004 | "[32]\tvalid_0's multi_logloss: 0.26141\n", 1005 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1006 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1007 | "[33]\tvalid_0's multi_logloss: 0.261614\n", 1008 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1009 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1010 | "[34]\tvalid_0's multi_logloss: 0.261801\n", 1011 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1012 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1013 | "[35]\tvalid_0's multi_logloss: 0.261931\n", 1014 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1015 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1016 | "[36]\tvalid_0's multi_logloss: 0.262242\n", 1017 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1018 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1019 | "[37]\tvalid_0's multi_logloss: 0.262492\n", 1020 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1021 | "[LightGBM] [Warning] No further splits with positive gain, best gain: 
-inf\n", 1022 | "[38]\tvalid_0's multi_logloss: 0.26273\n", 1023 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1024 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1025 | "[39]\tvalid_0's multi_logloss: 0.262855\n", 1026 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1027 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1028 | "[40]\tvalid_0's multi_logloss: 0.263225\n", 1029 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1030 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1031 | "[41]\tvalid_0's multi_logloss: 0.263311\n", 1032 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1033 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1034 | "[42]\tvalid_0's multi_logloss: 0.263612\n", 1035 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n" 1036 | ] 1037 | }, 1038 | { 1039 | "name": "stdout", 1040 | "output_type": "stream", 1041 | "text": [ 1042 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1043 | "[43]\tvalid_0's multi_logloss: 0.263937\n", 1044 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1045 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1046 | "[44]\tvalid_0's multi_logloss: 0.264398\n", 1047 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1048 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1049 | "[45]\tvalid_0's multi_logloss: 0.264822\n", 1050 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1051 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1052 | "[46]\tvalid_0's multi_logloss: 0.264977\n", 1053 | "[LightGBM] [Warning] 
No further splits with positive gain, best gain: -inf\n", 1054 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1055 | "[47]\tvalid_0's multi_logloss: 0.265401\n", 1056 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1057 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1058 | "[48]\tvalid_0's multi_logloss: 0.265718\n", 1059 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1060 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1061 | "[49]\tvalid_0's multi_logloss: 0.265859\n", 1062 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1063 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1064 | "[50]\tvalid_0's multi_logloss: 0.266173\n", 1065 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1066 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1067 | "[51]\tvalid_0's multi_logloss: 0.266544\n", 1068 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1069 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1070 | "[52]\tvalid_0's multi_logloss: 0.266719\n", 1071 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1072 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1073 | "[53]\tvalid_0's multi_logloss: 0.266817\n", 1074 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1075 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1076 | "[54]\tvalid_0's multi_logloss: 0.267013\n", 1077 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1078 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1079 | "[55]\tvalid_0's multi_logloss: 
0.267385\n", 1080 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1081 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1082 | "[56]\tvalid_0's multi_logloss: 0.267389\n", 1083 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1084 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1085 | "[57]\tvalid_0's multi_logloss: 0.267662\n", 1086 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1087 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1088 | "[58]\tvalid_0's multi_logloss: 0.267792\n", 1089 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1090 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1091 | "[59]\tvalid_0's multi_logloss: 0.268017\n", 1092 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1093 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1094 | "[60]\tvalid_0's multi_logloss: 0.268158\n", 1095 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1096 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1097 | "[61]\tvalid_0's multi_logloss: 0.268437\n", 1098 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1099 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1100 | "[62]\tvalid_0's multi_logloss: 0.268773\n", 1101 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1102 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1103 | "[63]\tvalid_0's multi_logloss: 0.268824\n", 1104 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1105 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1106 
| "[64]\tvalid_0's multi_logloss: 0.269138\n", 1107 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1108 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1109 | "[65]\tvalid_0's multi_logloss: 0.269357\n", 1110 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1111 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1112 | "[66]\tvalid_0's multi_logloss: 0.269572\n", 1113 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1114 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1115 | "[67]\tvalid_0's multi_logloss: 0.269786\n", 1116 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1117 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1118 | "[68]\tvalid_0's multi_logloss: 0.270102\n", 1119 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1120 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1121 | "[69]\tvalid_0's multi_logloss: 0.270435\n", 1122 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1123 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1124 | "[70]\tvalid_0's multi_logloss: 0.270566\n", 1125 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1126 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1127 | "[71]\tvalid_0's multi_logloss: 0.270679\n", 1128 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1129 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1130 | "[72]\tvalid_0's multi_logloss: 0.271056\n", 1131 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1132 | "[LightGBM] [Warning] No further splits with 
positive gain, best gain: -inf\n", 1133 | "[73]\tvalid_0's multi_logloss: 0.271474\n", 1134 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1135 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1136 | "[74]\tvalid_0's multi_logloss: 0.27168\n", 1137 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1138 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1139 | "[75]\tvalid_0's multi_logloss: 0.271918\n", 1140 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1141 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1142 | "[76]\tvalid_0's multi_logloss: 0.271937\n", 1143 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1144 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1145 | "[77]\tvalid_0's multi_logloss: 0.272113\n", 1146 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1147 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1148 | "[78]\tvalid_0's multi_logloss: 0.27242\n", 1149 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1150 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1151 | "[79]\tvalid_0's multi_logloss: 0.272712\n", 1152 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1153 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1154 | "[80]\tvalid_0's multi_logloss: 0.27267\n", 1155 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1156 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1157 | "[81]\tvalid_0's multi_logloss: 0.273019\n", 1158 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1159 | "[LightGBM] 
[Warning] No further splits with positive gain, best gain: -inf\n", 1160 | "[82]\tvalid_0's multi_logloss: 0.272981\n", 1161 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1162 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1163 | "[83]\tvalid_0's multi_logloss: 0.273218\n", 1164 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1165 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1166 | "[84]\tvalid_0's multi_logloss: 0.27353\n", 1167 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1168 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1169 | "[85]\tvalid_0's multi_logloss: 0.273649\n", 1170 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1171 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1172 | "[86]\tvalid_0's multi_logloss: 0.273775\n", 1173 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1174 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1175 | "[87]\tvalid_0's multi_logloss: 0.273835\n", 1176 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1177 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1178 | "[88]\tvalid_0's multi_logloss: 0.274091\n", 1179 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1180 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1181 | "[89]\tvalid_0's multi_logloss: 0.274422\n", 1182 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1183 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1184 | "[90]\tvalid_0's multi_logloss: 0.274716\n", 1185 | "[LightGBM] [Warning] No further splits with positive gain, best 
gain: -inf\n", 1186 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1187 | "[91]\tvalid_0's multi_logloss: 0.275082\n", 1188 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n" 1189 | ] 1190 | }, 1191 | { 1192 | "name": "stdout", 1193 | "output_type": "stream", 1194 | "text": [ 1195 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1196 | "[92]\tvalid_0's multi_logloss: 0.275278\n", 1197 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1198 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1199 | "[93]\tvalid_0's multi_logloss: 0.275447\n", 1200 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1201 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1202 | "[94]\tvalid_0's multi_logloss: 0.275438\n", 1203 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1204 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1205 | "[95]\tvalid_0's multi_logloss: 0.275778\n", 1206 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1207 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1208 | "[96]\tvalid_0's multi_logloss: 0.27591\n", 1209 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1210 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1211 | "[97]\tvalid_0's multi_logloss: 0.276129\n", 1212 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1213 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1214 | "[98]\tvalid_0's multi_logloss: 0.276326\n", 1215 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1216 | "[LightGBM] [Warning] No further splits with positive gain, best 
gain: -inf\n", 1217 | "[99]\tvalid_0's multi_logloss: 0.276449\n", 1218 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1219 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1220 | "[100]\tvalid_0's multi_logloss: 0.276745\n", 1221 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1222 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1223 | "[101]\tvalid_0's multi_logloss: 0.276895\n", 1224 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1225 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1226 | "[102]\tvalid_0's multi_logloss: 0.276914\n", 1227 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1228 | "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n", 1229 | "[103]\tvalid_0's multi_logloss: 0.277281\n", 1230 | "Early stopping, best iteration is:\n", 1231 | "[3]\tvalid_0's multi_logloss: 0.256518\n" 1232 | ] 1233 | } 1234 | ], 1235 | "source": [ 1236 | "import lightgbm\n", 1237 | "from sklearn.model_selection import train_test_split\n", 1238 | "\n", 1239 | "X_train, X_test, y_train, y_test = train_test_split(train, target, test_size=0.4, random_state=0)\n", 1240 | "\n", 1241 | "clf = lightgbm\n", 1242 | "\n", 1243 | "train_matrix = clf.Dataset(X_train, label=y_train)\n", 1244 | "test_matrix = clf.Dataset(X_test, label=y_test)\n", 1245 | "params = {\n", 1246 | " 'boosting_type': 'gbdt',\n", 1247 | " #'boosting_type': 'dart',\n", 1248 | " 'objective': 'multiclass',\n", 1249 | " 'metric': 'multi_logloss',\n", 1250 | " 'min_child_weight': 1.5,\n", 1251 | " 'num_leaves': 2**5,\n", 1252 | " 'lambda_l2': 10,\n", 1253 | " 'subsample': 0.7,\n", 1254 | " 'colsample_bytree': 0.7,\n", 1255 | " 'colsample_bylevel': 0.7,\n", 1256 | " 'learning_rate': 0.03,\n", 1257 | " 'tree_method': 'exact',\n", 1258 | " 'seed': 2017,\n", 
1259 | " \"num_class\": 2,\n", 1260 | " 'silent': True,\n", 1261 | " }\n", 1262 | "num_round = 10000\n", 1263 | "early_stopping_rounds = 100\n", 1264 | "model = clf.train(params, \n", 1265 | " train_matrix,\n", 1266 | " num_round,\n", 1267 | " valid_sets=test_matrix,\n", 1268 | " early_stopping_rounds=early_stopping_rounds)" 1269 | ] 1270 | }, 1271 | { 1272 | "cell_type": "code", 1273 | "execution_count": 26, 1274 | "metadata": {}, 1275 | "outputs": [], 1276 | "source": [ 1277 | "def lgb_transform(train, test, model, topK):\n", 1278 | "    train_df = pd.DataFrame(train)\n", 1279 | "    train_df.columns = range(train.shape[1])\n", 1280 | "    \n", 1281 | "    test_df = pd.DataFrame(test)\n", 1282 | "    test_df.columns = range(test.shape[1])\n", 1283 | "    \n", 1284 | "    features_import = pd.DataFrame()\n", 1285 | "    features_import['importance'] = model.feature_importance()\n", 1286 | "    features_import['col'] = range(train.shape[1])\n", 1287 | "    \n", 1288 | "    features_import = features_import.sort_values(['importance'],ascending=False).head(topK)\n", 1289 | "    sel_col = list(features_import.col)\n", 1290 | "    \n", 1291 | "    train_sel = train_df[sel_col]\n", 1292 | "    test_sel = test_df[sel_col]\n", 1293 | "    return train_sel, test_sel" 1294 | ] 1295 | }, 1296 | { 1297 | "cell_type": "code", 1298 | "execution_count": 27, 1299 | "metadata": {}, 1300 | "outputs": [ 1301 | { 1302 | "name": "stdout", 1303 | "output_type": "stream", 1304 | "text": [ 1305 | "Training data shape before feature selection (2000, 229)\n", 1306 | "Training data shape after feature selection (2000, 20)\n" 1307 | ] 1308 | } 1309 | ], 1310 | "source": [ 1311 | "train_sel, test_sel = lgb_transform(train, test, model, 20)\n", 1312 | "print('Training data shape before feature selection', train.shape)\n", 1313 | "print('Training data shape after feature selection', train_sel.shape)" 1314 | ] 1315 | }, 1316 | { 1317 | "cell_type": "markdown", 1318 | "metadata": {}, 1319 | "source": [ 1320 | "### LightGBM feature importance" 1321 | ] 1322 | }, 1323 | { 1324 | "cell_type": "code", 1325 | "execution_count": 28, 1326 | "metadata": {}, 1327 | "outputs": [ 1328 | { 1329 | "data": 
{ 1330 | "text/plain": [ 1331 | "array([2, 3, 0, 0, 0, 1, 1, 0, 1, 0])" 1332 | ] 1333 | }, 1334 | "execution_count": 28, 1335 | "metadata": {}, 1336 | "output_type": "execute_result" 1337 | } 1338 | ], 1339 | "source": [ 1340 | "model.feature_importance()[:10]" 1341 | ] 1342 | }, 1343 | { 1344 | "cell_type": "code", 1345 | "execution_count": 29, 1346 | "metadata": {}, 1347 | "outputs": [], 1348 | "source": [ 1349 | "#sorted(model.feature_importance(),reverse=True)[:10]" 1350 | ] 1351 | }, 1352 | { 1353 | "cell_type": "markdown", 1354 | "metadata": {}, 1355 | "source": [ 1356 | "### Comparison before and after feature selection" 1357 | ] 1358 | }, 1359 | { 1360 | "cell_type": "code", 1361 | "execution_count": 30, 1362 | "metadata": {}, 1363 | "outputs": [ 1364 | { 1365 | "name": "stdout", 1366 | "output_type": "stream", 1367 | "text": [ 1368 | "No Select Accuracy: 0.93 (+/- 0.00)\n", 1369 | "Features Select Accuracy: 0.93 (+/- 0.00)\n" 1370 | ] 1371 | } 1372 | ], 1373 | "source": [ 1374 | "feature_selection(train, train_sel, target)" 1375 | ] 1376 | }, 1377 | { 1378 | "cell_type": "code", 1379 | "execution_count": null, 1380 | "metadata": {}, 1381 | "outputs": [], 1382 | "source": [] 1383 | } 1384 | ], 1385 | "metadata": { 1386 | "kernelspec": { 1387 | "display_name": "Python 3", 1388 | "language": "python", 1389 | "name": "python3" 1390 | }, 1391 | "language_info": { 1392 | "codemirror_mode": { 1393 | "name": "ipython", 1394 | "version": 3 1395 | }, 1396 | "file_extension": ".py", 1397 | "mimetype": "text/x-python", 1398 | "name": "python", 1399 | "nbconvert_exporter": "python", 1400 | "pygments_lexer": "ipython3", 1401 | "version": "3.7.1" 1402 | }, 1403 | "latex_envs": { 1404 | "LaTeX_envs_menu_present": true, 1405 | "autoclose": false, 1406 | "autocomplete": true, 1407 | "bibliofile": "biblio.bib", 1408 | "cite_by": "apalike", 1409 | "current_citInitial": 1, 1410 | "eqLabelWithNumbers": true, 1411 | "eqNumInitial": 1, 1412 | "hotkeys": { 1413 | "equation": "Ctrl-E", 1414 | "itemize": "Ctrl-I" 1415 
| }, 1416 | "labels_anchors": false, 1417 | "latex_user_defs": false, 1418 | "report_style_numbering": false, 1419 | "user_envs_cfg": false 1420 | } 1421 | }, 1422 | "nbformat": 4, 1423 | "nbformat_minor": 2 1424 | } 1425 | -------------------------------------------------------------------------------- /工业蒸汽/工业蒸汽 06 特征优化.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.7.0"},"colab":{"name":"工业蒸汽 06 特征优化.ipynb","provenance":[],"collapsed_sections":[]}},"cells":[{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"wo_kOHZTLhVZ","executionInfo":{"status":"ok","timestamp":1623400030401,"user_tz":-480,"elapsed":22582,"user":{"displayName":"陆轩韬","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GgCF1eSJvFwyQhlJNSqYJSHyWwDFm88zduORKCQ=s64","userId":"09875294307280000133"}},"outputId":"e5f54b16-a3a0-4165-f896-bdd2384af00c"},"source":["from google.colab import drive\n","drive.mount('/content/drive')"],"execution_count":1,"outputs":[{"output_type":"stream","text":["Mounted at /content/drive\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"G6wonEdoLjSN","executionInfo":{"status":"ok","timestamp":1623400032113,"user_tz":-480,"elapsed":1729,"user":{"displayName":"陆轩韬","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GgCF1eSJvFwyQhlJNSqYJSHyWwDFm88zduORKCQ=s64","userId":"09875294307280000133"}},"outputId":"6ffe00a1-5ad1-41e6-8b04-5594798cdb7d"},"source":["%cd /content/drive/MyDrive/Colab Notebooks/天池/工业蒸汽"],"execution_count":2,"outputs":[{"output_type":"stream","text":["/content/drive/MyDrive/Colab 
Notebooks/天池/工业蒸汽\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"q_gDa5rQLdXM"},"source":["## Feature Optimization\n","\n","### Load data"]},{"cell_type":"code","metadata":{"id":"qYQ3-RsyLdXW","executionInfo":{"status":"ok","timestamp":1623401062097,"user_tz":-480,"elapsed":556,"user":{"displayName":"陆轩韬","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GgCF1eSJvFwyQhlJNSqYJSHyWwDFm88zduORKCQ=s64","userId":"09875294307280000133"}}},"source":["import pandas as pd\n","\n","train_data_file = \"./zhengqi_train.txt\"\n","test_data_file = \"./zhengqi_test.txt\"\n","\n","train_data = pd.read_csv(train_data_file, sep='\\t', encoding='utf-8')\n","test_data = pd.read_csv(test_data_file, sep='\\t', encoding='utf-8')"],"execution_count":13,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"1d20kkI_LdXX"},"source":["### Define the feature construction operations"]},{"cell_type":"code","metadata":{"id":"ayULiWwdLdXY","executionInfo":{"status":"ok","timestamp":1623401065107,"user_tz":-480,"elapsed":505,"user":{"displayName":"陆轩韬","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GgCF1eSJvFwyQhlJNSqYJSHyWwDFm88zduORKCQ=s64","userId":"09875294307280000133"}}},"source":["epsilon=1e-5\n","\n","# Pairwise cross features; you can define your own, e.g. x*x/y, log(x)/y, and so on\n","func_dict = {\n","    'add': lambda x,y: x+y,\n","    'mins': lambda x,y: x-y,\n","    'div': lambda x,y: x/(y+epsilon),\n","    'multi': lambda x,y: x*y\n","}"],"execution_count":14,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"ysPQ-6EQLdXY"},"source":["### Define the feature construction function"]},{"cell_type":"code","metadata":{"id":"dPZ-p8xSLdXZ","executionInfo":{"status":"ok","timestamp":1623401066461,"user_tz":-480,"elapsed":3,"user":{"displayName":"陆轩韬","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GgCF1eSJvFwyQhlJNSqYJSHyWwDFm88zduORKCQ=s64","userId":"09875294307280000133"}}},"source":["def auto_features_make(train_data,test_data,func_dict,col_list):\n","    train_data, test_data = train_data.copy(), test_data.copy()\n","    for col_i in col_list:\n","        for col_j in col_list:\n","            for 
func_name, func in func_dict.items():\n","                for data in [train_data,test_data]:\n","                    func_features = func(data[col_i],data[col_j])\n","                    col_func_features = '-'.join([col_i,func_name,col_j])\n","                    data[col_func_features] = func_features\n","    return train_data,test_data"],"execution_count":15,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"rd49LO4ZLdXZ"},"source":["### Construct features for the training and test sets"]},{"cell_type":"code","metadata":{"id":"Yw1NxFGPLdXa","executionInfo":{"status":"ok","timestamp":1623401082922,"user_tz":-480,"elapsed":14522,"user":{"displayName":"陆轩韬","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GgCF1eSJvFwyQhlJNSqYJSHyWwDFm88zduORKCQ=s64","userId":"09875294307280000133"}}},"source":["train_data2, test_data2 = auto_features_make(train_data,test_data,func_dict,col_list=test_data.columns)"],"execution_count":16,"outputs":[]},{"cell_type":"code","metadata":{"id":"5aDoibXvLdXa","executionInfo":{"status":"ok","timestamp":1623401101042,"user_tz":-480,"elapsed":9223,"user":{"displayName":"陆轩韬","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GgCF1eSJvFwyQhlJNSqYJSHyWwDFm88zduORKCQ=s64","userId":"09875294307280000133"}}},"source":["from sklearn.decomposition import PCA # principal component analysis\n","\n","# Reduce dimensionality with PCA\n","pca = PCA(n_components=500)\n","# Fit on the columns shared with the test set: 'target' sits in the middle of train_data2,\n","# so iloc[:,0:-1] would keep it and drop the last constructed feature instead\n","train_data2_pca = pca.fit_transform(train_data2[test_data2.columns])\n","test_data2_pca = pca.transform(test_data2)\n","train_data2_pca = pd.DataFrame(train_data2_pca)\n","test_data2_pca = pd.DataFrame(test_data2_pca)\n","train_data2_pca['target'] = train_data2['target']"],"execution_count":17,"outputs":[]},{"cell_type":"code","metadata":{"id":"gWeLa6r8LdXb","executionInfo":{"status":"ok","timestamp":1623401103248,"user_tz":-480,"elapsed":496,"user":{"displayName":"陆轩韬","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GgCF1eSJvFwyQhlJNSqYJSHyWwDFm88zduORKCQ=s64","userId":"09875294307280000133"}}},"source":["X_train2 = train_data2[test_data2.columns].values\n","y_train = 
train_data2['target']"],"execution_count":18,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"kDqTOUhmLdXb"},"source":["### Train and evaluate a LightGBM model on the newly constructed features"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":833},"id":"wMfN5jqTLdXb","executionInfo":{"status":"error","timestamp":1623402038351,"user_tz":-480,"elapsed":6450,"user":{"displayName":"陆轩韬","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GgCF1eSJvFwyQhlJNSqYJSHyWwDFm88zduORKCQ=s64","userId":"09875294307280000133"}},"outputId":"f5f30543-7f11-4076-8560-e1ecf4ca9efa"},"source":["# ls_validation i\n","from sklearn.model_selection import KFold\n","from sklearn.metrics import mean_squared_error\n","import lightgbm as lgb\n","import numpy as np\n","\n","# 5-fold cross-validation\n","Folds=5\n","# KFold takes the number of splits, not the sample count; KFold(len(X_train2), ...) creates one fold per sample\n","kf = KFold(n_splits=Folds, random_state=2019, shuffle=True)\n","# Record training and prediction MSE\n","MSE_DICT = {\n","    'train_mse':[],\n","    'test_mse':[]\n","}\n","\n","# Offline training and prediction\n","for i, (train_index, test_index) in enumerate(kf.split(X_train2)):\n","    # LightGBM tree model\n","    lgb_reg = lgb.LGBMRegressor(\n","        learning_rate=0.01,\n","        max_depth=-1,\n","        n_estimators=5000,\n","        boosting_type='gbdt',\n","        random_state=2019,\n","        objective='regression',\n","    )\n","    \n","    # Split into training and validation folds\n","    X_train_KFold, X_test_KFold = X_train2[train_index], X_train2[test_index]\n","    y_train_KFold, y_test_KFold = y_train[train_index], y_train[test_index]\n","    \n","    # Train the model\n","    lgb_reg.fit(\n","        X=X_train_KFold,y=y_train_KFold,\n","        eval_set=[(X_train_KFold, y_train_KFold),(X_test_KFold, y_test_KFold)],\n","        eval_names=['Train','Test'],\n","        early_stopping_rounds=100,\n","        eval_metric='MSE',\n","        verbose=50\n","    )\n","\n","    # Predict on the training fold and on the validation fold\n","    y_train_KFold_predict = lgb_reg.predict(X_train_KFold,num_iteration=lgb_reg.best_iteration_)\n","    y_test_KFold_predict = lgb_reg.predict(X_test_KFold,num_iteration=lgb_reg.best_iteration_) \n","    \n","    print('Fold {}: training and prediction, training MSE,
prediction MSE'.format(i))\n","    train_mse = mean_squared_error(y_train_KFold_predict, y_train_KFold)\n","    print('------\\n', 'Training MSE\\n', train_mse, '\\n------')\n","    test_mse = mean_squared_error(y_test_KFold_predict, y_test_KFold)\n","    print('------\\n', 'Prediction MSE\\n', test_mse, '\\n------\\n')\n","    \n","    MSE_DICT['train_mse'].append(train_mse)\n","    MSE_DICT['test_mse'].append(test_mse)\n","print('------\\n', 'Training MSE\\n', MSE_DICT['train_mse'], '\\n', np.mean(MSE_DICT['train_mse']), '\\n------')\n","print('------\\n', 'Prediction MSE\\n', MSE_DICT['test_mse'], '\\n', np.mean(MSE_DICT['test_mse']), '\\n------')"],"execution_count":20,"outputs":[{"output_type":"stream","text":["Training until validation scores don't improve for 100 rounds.\n","[50]\tTrain's l2: 0.418978\tTrain's l2: 0.418978\tTest's l2: 0.106374\tTest's l2: 0.106374\n","[100]\tTrain's l2: 0.203693\tTrain's l2: 0.203693\tTest's l2: 0.02276\tTest's l2: 0.02276\n","[150]\tTrain's l2: 0.114486\tTrain's l2: 0.114486\tTest's l2: 0.00527795\tTest's l2: 0.00527795\n","[200]\tTrain's l2: 0.0741934\tTrain's l2: 0.0741934\tTest's l2: 5.99266e-05\tTest's l2: 5.99266e-05\n","[250]\tTrain's l2: 0.0535396\tTrain's l2: 0.0535396\tTest's l2: 0.00036171\tTest's l2: 0.00036171\n","[300]\tTrain's l2: 0.041529\tTrain's l2: 0.041529\tTest's l2: 0.00267813\tTest's l2: 0.00267813\n","Early stopping, best iteration is:\n","[221]\tTrain's l2: 0.0640274\tTrain's l2: 0.0640274\tTest's l2: 6.95547e-08\tTest's l2: 6.95547e-08\n","Fold 0: training and prediction, training MSE, prediction MSE\n","------\n"," Training MSE\n"," 0.0640273654375399 \n","------\n","------\n"," Prediction MSE\n"," 6.954770450692039e-08 \n","------\n","\n","Training until validation scores don't improve for 100 rounds.\n","[50]\tTrain's l2: 0.419128\tTrain's l2: 0.419128\tTest's l2: 0.142103\tTest's l2: 0.142103\n","[100]\tTrain's l2: 0.203838\tTrain's l2: 0.203838\tTest's l2: 0.0997735\tTest's l2: 0.0997735\n","[150]\tTrain's l2: 0.114537\tTrain's l2: 0.114537\tTest's l2: 0.0765469\tTest's l2: 0.0765469\n","[200]\tTrain's l2: 
0.0742645\tTrain's l2: 0.0742645\tTest's l2: 0.0612271\tTest's l2: 0.0612271\n","[250]\tTrain's l2: 0.0536164\tTrain's l2: 0.0536164\tTest's l2: 0.0471729\tTest's l2: 0.0471729\n","[300]\tTrain's l2: 0.0415117\tTrain's l2: 0.0415117\tTest's l2: 0.0427081\tTest's l2: 0.0427081\n"],"name":"stdout"},{"output_type":"error","ename":"KeyboardInterrupt","evalue":"ignored","traceback":["---------------------------------------------------------------------------","KeyboardInterrupt Traceback (most recent call last)"," in ()\n 38 early_stopping_rounds=100,\n 39 eval_metric='MSE',\n---> 40 verbose=50\n 41 )\n 42 \n","/usr/local/lib/python3.7/dist-packages/lightgbm/sklearn.py in fit(self, X, y, sample_weight, init_score, eval_set, eval_names, eval_sample_weight, eval_init_score, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks)\n 683 verbose=verbose, feature_name=feature_name,\n 684 categorical_feature=categorical_feature,\n--> 685 callbacks=callbacks)\n 686 return self\n 687 \n",
"/usr/local/lib/python3.7/dist-packages/lightgbm/sklearn.py in fit(self, X, y, sample_weight, init_score, group, eval_set, eval_names, eval_sample_weight, eval_class_weight, eval_init_score, eval_group, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks)\n 542 verbose_eval=verbose, feature_name=feature_name,\n 543 categorical_feature=categorical_feature,\n--> 544 callbacks=callbacks)\n 545 \n 546 if evals_result:\n","/usr/local/lib/python3.7/dist-packages/lightgbm/engine.py in train(params, train_set, num_boost_round, valid_sets, valid_names, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, evals_result, verbose_eval, learning_rates, keep_training_booster, callbacks)\n 216 evaluation_result_list=None))\n 217 \n--> 218 booster.update(fobj=fobj)\n 219 \n 220 evaluation_result_list = []\n",
"/usr/local/lib/python3.7/dist-packages/lightgbm/basic.py in update(self, train_set, fobj)\n 1800 _safe_call(_LIB.LGBM_BoosterUpdateOneIter(\n 1801 self.handle,\n-> 1802 ctypes.byref(is_finished)))\n 1803 self.__is_predicted_cur_iter = [False for _ in range_(self.__num_dataset)]\n 1804 return is_finished.value == 1\n","KeyboardInterrupt: "]}]},{"cell_type":"code","metadata":{"id":"aB3dbDkaPV8W"},"source":[""],"execution_count":null,"outputs":[]}]} -------------------------------------------------------------------------------- /阿里云安全恶意程序检测/阿里云安全恶意程序检测-优化技巧与解决方案升级.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Section 6: Optimization Techniques and Solution Upgrades" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## 6.2 Deep Learning Solution: TextCNN Modeling" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "### 6.2.2 Reading the Data" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "import pandas as pd\n", 31 | "import numpy as np\n", 32 | "import seaborn as sns\n", 33 | "import matplotlib.pyplot as plt\n", 34 | "\n", 35 | "import lightgbm as lgb\n", 36 | "from sklearn.model_selection import train_test_split\n", 37 | "from sklearn.preprocessing import OneHotEncoder\n", 38 | "\n", 39 | "from tqdm import tqdm_notebook\n", 40 | "from sklearn.preprocessing import LabelBinarizer,LabelEncoder\n", 41 | "\n", 42 | "import warnings\n", 43 | "warnings.filterwarnings('ignore')\n", 44 | "%matplotlib inline" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 4, 50 | "metadata": {}, 51 | "outputs": [], 52 | "source": [ 53 | "path = '../security_data/'\n", 54 | "train = pd.read_csv(path + 'security_train.csv')\n", 55 | "test = pd.read_csv(path + 'security_test.csv')" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 2, 61 | "metadata": {}, 62 | "outputs": [], 63 | "source": [ 64 | "import numpy as np\n", 65 | "import pandas as pd\n", 66 | "from tqdm import tqdm_notebook\n", 67 | "\n", 68 | "class _Data_Preprocess:\n", 69 | " def __init__(self):\n", 70 | " self.int8_max = np.iinfo(np.int8).max\n", 71 | " self.int8_min = np.iinfo(np.int8).min\n", 72 | "\n", 73 | " self.int16_max = np.iinfo(np.int16).max\n", 74 | " self.int16_min = np.iinfo(np.int16).min\n", 75 | "\n", 76 | " self.int32_max = np.iinfo(np.int32).max\n", 77 | " self.int32_min = 
np.iinfo(np.int32).min\n", 78 | "\n", 79 | " self.int64_max = np.iinfo(np.int64).max\n", 80 | " self.int64_min = np.iinfo(np.int64).min\n", 81 | "\n", 82 | " self.float16_max = np.finfo(np.float16).max\n", 83 | " self.float16_min = np.finfo(np.float16).min\n", 84 | "\n", 85 | " self.float32_max = np.finfo(np.float32).max\n", 86 | " self.float32_min = np.finfo(np.float32).min\n", 87 | "\n", 88 | " self.float64_max = np.finfo(np.float64).max\n", 89 | " self.float64_min = np.finfo(np.float64).min\n", 90 | "\n", 91 | " def _get_type(self, min_val, max_val, types):\n", 92 | " if types == 'int':\n", 93 | " if max_val <= self.int8_max and min_val >= self.int8_min:\n", 94 | " return np.int8\n", 95 | " elif max_val <= self.int16_max and min_val >= self.int16_min:\n", 96 | " return np.int16\n", 97 | " elif max_val <= self.int32_max and min_val >= self.int32_min:\n", 98 | " return np.int32\n", 99 | " return None\n", 100 | "\n", 101 | " elif types == 'float':\n", 102 | " if max_val <= self.float16_max and min_val >= self.float16_min:\n", 103 | " return np.float16\n", 104 | " if max_val <= self.float32_max and min_val >= self.float32_min:\n", 105 | " return np.float32\n", 106 | " if max_val <= self.float64_max and min_val >= self.float64_min:\n", 107 | " return np.float64\n", 108 | " return None\n", 109 | "\n", 110 | " def _memory_process(self, df):\n", 111 | " init_memory = df.memory_usage().sum() / 1024 ** 2 / 1024\n", 112 | " print('Original data occupies {} GB memory.'.format(init_memory))\n", 113 | " df_cols = df.columns\n", 114 | "\n", 115 | " \n", 116 | " for col in tqdm_notebook(df_cols):\n", 117 | " try:\n", 118 | " if 'float' in str(df[col].dtypes):\n", 119 | " max_val = df[col].max()\n", 120 | " min_val = df[col].min()\n", 121 | " trans_types = self._get_type(min_val, max_val, 'float')\n", 122 | " if trans_types is not None:\n", 123 | " df[col] = df[col].astype(trans_types)\n", 124 | " elif 'int' in str(df[col].dtypes):\n", 125 | " max_val = 
df[col].max()\n", 126 | " min_val = df[col].min()\n", 127 | " trans_types = self._get_type(min_val, max_val, 'int')\n", 128 | " if trans_types is not None:\n", 129 | " df[col] = df[col].astype(trans_types)\n", 130 | " except:\n", 131 | " print(' Can not do any process for column, {}.'.format(col)) \n", 132 | " afterprocess_memory = df.memory_usage().sum() / 1024 ** 2 / 1024\n", 133 | " print('After processing, the data occupies {} GB memory.'.format(afterprocess_memory))\n", 134 | " return df\n", 135 | "\n", 136 | "memory_process = _Data_Preprocess()" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 5, 142 | "metadata": { 143 | "scrolled": true 144 | }, 145 | "outputs": [ 146 | { 147 | "data": { 148 | "text/html": [ 149 | "
\n", 150 | "\n", 163 | "\n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | "
file_id label api tid index
0 1 5 LdrLoadDll 2488 0
1 1 5 LdrGetProcedureAddress 2488 1
2 1 5 LdrGetProcedureAddress 2488 2
3 1 5 LdrGetProcedureAddress 2488 3
4 1 5 LdrGetProcedureAddress 2488 4
\n", 217 | "
" 218 | ], 219 | "text/plain": [ 220 | " file_id label api tid index\n", 221 | "0 1 5 LdrLoadDll 2488 0\n", 222 | "1 1 5 LdrGetProcedureAddress 2488 1\n", 223 | "2 1 5 LdrGetProcedureAddress 2488 2\n", 224 | "3 1 5 LdrGetProcedureAddress 2488 3\n", 225 | "4 1 5 LdrGetProcedureAddress 2488 4" 226 | ] 227 | }, 228 | "execution_count": 5, 229 | "metadata": {}, 230 | "output_type": "execute_result" 231 | } 232 | ], 233 | "source": [ 234 | "train.head()" 235 | ] 236 | }, 237 | { 238 | "cell_type": "markdown", 239 | "metadata": {}, 240 | "source": [ 241 | "### 6.2.3 Data Preprocessing" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": 6, 247 | "metadata": {}, 248 | "outputs": [], 249 | "source": [ 250 | "# (Convert the API name strings to integer indices)\n", 251 | "unique_api = train['api'].unique()" 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": 8, 257 | "metadata": {}, 258 | "outputs": [], 259 | "source": [ 260 | "api2index = {item:(i+1) for i,item in enumerate(unique_api)}\n", 261 | "index2api = {(i+1):item for i,item in enumerate(unique_api)}" 262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": 9, 267 | "metadata": {}, 268 | "outputs": [], 269 | "source": [ 270 | "train['api_idx'] = train['api'].map(api2index)\n", 271 | "test['api_idx'] = test['api'].map(api2index)" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": 10, 277 | "metadata": {}, 278 | "outputs": [], 279 | "source": [ 280 | "# Get the API-index sequence for each file\n", 281 | "def get_sequence(df,period_idx):\n", 282 | " seq_list = []\n", 283 | " for _id,begin in enumerate(period_idx[:-1]):\n", 284 | " seq_list.append(df.iloc[begin:period_idx[_id+1]]['api_idx'].values)\n", 285 | " seq_list.append(df.iloc[period_idx[-1]:]['api_idx'].values)\n", 286 | " return seq_list" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": 11, 292 | "metadata": {}, 293 | "outputs": [], 294 | "source": [ 295 | "train_period_idx = 
train.file_id.drop_duplicates(keep='first').index.values\n", 296 | "test_period_idx = test.file_id.drop_duplicates(keep='first').index.values" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": 13, 302 | "metadata": {}, 303 | "outputs": [], 304 | "source": [ 305 | "train_df = train[['file_id','label']].drop_duplicates(keep='first')\n", 306 | "test_df = test[['file_id']].drop_duplicates(keep='first')" 307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": 14, 312 | "metadata": {}, 313 | "outputs": [], 314 | "source": [ 315 | "train_df['seq'] = get_sequence(train,train_period_idx)\n", 316 | "test_df['seq'] = get_sequence(test,test_period_idx)" 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": {}, 322 | "source": [ 323 | "### 6.2.4 TextCNN Network Architecture" 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": 16, 329 | "metadata": {}, 330 | "outputs": [ 331 | { 332 | "name": "stderr", 333 | "output_type": "stream", 334 | "text": [ 335 | "Using TensorFlow backend.\n" 336 | ] 337 | } 338 | ], 339 | "source": [ 340 | "from keras.preprocessing.text import Tokenizer\n", 341 | "from keras.preprocessing.sequence import pad_sequences\n", 342 | "from keras.layers import Dense, Input, LSTM, Lambda, Embedding, Dropout, Activation,GRU,Bidirectional\n", 343 | "from keras.layers import Conv1D,Conv2D,MaxPooling2D,GlobalAveragePooling1D,GlobalMaxPooling1D, MaxPooling1D, Flatten\n", 344 | "from keras.layers import CuDNNGRU, CuDNNLSTM, SpatialDropout1D\n", 345 | "from keras.layers.merge import concatenate, Concatenate, Average, Dot, Maximum, Multiply, Subtract, average\n", 346 | "from keras.models import Model\n", 347 | "from keras.optimizers import RMSprop,Adam\n", 348 | "from keras.layers.normalization import BatchNormalization\n", 349 | "from keras.callbacks import EarlyStopping, ModelCheckpoint\n", 350 | "from keras.optimizers import SGD\n", 351 | "from keras import backend as K\n", 352 | "from 
sklearn.decomposition import TruncatedSVD, NMF, LatentDirichletAllocation\n", 353 | "from keras.layers import SpatialDropout1D\n", 354 | "from keras.layers.wrappers import Bidirectional" 355 | ] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "execution_count": 17, 360 | "metadata": {}, 361 | "outputs": [], 362 | "source": [ 363 | "def TextCNN(max_len,max_cnt,embed_size, num_filters,kernel_size,conv_action, mask_zero):\n", 364 | " _input = Input(shape=(max_len,), dtype='int32')\n", 365 | " _embed = Embedding(max_cnt, embed_size, input_length=max_len, mask_zero=mask_zero)(_input)\n", 366 | " _embed = SpatialDropout1D(0.15)(_embed)\n", 367 | " warppers = []\n", 368 | " \n", 369 | " for _kernel_size in kernel_size:\n", 370 | " conv1d = Conv1D(filters=num_filters, kernel_size=_kernel_size, activation=conv_action)(_embed)\n", 371 | " warppers.append(GlobalMaxPooling1D()(conv1d))\n", 372 | " \n", 373 | " fc = concatenate(warppers)\n", 374 | " fc = Dropout(0.5)(fc)\n", 375 | " #fc = BatchNormalization()(fc)\n", 376 | " fc = Dense(256, activation='relu')(fc)\n", 377 | " fc = Dropout(0.25)(fc)\n", 378 | " #fc = BatchNormalization()(fc) \n", 379 | " preds = Dense(8, activation = 'softmax')(fc)\n", 380 | " \n", 381 | " model = Model(inputs=_input, outputs=preds)\n", 382 | " \n", 383 | " model.compile(loss='categorical_crossentropy',\n", 384 | " optimizer='adam',\n", 385 | " metrics=['accuracy'])\n", 386 | " return model" 387 | ] 388 | }, 389 | { 390 | "cell_type": "code", 391 | "execution_count": 18, 392 | "metadata": {}, 393 | "outputs": [], 394 | "source": [ 395 | "train_labels = pd.get_dummies(train_df.label).values\n", 396 | "train_seq = pad_sequences(train_df.seq.values, maxlen = 6000)\n", 397 | "test_seq = pad_sequences(test_df.seq.values, maxlen = 6000)" 398 | ] 399 | }, 400 | { 401 | "cell_type": "markdown", 402 | "metadata": {}, 403 | "source": [ 404 | "### 6.2.5 TextCNN Training and Prediction" 405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": 19, 410 | 
"metadata": {}, 411 | "outputs": [], 412 | "source": [ 413 | "from sklearn.model_selection import StratifiedKFold,KFold \n", 414 | "skf = KFold(n_splits=5, shuffle=True)" 415 | ] 416 | }, 417 | { 418 | "cell_type": "code", 419 | "execution_count": 20, 420 | "metadata": {}, 421 | "outputs": [], 422 | "source": [ 423 | "max_len = 6000\n", 424 | "max_cnt = 295\n", 425 | "embed_size = 256\n", 426 | "num_filters = 64\n", 427 | "kernel_size = [2,4,6,8,10,12,14]\n", 428 | "conv_action = 'relu'\n", 429 | "mask_zero = False\n", 430 | "TRAIN = True" 431 | ] 432 | }, 433 | { 434 | "cell_type": "code", 435 | "execution_count": 21, 436 | "metadata": { 437 | "scrolled": true 438 | }, 439 | "outputs": [ 440 | { 441 | "name": "stdout", 442 | "output_type": "stream", 443 | "text": [ 444 | "FOLD: \n", 445 | "2778 11109\n", 446 | "Train on 11109 samples, validate on 2778 samples\n", 447 | "Epoch 1/100\n", 448 | "11109/11109 [==============================] - 75s 7ms/step - loss: 0.8165 - acc: 0.7370 - val_loss: 0.4825 - val_acc: 0.8485\n", 449 | "Epoch 2/100\n", 450 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.4772 - acc: 0.8499 - val_loss: 0.4141 - val_acc: 0.8625\n", 451 | "Epoch 3/100\n", 452 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.4172 - acc: 0.8673 - val_loss: 0.3785 - val_acc: 0.8780\n", 453 | "Epoch 4/100\n", 454 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.3768 - acc: 0.8769 - val_loss: 0.3821 - val_acc: 0.8726\n", 455 | "Epoch 5/100\n", 456 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.3568 - acc: 0.8831 - val_loss: 0.3932 - val_acc: 0.8783\n", 457 | "Epoch 6/100\n", 458 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.3388 - acc: 0.8893 - val_loss: 0.3566 - val_acc: 0.8902\n", 459 | "Epoch 7/100\n", 460 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.3179 - acc: 0.8968 - val_loss: 0.3553 - val_acc: 0.8902\n", 
461 | "Epoch 8/100\n", 462 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.3050 - acc: 0.8991 - val_loss: 0.3590 - val_acc: 0.8870\n", 463 | "Epoch 9/100\n", 464 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.2913 - acc: 0.9006 - val_loss: 0.3593 - val_acc: 0.8909\n", 465 | "Epoch 10/100\n", 466 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.2812 - acc: 0.9047 - val_loss: 0.3528 - val_acc: 0.8906\n", 467 | "Epoch 11/100\n", 468 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.2633 - acc: 0.9054 - val_loss: 0.3608 - val_acc: 0.8823\n", 469 | "Epoch 12/100\n", 470 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.2665 - acc: 0.9103 - val_loss: 0.3589 - val_acc: 0.8859\n", 471 | "Epoch 13/100\n", 472 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.2495 - acc: 0.9097 - val_loss: 0.3570 - val_acc: 0.8909\n", 473 | "2778/2778 [==============================] - 4s 1ms/step\n", 474 | "12955/12955 [==============================] - 13s 980us/step\n", 475 | "FOLD: \n", 476 | "2778 11109\n", 477 | "Train on 11109 samples, validate on 2778 samples\n", 478 | "Epoch 1/100\n", 479 | "11109/11109 [==============================] - 65s 6ms/step - loss: 0.8297 - acc: 0.7290 - val_loss: 0.4925 - val_acc: 0.8463\n", 480 | "Epoch 2/100\n", 481 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.4808 - acc: 0.8442 - val_loss: 0.4115 - val_acc: 0.8690\n", 482 | "Epoch 3/100\n", 483 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.4149 - acc: 0.8643 - val_loss: 0.4037 - val_acc: 0.8715\n", 484 | "Epoch 4/100\n", 485 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.3799 - acc: 0.8774 - val_loss: 0.3798 - val_acc: 0.8841\n", 486 | "Epoch 5/100\n", 487 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.3530 - acc: 0.8850 - val_loss: 0.3773 - val_acc: 
0.8870\n", 488 | "Epoch 6/100\n", 489 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.3291 - acc: 0.8924 - val_loss: 0.3676 - val_acc: 0.8855\n", 490 | "Epoch 7/100\n", 491 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.3115 - acc: 0.8959 - val_loss: 0.3773 - val_acc: 0.8888\n", 492 | "Epoch 8/100\n", 493 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.3032 - acc: 0.8998 - val_loss: 0.3518 - val_acc: 0.8891\n", 494 | "Epoch 9/100\n", 495 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.2892 - acc: 0.9027 - val_loss: 0.3655 - val_acc: 0.8920\n", 496 | "Epoch 10/100\n", 497 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.2792 - acc: 0.9056 - val_loss: 0.3615 - val_acc: 0.8906\n", 498 | "Epoch 11/100\n", 499 | "11109/11109 [==============================] - 64s 6ms/step - loss: 0.2736 - acc: 0.9085 - val_loss: 0.3719 - val_acc: 0.8924\n", 500 | "2778/2778 [==============================] - 3s 1ms/step\n", 501 | "12955/12955 [==============================] - 12s 951us/step\n", 502 | "FOLD: \n", 503 | "2777 11110\n", 504 | "Train on 11110 samples, validate on 2777 samples\n", 505 | "Epoch 1/100\n", 506 | "11110/11110 [==============================] - 67s 6ms/step - loss: 0.8388 - acc: 0.7239 - val_loss: 0.4323 - val_acc: 0.8657\n", 507 | "Epoch 2/100\n", 508 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.4969 - acc: 0.8418 - val_loss: 0.3881 - val_acc: 0.8783\n", 509 | "Epoch 3/100\n", 510 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.4276 - acc: 0.8631 - val_loss: 0.3587 - val_acc: 0.8855\n", 511 | "Epoch 4/100\n", 512 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.3882 - acc: 0.8770 - val_loss: 0.3542 - val_acc: 0.8898\n", 513 | "Epoch 5/100\n", 514 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.3590 - acc: 0.8875 - val_loss: 0.3640 - 
val_acc: 0.8920\n", 515 | "Epoch 6/100\n", 516 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.3437 - acc: 0.8851 - val_loss: 0.3445 - val_acc: 0.8992\n", 517 | "Epoch 7/100\n", 518 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.3235 - acc: 0.8912 - val_loss: 0.3552 - val_acc: 0.8923\n", 519 | "Epoch 8/100\n", 520 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.3079 - acc: 0.8986 - val_loss: 0.3491 - val_acc: 0.8927\n", 521 | "Epoch 9/100\n", 522 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.2937 - acc: 0.9035 - val_loss: 0.3370 - val_acc: 0.9003\n", 523 | "Epoch 10/100\n", 524 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.2897 - acc: 0.9018 - val_loss: 0.3523 - val_acc: 0.8959\n", 525 | "Epoch 11/100\n", 526 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.2788 - acc: 0.9066 - val_loss: 0.3519 - val_acc: 0.8967\n", 527 | "Epoch 12/100\n", 528 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.2653 - acc: 0.9131 - val_loss: 0.3571 - val_acc: 0.8952\n", 529 | "2777/2777 [==============================] - 3s 1ms/step\n", 530 | "12955/12955 [==============================] - 12s 955us/step\n", 531 | "FOLD: \n", 532 | "2777 11110\n", 533 | "Train on 11110 samples, validate on 2777 samples\n", 534 | "Epoch 1/100\n", 535 | "11110/11110 [==============================] - 66s 6ms/step - loss: 0.8286 - acc: 0.7326 - val_loss: 0.4647 - val_acc: 0.8596\n", 536 | "Epoch 2/100\n", 537 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.4807 - acc: 0.8524 - val_loss: 0.4081 - val_acc: 0.8704\n", 538 | "Epoch 3/100\n", 539 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.4233 - acc: 0.8656 - val_loss: 0.3920 - val_acc: 0.8808\n", 540 | "Epoch 4/100\n", 541 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.3771 - acc: 0.8819 - val_loss: 
0.3767 - val_acc: 0.8804\n", 542 | "Epoch 5/100\n", 543 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.3529 - acc: 0.8861 - val_loss: 0.3990 - val_acc: 0.8725\n", 544 | "Epoch 6/100\n", 545 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.3353 - acc: 0.8900 - val_loss: 0.3776 - val_acc: 0.8822\n", 546 | "Epoch 7/100\n", 547 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.3234 - acc: 0.8935 - val_loss: 0.3717 - val_acc: 0.8855\n", 548 | "Epoch 8/100\n", 549 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.3012 - acc: 0.9010 - val_loss: 0.3758 - val_acc: 0.8848\n", 550 | "Epoch 9/100\n", 551 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.2907 - acc: 0.9011 - val_loss: 0.3656 - val_acc: 0.8862\n", 552 | "Epoch 10/100\n", 553 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.2852 - acc: 0.9022 - val_loss: 0.3676 - val_acc: 0.8858\n", 554 | "Epoch 11/100\n", 555 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.2683 - acc: 0.9085 - val_loss: 0.3630 - val_acc: 0.8862\n", 556 | "Epoch 12/100\n", 557 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.2595 - acc: 0.9091 - val_loss: 0.3768 - val_acc: 0.8884\n", 558 | "Epoch 13/100\n", 559 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.2533 - acc: 0.9140 - val_loss: 0.3817 - val_acc: 0.8822\n", 560 | "Epoch 14/100\n", 561 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.2464 - acc: 0.9155 - val_loss: 0.3757 - val_acc: 0.8862\n", 562 | "2777/2777 [==============================] - 3s 1ms/step\n", 563 | "12955/12955 [==============================] - 12s 949us/step\n", 564 | "FOLD: \n", 565 | "2777 11110\n", 566 | "Train on 11110 samples, validate on 2777 samples\n", 567 | "Epoch 1/100\n", 568 | "11110/11110 [==============================] - 65s 6ms/step - loss: 0.8168 - acc: 0.7315 - 
val_loss: 0.4718 - val_acc: 0.8567\n", 569 | "Epoch 2/100\n", 570 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.4880 - acc: 0.8459 - val_loss: 0.4047 - val_acc: 0.8711\n", 571 | "Epoch 3/100\n", 572 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.4224 - acc: 0.8674 - val_loss: 0.3871 - val_acc: 0.8732\n", 573 | "Epoch 4/100\n", 574 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.3900 - acc: 0.8728 - val_loss: 0.3676 - val_acc: 0.8812\n", 575 | "Epoch 5/100\n", 576 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.3581 - acc: 0.8846 - val_loss: 0.3713 - val_acc: 0.8819\n", 577 | "Epoch 6/100\n", 578 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.3391 - acc: 0.8890 - val_loss: 0.3542 - val_acc: 0.8905\n", 579 | "Epoch 7/100\n", 580 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.3158 - acc: 0.8975 - val_loss: 0.3610 - val_acc: 0.8902\n", 581 | "Epoch 8/100\n", 582 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.3074 - acc: 0.8986 - val_loss: 0.3520 - val_acc: 0.8887\n", 583 | "Epoch 9/100\n", 584 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.2905 - acc: 0.9026 - val_loss: 0.3588 - val_acc: 0.8941\n", 585 | "Epoch 10/100\n", 586 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.2795 - acc: 0.9032 - val_loss: 0.3417 - val_acc: 0.8923\n", 587 | "Epoch 11/100\n", 588 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.2747 - acc: 0.9044 - val_loss: 0.3456 - val_acc: 0.8912\n", 589 | "Epoch 12/100\n", 590 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.2546 - acc: 0.9131 - val_loss: 0.3517 - val_acc: 0.8902\n", 591 | "Epoch 13/100\n", 592 | "11110/11110 [==============================] - 64s 6ms/step - loss: 0.2483 - acc: 0.9144 - val_loss: 0.3785 - val_acc: 0.8909\n", 593 | "2777/2777 
[==============================] - 3s 1ms/step\n", 594 | "12955/12955 [==============================] - 12s 949us/step\n" 595 | ] 596 | } 597 | ], 598 | "source": [ 599 | "import os\n", 600 | "os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0,1\"\n", 601 | "meta_train = np.zeros(shape = (len(train_seq),8))\n", 602 | "meta_test = np.zeros(shape = (len(test_seq),8))\n", 603 | "FLAG = True\n", 604 | "i = 0\n", 605 | "for tr_ind,te_ind in skf.split(train_labels):\n", 606 | " i +=1\n", 607 | " print('FOLD: {}'.format(i))\n", 608 | " print(len(te_ind),len(tr_ind)) \n", 609 | " model_name = 'benchmark_textcnn_fold_'+str(i)\n", 610 | " X_train,X_train_label = train_seq[tr_ind],train_labels[tr_ind]\n", 611 | " X_val,X_val_label = train_seq[te_ind],train_labels[te_ind]\n", 612 | " \n", 613 | " model = TextCNN(max_len,max_cnt,embed_size,num_filters,kernel_size,conv_action,mask_zero)\n", 614 | " \n", 615 | " model_save_path = './NN/%s_%s.hdf5'%(model_name,embed_size)\n", 616 | " early_stopping =EarlyStopping(monitor='val_loss', patience=3)\n", 617 | " model_checkpoint = ModelCheckpoint(model_save_path, save_best_only=True, save_weights_only=True)\n", 618 | " if TRAIN and FLAG:\n", 619 | " model.fit(X_train,X_train_label,validation_data=(X_val,X_val_label),epochs=100,batch_size=64,shuffle=True,callbacks=[early_stopping,model_checkpoint] )\n", 620 | " \n", 621 | " model.load_weights(model_save_path)\n", 622 | " pred_val = model.predict(X_val,batch_size=128,verbose=1)\n", 623 | " pred_test = model.predict(test_seq,batch_size=128,verbose=1)\n", 624 | " \n", 625 | " meta_train[te_ind] = pred_val\n", 626 | " meta_test += pred_test\n", 627 | " K.clear_session()\n", 628 | "meta_test /= 5.0 " 629 | ] 630 | }, 631 | { 632 | "cell_type": "markdown", 633 | "metadata": {}, 634 | "source": [ 635 | "### 6.2.6 Submitting the Results" 636 | ] 637 | }, 638 | { 639 | "cell_type": "code", 640 | "execution_count": 22, 641 | "metadata": {}, 642 | "outputs": [], 643 | "source": [ 644 | "test_df['prob0'] = 0\n", 645 | 
"test_df['prob1'] = 0\n", 646 | "test_df['prob2'] = 0\n", 647 | "test_df['prob3'] = 0\n", 648 | "test_df['prob4'] = 0\n", 649 | "test_df['prob5'] = 0\n", 650 | "test_df['prob6'] = 0\n", 651 | "test_df['prob7'] = 0\n", 652 | "\n", 653 | "test_df[['prob0','prob1','prob2','prob3','prob4','prob5','prob6','prob7']] = meta_test\n", 654 | "test_df[['file_id','prob0','prob1','prob2','prob3','prob4','prob5','prob6','prob7']].to_csv('nn_baseline_5fold.csv',index = None)" 655 | ] 656 | }, 657 | { 658 | "cell_type": "code", 659 | "execution_count": null, 660 | "metadata": {}, 661 | "outputs": [], 662 | "source": [] 663 | } 664 | ], 665 | "metadata": { 666 | "kernelspec": { 667 | "display_name": "Python 3", 668 | "language": "python", 669 | "name": "python3" 670 | }, 671 | "language_info": { 672 | "codemirror_mode": { 673 | "name": "ipython", 674 | "version": 3 675 | }, 676 | "file_extension": ".py", 677 | "mimetype": "text/x-python", 678 | "name": "python", 679 | "nbconvert_exporter": "python", 680 | "pygments_lexer": "ipython3", 681 | "version": "3.7.1" 682 | }, 683 | "toc": { 684 | "nav_menu": {}, 685 | "number_sections": true, 686 | "sideBar": true, 687 | "skip_h1_title": false, 688 | "title_cell": "Table of Contents", 689 | "title_sidebar": "Contents", 690 | "toc_cell": true, 691 | "toc_position": {}, 692 | "toc_section_display": true, 693 | "toc_window_display": true 694 | } 695 | }, 696 | "nbformat": 4, 697 | "nbformat_minor": 2 698 | } 699 | -------------------------------------------------------------------------------- /阿里云安全恶意程序检测/阿里云安全恶意程序检测-数据探索.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Section 2: Data Exploration" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 4, 13 | "metadata": {}, 14 | "outputs": [ 15 | { 16 | "data": { 17 | "text/html": [ 18 | "\n", 43 | "
!!The styling above was changed by the author purely for layout purposes; decide whether you need to use it!!
\n" 44 | ], 45 | "text/plain": [ 46 | "" 47 | ] 48 | }, 49 | "metadata": {}, 50 | "output_type": "display_data" 51 | } 52 | ], 53 | "source": [ 54 | "%%html\n", 55 | "\n", 80 | "
!!The styling above was changed by the author purely for layout purposes; decide whether you need to use it!!
" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "## 2.1 Exploring the Training Set" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "### 2.1.1 Feature Data Types" 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": 7, 100 | "metadata": {}, 101 | "outputs": [], 102 | "source": [ 103 | "# Import the required packages\n", 104 | "import pandas as pd\n", 105 | "import numpy as np\n", 106 | "import seaborn as sns\n", 107 | "import matplotlib.pyplot as plt\n", 108 | "\n", 109 | "# Suppress warning messages\n", 110 | "import warnings\n", 111 | "warnings.filterwarnings(\"ignore\")\n", 112 | "\n", 113 | "%matplotlib inline\n", 114 | "\n", 115 | "# Read the data\n", 116 | "path = './dataset/'\n", 117 | "train = pd.read_csv(path + 'security_train.csv') # training set\n", 118 | "test = pd.read_csv(path + 'security_test.csv') # test set" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 8, 124 | "metadata": {}, 125 | "outputs": [ 126 | { 127 | "data": { 128 | "text/html": [ 129 | "
\n", 130 | "\n", 143 | "\n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | "
file_id label api tid index
0 1 5 LdrLoadDll 2488 0.0
1 1 5 LdrGetProcedureAddress 2488 1.0
2 1 5 LdrGetProcedureAddress 2488 2.0
3 1 5 LdrGetProcedureAddress 2488 3.0
4 1 5 LdrGetProcedureAddress 2488 4.0
\n", 197 | "
" 198 | ], 199 | "text/plain": [ 200 | " file_id label api tid index\n", 201 | "0 1 5 LdrLoadDll 2488 0.0\n", 202 | "1 1 5 LdrGetProcedureAddress 2488 1.0\n", 203 | "2 1 5 LdrGetProcedureAddress 2488 2.0\n", 204 | "3 1 5 LdrGetProcedureAddress 2488 3.0\n", 205 | "4 1 5 LdrGetProcedureAddress 2488 4.0" 206 | ] 207 | }, 208 | "execution_count": 8, 209 | "metadata": {}, 210 | "output_type": "execute_result" 211 | } 212 | ], 213 | "source": [ 214 | "train.head()" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 9, 220 | "metadata": {}, 221 | "outputs": [ 222 | { 223 | "data": { 224 | "text/html": [ 225 | "
" 309 | ], 310 | "text/plain": [ 311 | " file_id label tid index\n", 312 | "count 35952.000000 35952.000000 35952.000000 35951.000000\n", 313 | "mean 5.142051 0.989152 2494.964564 2153.216267\n", 314 | "std 2.547382 1.957361 129.979938 1537.349809\n", 315 | "min 1.000000 0.000000 282.000000 0.000000\n", 316 | "25% 4.000000 0.000000 2456.000000 722.000000\n", 317 | "50% 5.000000 0.000000 2500.000000 2004.000000\n", 318 | "75% 7.000000 0.000000 2596.000000 3502.000000\n", 319 | "max 9.000000 5.000000 2980.000000 5000.000000" 320 | ] 321 | }, 322 | "execution_count": 9, 323 | "metadata": {}, 324 | "output_type": "execute_result" 325 | } 326 | ], 327 | "source": [ 328 | "train.describe()" 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": {}, 334 | "source": [ 335 | "### 2.1.2 数据分布探索" 336 | ] 337 | }, 338 | { 339 | "cell_type": "code", 340 | "execution_count": 10, 341 | "metadata": {}, 342 | "outputs": [ 343 | { 344 | "data": { 345 | "text/plain": [ 346 | "" 347 | ] 348 | }, 349 | "execution_count": 10, 350 | "metadata": {}, 351 | "output_type": "execute_result" 352 | }, 353 | { 354 | "data": { 355 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAWAAAAEGCAYAAABbzE8LAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAANb0lEQVR4nO3df2yc913A8fenMRtRTWnXlJJ5FbfVTKxrRmnMtD8YKkVlIUGqBAhpm2hgqtA6LWkifmij0eKCkTYGoiVCm0phNFCxwTZUqLqsQSJDQmqnc9Um+5HRa+epy8LoUqLVbdIpyZc/7nFzcc9nO/Xd5/Hl/ZJOuzz35Pr9+O55+/Hj2YlSCpKkwbsoewGSdKEywJKUxABLUhIDLElJDLAkJRlZzs7r1q0rjUajT0uRpOE0PT39vVLKFfO3LyvAjUaDZrO5cquSpAtARHyr23YvQUhSEgMsSUkMsCQlMcCSlMQAS1ISAyxJSQywJCUxwJKUxABLUhIDLElJDLAkJTHAkpTEAEtSEgMsSUkMsCQlMcCSlMQAS1ISAyxJSQywJCVZ1r8JJw3KrbfeyvHjxxkbG8teysvGx8fZtm1b9jI0RAywauno0aPMvvAi//NSPd6ia158LnsJGkL1eHdL3awZ4cRPbc5eBQBrDz+UvQQNIa8BS1ISAyxJSQywJCUxwJKUxABLUhIDLElJDLAkJTHAkpTEAEtSEgMsSUkMsCQlMcCSlMQAS1ISAyxJSQywJCUxwJKUxABLUhIDLElJDLAkJTHAkpTEAEtSEgMsSUkMsCQlMcCSlMQAS1ISAyxJSQywJCUxwJKUxABLUhIDLElJDLAkJTHAkpTEAEtSEgMsSUkMsCQlMcCSlMQAS1ISAyxJSQywJCUxwJKUxABLUhIDrIHYs2cPe/bsyV7GBcWPef2NZC9AF4ZWq5W9hAuOH/P68wxYkpIYYElKYoAlKYkBlqQkBliSkhhgSUpigCUpiQGWpCQGWJKSGGBJSmKAJSmJAZakJAZYkpIYYElKYoAlKYkBlqQkBliSkhhgSUpigCUpiQGWpCQGWJKSGGBJSmKAJSmJAZakJAZYkpIYYElKYoAlKYkBlqQkBliSkhhgSUpigCUpiQGWpCQGWJKSGGBJSmKAJSmJAZakJAZYkpIYYElKMpAA33DDDS/f6mSl1rXQ8zSbTW688Uamp6dX9L8nDYNWq8WWLVt44IEHzjlO5j/earUAOHbsGNu3b6fVarF9+3aOHTvWl3V1HqfzbyvNM+A+mpyc5MyZM+zevTt7KVLtTE1N8cILL3DXXXd1PU7mHp+amgLgvvvu49ChQ0xNTXHo0CH27t2bsewV1fcAz/+sUZezv5Va10LP02w2mZ2dBWB2dra2HwcpQ6vVYmZmBoBSCtA+TubOgjsfn5mZodlssm/fPkopzMzMUEph3759K34WvNhxudLH7ciKPpteNjk5mb2EWjly5AgnTpzg9ttvX9L+J06cgNLnRS3DRSe/T6v1/JLXXwetVou1a9dmL6OrubPa+Xbv3s2DDz74isfnvprsdPr0afbu3cvOnTv7ts5+W/QMOCJ+JyKaEdF89tlnB7GmoTB39ivplebObuebO27mPz47O8upU6fO2Xbq1Cn279/fj+UNzKJnwKWUe4B7ACYmJmp0TlJvo6OjRrjD2NgYAHffffeS9t+yZQuzJ3/QzyUty5kfvoTxN1255PXXQZ3P1huNRtcIj46Odn18dHSUkydPnhPhkZERbrrppn4vta/8JlyfeAlCWtiuXbu6br/zzju7Pj45OclFF52bqzVr1nDLLbf0Z4ED0vcAHzhwoOefs6zUuhZ6nomJiZc/m4+Ojtb24yBlGB8fp9FoABARQPs42bhx4ysebzQaTExMsGnTJiKCRqNBRLBp0yYuv/zyFV3XYsflSh+3ngH30dxn7bnP6pLO2rVrFxdffDE7duzoepzMPT53Nrx161Y2bNjArl272LBhw6o/+wWIuf8LyFJ
MTEyUZrPZx+VoWM1dj1zuNeDZ63+zn8tasrWHH2LjKr0GvJrWPKwiYrqUMjF/u2fAkpTEAEtSEgMsSUkMsCQlMcCSlMQAS1ISAyxJSQywJCUxwJKUxABLUhIDLElJDLAkJTHAkpTEAEtSEgMsSUkMsCQlMcCSlMQAS1ISAyxJSQywJCUxwJKUxABLUhIDLElJDLAkJTHAkpTEAEtSEgMsSUkMsCQlMcCSlMQAS1ISAyxJSQywJCUxwJKUxABLUhIDLElJDLAkJTHAkpTEAEtSkpHsBejCMD4+nr2EC44f8/ozwBqIbdu2ZS/hguPHvP68BCFJSQywJCUxwJKUxABLUhIDLElJDLAkJTHAkpTEAEtSEgMsSUkMsCQlMcCSlMQAS1ISAyxJSQywJCUxwJKUxABLUhIDLElJDLAkJTHAkpTEAEtSEgMsSUkMsCQlMcCSlMQAS1ISAyxJSQywJCUxwJKUxABLUhIDLElJDLAkJTHAkpTEAEtSEgMsSUkMsCQlMcCSlMQAS1ISAyxJSQywJCUxwJKUxABLUpKR7AVICzp9irWHH8peBQBrXnwOuDJ7GRoyBli1tH79eo4fP87YWF2idyXj4+PZi9CQMcCqpXvvvTd7CVLfeQ1YkpIYYElKYoAlKYkBlqQkBliSkhhgSUpigCUpiQGWpCQGWJKSGGBJSmKAJSmJAZakJAZYkpIYYElKYoAlKYkBlqQkBliSkhhgSUpigCUpiQGWpCRRSln6zhHPAt9a4u7rgO+dz6JqZljmgOGZxTnqZ1hm6dccP1FKuWL+xmUFeDkiollKmejLkw/QsMwBwzOLc9TPsMwy6Dm8BCFJSQywJCXpZ4Dv6eNzD9KwzAHDM4tz1M+wzDLQOfp2DViS1JuXICQpiQGWpCRLDnBEXBUR/xERX4uIr0bE7dX2j0fE4Yg4GBH/EhGXdvydD0dEKyK+ERHv6ti+qdrWiogPrehE5z/HH1czPB4RD0fE66vtERF/Wa31YERc3/FcWyPiyeq2dZBz9Jql4/HfjYgSEevqPEuP12QyIo5Ur8njEbG54+/U7r3Va5bqsW3VsfLViPjTOs/S4zX5TMfrMRMRj6/SOa6LiEeqOZoR8fZq+2CPkVLKkm7AeuD66v6PAP8NXAP8EjBSbf8Y8LHq/jXAE8BrgTcCTwFrqttTwJuA11T7XLPUdbzaW485LunYZzvwyer+ZuALQADvAB6ttr8OeLr638uq+5cNao5es1R/vgr4Iu0fnFlX51l6vCaTwO912b+W761FZvkF4N+B11aP/VidZ+n13urY58+Bj6zGOYCHgV/uOC4OZBwjSz4DLqUcLaU8Vt1/Hvg6MFZKebiUcqra7RHgDdX9m4FPl1JeKqV8E2gBb69urVLK06WUHwCfrvYdiB5zfL9jt4uBue9O3gzsLW2PAJdGxHrgXcD+UspzpZT/A/YDmwY1Byw8S/XwXwB/0DEH1HSWReboppbvLeg5y23AR0spL1WP/W+dZ1nsNYmIAH4D+MdVOkcBLql2+1HgOx1zDOwYOa9rwBHRAH4GeHTeQ++j/dkD2kM+0/HYt6ttC20fuPlzRMSfRMQzwHuBj1S71X4OOHeWiLgZOFJKeWLebrWfpct764PVl4J/GxGXVdtqPwe8YpY3A++MiEcj4ksR8bPVbrWfZYHj/Z3Ad0spT1Z/Xm1z7AA+Xh3vfwZ8uNptoHMsO8ARMQp8DtjRedYYEXcAp4D7X+2iBqHbHKWUO0opV9Ge4YOZ61uOzllovwZ/yNlPIKtGl9fkE8DVwHXAUdpf8q4KXWYZof3l6zuA3wf+qTqLrLWFjnfg3Zw9+629LnPcBuysjvedwN9krGtZAY6IH6I9xP2llM93bP8t4FeA95bqgglwhPZ1yDlvqLYttH1gFpqjw/3Ar1X3azsHdJ3latrX4J6IiJlqXY9FxI9T41m6vSallO+WUk6XUs4Af037y1l6rDd9Dljw/fVt4PPVl7ZfBs7Q/sUvtZ2
lx/E+Avwq8JmO3VfbHFuBufv/TNZ7axkXswPYC9w1b/sm4GvAFfO2v5VzL8o/TfuC/Eh1/42cvSj/1ld7MXsF5vjJjvvbgM9W97dw7kX5L5ezF+W/SfuC/GXV/dcNao5es8zbZ4az34Sr5Sw9XpP1Hfd30r7GWNv31iKzvB/4o+r+m2l/ORt1naXXe6s65r80b9uqmoP2teAbqvu/CExnHCPLGeTnaF+4Pgg8Xt02077Y/kzHtk92/J07aH8H9BtU33Gstm+m/d3Ip4A7BnyALDTH54CvVNv/jfY35uZewL+q1noImOh4rvdV87eA3x7kHL1mmbfPDGcDXMtZerwmf1+t8yDwr5wb5Nq9txaZ5TXAP1TvsceAG+s8S6/3FvB3wPu7/J1VM0e1fZr2J4RHgY0Zx4g/iixJSfxJOElKYoAlKYkBlqQkBliSkhhgSUpigLWqRMSlEfGB6v7rI+KzC+x3ICJW/T8SqeFmgLXaXAp8AKCU8p1Syq/nLkc6fyPZC5CW6aPA1dXvoX0SeEsp5dqIWAt8Cvhp4DCwNm+J0tIYYK02HwKuLaVcV/12qwer7bcBL5ZS3hIRb6P902ZSrXkJQsPi52n/qC+llIO0f/RUqjUDLElJDLBWm+dp/9My8/0n8B6AiLgWeNsgFyWdD68Ba1UppRyLiP+KiK/Q/pWCcz4BfCoivl5tn05ZoLQM/jY0SUriJQhJSmKAJSmJAZakJAZYkpIYYElKYoAlKYkBlqQk/w8fN12SFSqSDQAAAABJRU5ErkJggg==\n", 356 | "text/plain": [ 357 | "
" 358 | ] 359 | }, 360 | "metadata": { 361 | "needs_background": "light" 362 | }, 363 | "output_type": "display_data" 364 | } 365 | ], 366 | "source": [ 367 | "sns.boxplot(x=train.iloc[:10000][\"tid\"])" 368 | ] 369 | }, 370 | { 371 | "cell_type": "code", 372 | "execution_count": 11, 373 | "metadata": {}, 374 | "outputs": [ 375 | { 376 | "data": { 377 | "text/plain": [ 378 | "file_id 9\n", 379 | "label 3\n", 380 | "api 166\n", 381 | "tid 56\n", 382 | "index 5001\n", 383 | "dtype: int64" 384 | ] 385 | }, 386 | "execution_count": 11, 387 | "metadata": {}, 388 | "output_type": "execute_result" 389 | } 390 | ], 391 | "source": [ 392 | "train.nunique()" 393 | ] 394 | }, 395 | { 396 | "cell_type": "markdown", 397 | "metadata": {}, 398 | "source": [ 399 | "### 2.1.3 数据缺失值探索" 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": 12, 405 | "metadata": {}, 406 | "outputs": [ 407 | { 408 | "data": { 409 | "text/plain": [ 410 | "count 35951.000000\n", 411 | "mean 2153.216267\n", 412 | "std 1537.349809\n", 413 | "min 0.000000\n", 414 | "25% 722.000000\n", 415 | "50% 2004.000000\n", 416 | "75% 3502.000000\n", 417 | "max 5000.000000\n", 418 | "Name: index, dtype: float64" 419 | ] 420 | }, 421 | "execution_count": 12, 422 | "metadata": {}, 423 | "output_type": "execute_result" 424 | } 425 | ], 426 | "source": [ 427 | "train['index'].describe()" 428 | ] 429 | }, 430 | { 431 | "cell_type": "markdown", 432 | "metadata": {}, 433 | "source": [ 434 | "### 2.1.4 奇异值探索" 435 | ] 436 | }, 437 | { 438 | "cell_type": "code", 439 | "execution_count": 13, 440 | "metadata": { 441 | "scrolled": true 442 | }, 443 | "outputs": [ 444 | { 445 | "data": { 446 | "text/plain": [ 447 | "count 35951.000000\n", 448 | "mean 2153.216267\n", 449 | "std 1537.349809\n", 450 | "min 0.000000\n", 451 | "25% 722.000000\n", 452 | "50% 2004.000000\n", 453 | "75% 3502.000000\n", 454 | "max 5000.000000\n", 455 | "Name: index, dtype: float64" 456 | ] 457 | }, 458 | "execution_count": 13, 459 | 
"metadata": {}, 460 | "output_type": "execute_result" 461 | } 462 | ], 463 | "source": [ 464 | "train['index'].describe()" 465 | ] 466 | }, 467 | { 468 | "cell_type": "code", 469 | "execution_count": 14, 470 | "metadata": {}, 471 | "outputs": [ 472 | { 473 | "data": { 474 | "text/plain": [ 475 | "count 35952.000000\n", 476 | "mean 2494.964564\n", 477 | "std 129.979938\n", 478 | "min 282.000000\n", 479 | "25% 2456.000000\n", 480 | "50% 2500.000000\n", 481 | "75% 2596.000000\n", 482 | "max 2980.000000\n", 483 | "Name: tid, dtype: float64" 484 | ] 485 | }, 486 | "execution_count": 14, 487 | "metadata": {}, 488 | "output_type": "execute_result" 489 | } 490 | ], 491 | "source": [ 492 | "train['tid'].describe()" 493 | ] 494 | }, 495 | { 496 | "cell_type": "markdown", 497 | "metadata": {}, 498 | "source": [ 499 | "### 2.1.5 标签分布探索" 500 | ] 501 | }, 502 | { 503 | "cell_type": "code", 504 | "execution_count": 15, 505 | "metadata": {}, 506 | "outputs": [ 507 | { 508 | "data": { 509 | "text/plain": [ 510 | "0 28350\n", 511 | "5 6786\n", 512 | "2 816\n", 513 | "Name: label, dtype: int64" 514 | ] 515 | }, 516 | "execution_count": 15, 517 | "metadata": {}, 518 | "output_type": "execute_result" 519 | } 520 | ], 521 | "source": [ 522 | "train['label'].value_counts()" 523 | ] 524 | }, 525 | { 526 | "cell_type": "code", 527 | "execution_count": 16, 528 | "metadata": {}, 529 | "outputs": [ 530 | { 531 | "data": { 532 | "text/plain": [ 533 | "" 534 | ] 535 | }, 536 | "execution_count": 16, 537 | "metadata": {}, 538 | "output_type": "execute_result" 539 | }, 540 | { 541 | "data": { 542 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAABeEAAAH+CAYAAAAbE8XCAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAABcSAAAXEgFnn9JSAAAuyklEQVR4nO3de7BtZXku+OeNm4vcBCN4PKLYYohHFO+CEm+AaW+0KCSpmI4tmnTa1pY0cFKxGikUK4nWkVaLmFSOAvZJ5bQ2mhCC0YQyKEEMiSgY8ATEjop3QeTmBoxv/zHHsqdrz7X23rC+vfZm/35Vq8Ya3xjPHN9cVdZyP+vjm9XdAQAAAAAA1t7PrPcEAAAAAADg/koJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMsmG9J7Azq6pvJdkjydfWey4AAAAAAKzoEUnu7O5/t7XB6u4B82FLVNWtu+22294HH3zwek8FAAAAAIAV3HDDDbnrrrtu6+59tjZrJfz6+trBBx/8uGuuuWa95wEAAAAAwAoOPfTQXHvttfdqRxN7wgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgkA3rPQG4P3nU71603lMAVvGvf/CS9Z4CAAAAsJOxEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGOQ+l/BVtUdVHVdV76+qf6mqjVV1R1VdVVWnV9VeCzJnVFWv8vUHqzzvyKr6aFXdXFW3V9UVVfWqzczxwKo6t6q+Mc3vuqp6S1XtvkrmgVX11unejVP2nKp6+Nb9hAAAAAAA2FltWIPXeGWS/zx9/8Ukf5lknyTPSvKWJL9aVc/t7u8syF6W5EsLxj+76EFVdXySD2b2x4NPJflekqOTfKCqDuvuUxdkHpPk8iQPSfLPSS5N8rQkpyc5uqqO7u67lmV2T/KJJEck+WaSC5I8KsmJSV5aVUd095cX/jQAAAAAAGCyFiX8PUn+JMm7uvuLS4NV9bAkFyV5cpJ3ZVbWL/e+7j5vSx5SVQ9Ock6SByQ5vrs/Mo0/NMnfJzmlqv6quy9ZFj0vswL+Pd190pTZkORDSV6e5E1JzliWOS2zAv7yJL/Y3bdPuZOTvHOax/O2ZN4AAAAAAOy87vN2NN39ge7+rfkCfhr/ZpLXT6evqKpd7+OjfiOzFfYXLBXw03O+neR3ptNT5gNV9YwkRyb5ztw96e4fJXldZn9AeONUyi9ldk3yhun09UsF/JQ7K8nVSZ5bVU+9j+8HAAAAAID
7udEfzHrVdNwtyc/ex9d6yXQ8f8G1i5JsTHLMsn3elzIXLt9yZirvL02yX5JfmLt0ZJIHJbmhuz+34FlLzz9266YPAAAAAMDOZnQJ/+jpeE+SmxdcP6qq3lVVf1xVp21mdfkTp+OVyy90992Z7fe+e5JDtiSzbPyw+5gBAAAAAIBNrMWe8Ks5aTp+bPlK9MmvLzs/s6o+nOTV89vAVNU+ma1OT5IbV3jWjZl94OpBmW0ZkySP3IJMpsySe5NZVVVds8Klg7f0NQAAAAAA2PEMWwlfVS9O8trMVsG/ednlLyU5NcmhSfZK8ogkv5bk60mOT/Jflt2/19z3d67wyDum494LcqMzAAAAAACwiSEr4avqsUn+NEkl+Y/dfdX89e7+02WRO5L8WVX9XZIvJDmuqo7o7s+MmN+21t2HLhqfVsg/bhtPBwAAAACAbWTNV8JX1cOTfCyzDzw9q7vfvaXZ7v5mknOn0xfOXbp97vs9VojvOR1vW5AbnQEAAAAAgE2saQlfVQ9O8jeZ7Zd+bmZbzmyt66fjw5YGuvvWJD+YTg9cIbc0/pW5sa9uowwAAAAAAGxizUr4qtoryV9ntr3KR5L8Znf3vXip/abjHcvGl7a0ecqCZ++S5PFJNia5bksyy8avnhu7NxkAAAAAANjEmpTwVbVbkguSPCPJx5P8anf/2714nUry8un0ymWXL5qOJyyIvjTJ7kku7u6NCzLHTnOcf9ZDkzw7yfeTXDZ36bLMVt0fXFVPWvCspedfuPI7AQAAAACANSjhq+oBSf5rkqOSXJrkFd199yr3719Vr6+qvZeN75Xkj5IcnuRbma2mn/e+JLcmeVlVvWIud0CSd0yn75wPdPcVmZXqByR5+1xmQ5L3JtklyXu6+565zN1Jzp5O/7Cq9pzLnZzksCSf7O7PrvQeAQAAAAAgSTaswWu8If//6vXvJXnvbEH7Jk7t7u9l9sGmZyf5g6r6xyTfTLJ/Ztu8/GySW5Kc0N13zoe7++aqek2SDyU5v6ouSXJTkmOS7JvZh8BesuC5Jya5PMlJVXVUkmuTPD3Jo5N8OsnvL8i8bXrdZyW5vqouzWyf+8OTfDfJa1b7gQAAAAAAQLI2Jfx+c9+/fMW7kjMyK+lvymxV+hFJDsms6P63JP9vkvOS/J/d/fVFL9DdH66q5yQ5bcrvmlmpfnZ3f2CFzPVV9eQkb03ywmmOX01yZpLf6+67FmQ2VtXzk7wpySuTHJfk5ml+b+7uG1d5nwAAAAAAkGQNSvjuPiOzgn1L778tye/eh+ddluRFW5n5WmYr4rcm88Mkp09fAAAAAACw1dbkg1kBAAAAAIBNKeEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAA
AAAxyn0v4qtqjqo6rqvdX1b9U1caquqOqrqqq06tqr1Wyr66qK6rq9qq6uao+WlXP2szzjpzuu3nKXVFVr9pM5sCqOreqvjHN77qqektV7b5K5oFV9dbp3o1T9pyqevjmfyoAAAAAALA2K+FfmeTPk7wmyb8l+csklyb575K8Jck/VtUBy0NV9a4k5yZ5fJKLk1yR5AVJPlVVxy16UFUdn+STSV6Y5OokH0vyc0k+UFX/aYXMY5J8Lsmrk9yU5IIkD0hyepKLq2q3BZndk3wiyZuT7DVlvpbkxCSfq6pHr/oTAQAAAACArE0Jf0+SP0nyuO5+XHf/cne/MMnPZ1Z+PzbJu+YDVXVMkpMyK8Wf2N3HTZnnZFbkn1tV+y7LPDjJOZkV6Cd09/O6+4Tp9b+U5JSqet6C+Z2X5CFJ3tPdT+juX5nm9udJjkzypgWZ05IckeTyJId096909+FJTkmy/zQPAAAAAABY1X0u4bv7A939W939xWXj30zy+un0FVW169zlk6fj27r7+rnM5Un+OMm+SV677FG/kWSfJBd090fmMt9O8jvT6Snzgap6RmZF+3fm7kl3/yjJ6zL7A8Ibq2rDXGbXJG+YTl/f3bfP5c7KbAX+c6vqqQt/IAAAAAAAMBn9waxXTcfdkvxsMttrPclR0/j5CzJLY8cuG3/JKpmLkmxMcsyyfd6XMhd2913zgam8vzTJfkl+Ye7SkUkelOSG7v7cVswPAAAAAAB+yugSfmnv9HuS3Dx9//OZlfLf7e4bF2SunI6HLRt/4rLrP9Hddyf55yS7JzlkSzKrPOveZAAAAAAAYBMbNn/LfXLSdPzY3Er0R07HRQV8uvuOqrolyX5VtXd331ZV+2S2On3F3DT+tCQHZbZlzGafNTd+0NzYvcmsqqquWeHSwVv6GgAAAAAA7HiGrYSvqhdntq/7PUnePHdpr+l45yrxO6bj3ssyq+WWZ7bkWWuVAQAAAACATQxZCV9Vj03yp0kqyX/s7qs2E7lf6+5DF41PK+Qft42nAwAAAADANrLmK+Gr6uFJPpbZB56e1d3vXnbL7dNxj1VeZs/peNuyzGq55ZktedZaZQAAAAAAYBNrWsJX1YOT/E1m+6Wfm+TUBbd9dToeuMJr7Jlk3yTf7+7bkqS7b03yg9Vyc+Nf2dJnrWEGAAAAAAA2sWYlfFXtleSvM9te5SNJfrO7e8Gt/5LkriT7T6vml3vKdLx62fhVy67PP3uXJI9PsjHJdVuSWeVZ9yYDAAAAAACbWJMSvqp2S3JBkmck+XiSX+3uf1t0b3f/MMknptNfWnDLCdPxwmXjFy27Pu+lSXZPcnF3b1yQOXaa4/ycH5rk2Um+n+SyuUuXZbbq/uCqetJWzA8AAAAAAH7KfS7hq+oBSf5rkqOSXJrkFd1992ZiZ03H06rq5+Ze65lJfivJLUnevyzzviS3JnlZVb1iLnNAkndMp++cD3T3FZmV6gckeftcZkOS9ybZJcl7uvueuczdSc6eTv9w2h5nKXdyksOSfLK7P7uZ9wgAAAAAwE5uwxq8xhuSvHz6/ntJ3ltVi+47tbu/lyTdfXFVvTvJSUk+X1V/m2TXJC9IUklO7O5b5sPdfXNVvSbJh5KcX1WXJLkpyTGZ7SF/VndfsuC5Jya5PMlJVXVUkmuTPD3Jo5N8OsnvL8i8bXrdZyW5vqouzWyf+8OTfDfJa1b9iQAAAAAAQNamhN9v7vuXr3hXckZmJX2SpLt/u6o+n1mJ/4Ikdye5OMmZ3f3pRS/Q3R+uquckOS3JEZkV99cmObu7P7BC5vqqenKStyZ54TTHryY5M8nvdfddCzIbq+r5Sd6U5JVJjktyc5Lzkry5u29c5X0CAAAAAECSNSjhu/uMzAr2e5M9L7Nie2sylyV50VZmvpbZivityfwwyenTFwAAAAAAbLU1+WBWAAAAAABgU0p4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAA
AgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDrEkJX1VPrarfraqPVNWNVdVV1avcf8bSPSt8/cEq2SOr6qNVdXNV3V5VV1TVqzYzvwOr6tyq+kZVbayq66rqLVW1+yqZB1bVW6d7N07Zc6rq4Vv2UwEAAAAAYGe3YY1e581JXnYvcpcl+dKC8c8uurmqjk/ywcz+ePCpJN9LcnSSD1TVYd196oLMY5JcnuQhSf45yaVJnpbk9CRHV9XR3X3XsszuST6R5Igk30xyQZJHJTkxyUur6oju/vJWv1sAAAAAAHYqa1XCX57k6iT/OH39a5LdtiD3vu4+b0seUFUPTnJOkgckOb67PzKNPzTJ3yc5par+qrsvWRY9L7MC/j3dfdKU2ZDkQ0lenuRNSc5YljktswL+8iS/2N23T7mTk7xzmsfztmTeAAAAAADsvNZkO5rufnt3n97dF3b3t9biNRf4jST7JLlgqYCfnv3tJL8znZ4yH6iqZyQ5Msl35u5Jd/8oyeuS3JPkjVMpv5TZNckbptPXLxXwU+6szP7Y8NyqeuravTUAAAAAAO6PdqQPZn3JdDx/wbWLkmxMcsyyfd6XMhcu33JmKu8vTbJfkl+Yu3RkkgcluaG7P7fgWUvPP3brpg8AAAAAwM5mvUv4o6rqXVX1x1V12mZWlz9xOl65/EJ3353Zfu+7JzlkSzLLxg+7jxkAAAAAANjEWu0Jf2/9+rLzM6vqw0lePb8NTFXtk9nq9CS5cYXXujGzD1w9KLMtY5LkkVuQyZRZcm8yq6qqa1a4dPCWvgYAAAAAADue9VoJ/6UkpyY5NMleSR6R5NeSfD3J8Un+y7L795r7/s4VXvOO6bj3gtzoDAAAAAAAbGJdVsJ3958uG7ojyZ9V1d8l+UKS46rqiO7+zLaf3drr7kMXjU8r5B+3jacDAAAAAMA2st57wv+U7v5mknOn0xfOXbp97vs9VojvOR1vW5AbnQEAAAAAgE1sVyX85Prp+LClge6+NckPptMDV8gtjX9lbuyr2ygDAAAAAACb2B5L+P2m4x3Lxq+ajk9ZHqiqXZI8PsnGJNdtSWbZ+NVzY/cmAwAAAAAAm9iuSviqqiQvn06vXHb5oul4woLoS5PsnuTi7t64IHNsVe227FkPTfLsJN9PctncpcsyW3V/cFU9acGzlp5/4crvBAAAAAAA1qGEr6r9q+r1VbX3svG9kvxRksOTfCvJR5ZF35fk1iQvq6pXzOUOSPKO6fSd84HuviKzUv2AJG+fy2xI8t4kuyR5T3ffM5e5O8nZ0+kfVtWec7mTkxyW5JPd/dmtfOsAAAAAAOxkNqzFi1TVS5K8eW5o12n8M3NjZ3b3RZl9sOnZSf6gqv4xyTeT7J/ZNi8/m+SWJCd0953zz+jum6vqNUk+lOT8qrokyU1Jjkm
yb5KzuvuSBdM7McnlSU6qqqOSXJvk6UkeneTTSX5/QeZt0+s+K8n1VXVpkoMy+wPBd5O8ZnM/EwAAAAAAWKuV8PtnVlAvfdU0Pj+2/zR2U2ar0j+b5JAkxyc5MrPV7+9M8vjunt8e5ie6+8NJnpPk40menOTFSb6U5NXdfcoKmeune8+b5vDyJD9OcmaSo7v7rgWZjUmeP91zZ5LjMivhz0vylO7+8uZ+IAAAAAAAsCYr4bv7vMwK6i2597Ykv3sfnnVZkhdtZeZrma2I35rMD5OcPn0BAAAAAMBW264+mBUAAAAAAO5PlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZZkxK+qp5aVb9bVR+pqhurqquqtyD36qq6oqpur6qbq+qjVfWszWSOnO67ecpdUVWv2kzmwKo6t6q+UVUbq+q6qnpLVe2+SuaBVfXW6d6NU/acqnr45t4XAAAAAAAkyYY1ep03J3nZ1gSq6l1JTkrywyR/k2T3JC9I8otVdUJ3/8WCzPFJPpjZHw8+leR7SY5O8oGqOqy7T12QeUySy5M8JMk/J7k0ydOSnJ7k6Ko6urvvWpbZPcknkhyR5JtJLkjyqCQnJnlpVR3R3V/emvcLAAAAAMDOZ622o7k8yZlJ/ockD0ty12o3V9UxmRXwNyV5Yncf190vTPKcJP+W5Nyq2ndZ5sFJzknygCQndPfzuvuEJI9N8qUkp1TV8xY87rzMCvj3dPcTuvtXkvx8kj9PcmSSNy3InJZZAX95kkO6+1e6+/AkpyTZf5oHAAAAAACsak1K+O5+e3ef3t0Xdve3tiBy8nR8W3dfP/c6lyf54yT7JnntssxvJNknyQXd/ZG5zLeT/M50esp8oKqekVnR/p25e9LdP0ryuiT3JHljVW2Yy+ya5A3T6eu7+/a53FlJrk7y3Kp66ha8TwAAAAAAdmLb/INZq+qBSY6aTs9fcMvS2LHLxl+ySuaiJBuTHLNsn/elzIXLt5yZyvtLk+yX5BfmLh2Z5EFJbujuz23F/AAAAAAA4Kds8xI+s61gdkvy3e6+ccH1K6fjYcvGn7js+k90992Z7fe+e5JDtiSzyrPuTQYAAAAAADaxVh/MujUeOR0XFfDp7juq6pYk+1XV3t19W1Xtk9nq9BVz0/jTkhyU2ZYxm33W3PhBWzq/FTKrqqprVrh08Ja+BgAAAAAAO571WAm/13S8c5V77piOey/LrJZbntmSZ61VBgAAAAAANrEeK+F3Ot196KLxaYX847bxdAAAAAAA2EbWYyX87dNxj1Xu2XM63rYss1pueWZLnrVWGQAAAAAA2MR6lPBfnY4HLrpYVXsm2TfJ97v7tiTp7luT/GC13Nz4V7b0WWu
YAQAAAACATaxHCf8vSe5Ksn9VPXzB9adMx6uXjV+17PpPVNUuSR6fZGOS67Yks8qz7k0GAAAAAAA2sc1L+O7+YZJPTKe/tOCWE6bjhcvGL1p2fd5Lk+ye5OLu3rggc2xV7TYfqKqHJnl2ku8nuWzu0mWZrbo/uKqetBXzAwAAAACAn7IeK+GT5KzpeFpV/dzSYFU9M8lvJbklyfuXZd6X5NYkL6uqV8xlDkjyjun0nfOB7r4is1L9gCRvn8tsSPLeJLskeU933zOXuTvJ2dPpH07b4yzlTk5yWJJPdvdnt+4tAwAAAACws9mwFi9SVS9J8ua5oV2n8c/MjZ3Z3RclSXdfXFXvTnJSks9X1d9OmRckqSQndvct88/o7pur6jVJPpTk/Kq6JMlNSY7JbA/5s7r7kgXTOzHJ5UlOqqqjklyb5OlJHp3k00l+f0HmbdPrPivJ9VV1aZKDkhye5LtJXrPZHwoAAAAAADu9tVoJv39mBfXSV03j82P7zwe6+7czK8i/mFn5/swkFyd5Tnf/xaKHdPeHkzwnyceTPDnJi5N8Kcmru/uUFTLXT/eeN83h5Ul+nOTMJEd3910LMhuTPH+6584kx2VWwp+X5Cnd/eVVfhYAAAAAAJBkjVbCd/d5mRXUw3PdfVmSF21l5muZFf5bk/lhktOnLwAAAAAA2GrrtSc8AAAAAADc7ynhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhAQAAAABgECU8AAAAAAAMooQHAAAAAIBBlPAAAAAAADCIEh4AAAAAAAZRwgMAAAAAwCAb1nsCAAAAANw7T/jAE9Z7CsAqvvA/fWG9p8B2wEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGCQdS3hq+qSqupVvl64Qu7VVXVFVd1eVTdX1Uer6lmbedaR0303T7krqupVm8kcWFXnVtU3qmpjVV1XVW+pqt3vy/sGAAAAAGDnsGG9JzD5cJLbF4x/fflAVb0ryUlJfpjkb5LsnuQFSX6xqk7o7r9YkDk+yQcz+6PDp5J8L8nRST5QVYd196kLMo9JcnmShyT55ySXJnlaktOTHF1VR3f3XVv9TgEAAAAA2GlsLyX8qd39r5u7qaqOyayAvynJM7v7+mn8mUkuSXJuVV3S3bfMZR6c5JwkD0hyfHd/ZBp/aJK/T3JKVf1Vd1+y7HHnZVbAv6e7T5oyG5J8KMnLk7wpyRn36t0CAAAAALBT2NH2hD95Or5tqYBPku6+PMkfJ9k3yWuXZX4jyT5JLlgq4KfMt5P8znR6ynygqp6R5Mgk35m7J939oySvS3JPkjdOpTwAAAAAACy0w5TwVfXAJEdNp+cvuGVp7Nhl4y9ZJXNRko1Jjlm2z/tS5sLlW85M5f2lSfZL8gtbNnsAAAAAAHZG20sJ/9qqem9VnV1Vb6yqRy645+eT7Jbku91944LrV07Hw5aNP3HZ9Z/o7rsz2+999ySHbElmM88
CAAAAAICf2F62Uzlt2fl/qqozu/vMubGlYn5RAZ/uvqOqbkmyX1Xt3d23VdU+SR60Wm4af1qSg5JcvSXPmhs/aIXrP6Wqrlnh0sFbkgcAAAAAYMe03ivhP5Xk1zMro/fIbLX7/5HkR0neWlUnzd2713S8c5XXu2M67r0ss1pueWZLnrUoAwAAAAAAP2VdV8J39+nLhq5L8ntV9U9JPp7kjKr6k+7+4baf3drp7kMXjU8r5B+3jacDAAAAAMA2st4r4Rfq7r9J8k9J9k1y+DR8+3TcY5XontPxtmWZ1XLLM1vyrEUZAAAAAAD4KdtlCT+5fjo+bDp+dToeuOjmqtozs9L++919W5J0961JfrBabm78K3Njqz5rhQwAAAAAAPyU7bmE3286Lu2//i9J7kqyf1U9fMH9T5mOVy8bv2rZ9Z+oql2SPD7Jxsy2wtlsZjPPAgAAAACAn9guS/iq2j/Js6fTK5Nk2hf+E9PYLy2InTAdL1w2ftGy6/NemmT3JBd398YFmWOrardlc3voNLfvJ7ls9XcCAAAAAMDObN1K+Kp6VlUdV1UPWDb+qCR/ntm+63/Z3TfOXT5rOp5WVT83l3lmkt9KckuS9y971PuS3JrkZVX1irnMAUneMZ2+cz7Q3VdkVrAfkOTtc5kNSd6bZJck7+nue7biLQMAAAAAsJPZsI7PPiTJuUm+VVVXZlagH5TkqZmtTr8myW/OB7r74qp6d5KTkny+qv42ya5JXpCkkpzY3bcsy9xcVa9J8qEk51fVJUluSnJMZnvIn9XdlyyY34lJLk9yUlUdleTaJE9P8ugkn07y+/fp3QMAAAAAcL+3ntvR/EOSP0ryjczK7V/ObH/2zyc5JcnTu/s7y0Pd/duZFeRfzKx8f2aSi5M8p7v/YtGDuvvDSZ6T5ONJnpzkxUm+lOTV3X3KCpnrp3vPS7J/kpcn+XGSM5Mc3d13bfU7BgAAAABgp7JuK+G7+4tJ/td7mT0vs3J8azKXJXnRVma+llnhDwAAAAAAW227/GBWAAAAAAC4P1DCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyyYb0nAAAAOeNB6z0DYDVn/GC9ZwAAsMOyEh4AAAAAAAZRwgMAAAAAwCBKeAAAAAAAGEQJDwAAAAAAgyjhV1FVD6yqt1bVdVW1saq+UVXnVNXD13tuAAAAAABs/5TwK6iq3ZN8Ismbk+yV5IIkX0tyYpLPVdWj13F6AAAAAADsAJTwKzstyRFJLk9ySHf/SncfnuSUJPsnOWc9JwcAAAAAwPZPCb9AVe2a5A3T6eu7+/ala919VpKrkzy3qp66HvMDAAAAAGDHoIRf7MgkD0pyQ3d/bsH186fjsdtuSgAAAAAA7GiU8Is9cTpeucL1pfHDtsFcAAAAAADYQW1Y7wlspx45HW9c4fr
S+EFb8mJVdc0Klx57ww035NBDD92aubEd+8a3b9/8TcC6OfTCvdZ7CsBKvut3KGzX/h//ZoHt1Q233LDeUwBWceg7/A69v7jhhhuS5BH3JquEX2yppblzhet3TMe97+NzfnzXXXfdce21137tPr4OsPYOno7+H+39yLU3rfcMAHYKfofeH3332vWeAcDOwO/Q+6Frv+F36P3II7JyX7wqJfw20N3+5AU7mKX/gsX/fgFg6/gdCgD3jt+hcP9lT/jFlv576D1WuL7ndLxtG8wFAAAAAIAdlBJ+sa9OxwNXuL40/pVtMBcAAAAAAHZQSvjFrpqOT1nh+tL41dtgLgAAAAAA7KCU8ItdluQHSQ6uqictuH7CdLxwm80IAAAAAIAdjhJ+ge6+O8nZ0+kfVtXSHvCpqpOTHJbkk9392fWYHwAAAAAAO4bq7vWew3apqnZPckmSw5N8M8mlSQ6azr+b5Iju/vK6TRAAAAAAgO2eEn4VVfXAJG9K8sokj0hyc5KPJXlzd9+4nnMDAAAAAGD7p4QHAAAAAIBB7AkPAAAAAACDKOEBAAAAAGAQJTwAAAAAAAyihAcAAAAAgEGU8AAAAAAAMIgSHgAAAAAABlHCAwAAAADAIEp4AAAAAAAYRAkPAAAAAACDbFjvCQCst6p6cpJjkxyW5KAke0+XbkvylSRXJ7mwuz+3PjMEgO1TVW1I8rNJbu7uezZz74OT7NXdX90mkwOAHUxV7Zbk8CQPS3JHkiu7+xvrOytgLVR3r/ccANZFVT0qyTlJnrs0tMrtneSSJK/t7n8dOjEA2M5V1UOSvCvJK5LsluSeJH+d5PTu/sIKmXOT/Hp3WwgEwE6pqn4xyde7+5oF1/63JGck2XfZpQuS/M/d/b3hEwSGUcIDO6Wq+vdJrkxyQGYr3c+fzm/MbMVBkuyZ5MAkT0nyS0mekOTbSZ5qNQIAO6uq2jPJPyb5+Wz6B+y7k5za3WcvyJ2b5FXd/YDxswSA7U9V/TjJud392mXjpyV5S2a/V/8pyfVJ9kvy7Mz+XfqFJM/o7ru27YyBtWJPeGBndWZmBfzJ3f2k7n5bd3+0u6/u7humr6unsbd19xOTnJrkoUneuq4zB4D1dXKSxyb5fJJnZVYOPCHJ+5PskuTdVfWOdZsdAGzffuoP2FX1iCRvTvLDJP99dz+ju3+tu1+c5NFJPp3k8Un+l20+U2DNKOGBndULk/xDd79rSwPdfVaSf0jyolGTAoAdwPFJbk3y4u7+THf/sLuv6e7fzOwzVn6Q5JSq+s9VtdpWbwBAclxmf8R+W3f/7fyF7v5ukv8xyV1JfnnbTw1YK0p4YGf14CT/ei9yX5myALCzekyST3f3t5df6O6PZrY6/mtJXpPkg9OHtwIAix2S2WeQnb/o4vSZZJ9N8h+24ZyANaaEB3ZWX03y7KraY0sD073PzqxYAICd1QMyWwm/UHf/tyRHJvlvma2av6Cqdt9GcwOAHc1SN7favzO/ktn2b8AOSgkP7Kw+mOTfJ/l4VR22uZunez6e5N8l+bPBcwOA7dlXMtubdkXd/fUkv5DZh8u9MMnHkuwzfmoAsN3bq6oeufSV5KZp/GGrZPZN8v3hMwOGqe5e7zkAbHPTiry/S3J4Zv/p3w1JrkxyY5I7p9v2SHJgkqckOTizD9D5TJLn+1R6AHZWVfX+JK9O8h+6+7rN3Ltnkr9M8vzMft+mux8weo4AsD2qqh9n+n24wK939yYLvqrqZzL7d+rXu/vpI+cHjGN/RmCn1N0bq+p5mX0K/esz29/2MUuXp+P8h8n9IMnZmX1YjgIegJ3ZXyY5Mcn/nuR1q93Y3XdU1YuS/N+ZffCcFUAA7Mw+lZV/Fx6ywvixmf0X2R8ZMiNgm7ASHtjpVdUume1d+8Qkj0yy13Tp9sz2jr8qyWXdfc/6zBAAth9V9cAkr0xyT3f/X1uY+Zkkb0iyX3e/ZeT8AOD+pKqemVlB/w/T564AOyAlPAAAAAAADOK
DWQEAAAAAYBAlPAAAAAAADKKEBwAAAACAQZTwAAAAAAAwiBIeAAAAAAAGUcIDAAAAAMAgSngAAAAAABhECQ8AAAAAAIMo4QEAAAAAYBAlPAAAAAAADKKEBwAAAACAQZTwAAAAAAAwyP8HJMWOWb7+FSUAAAAASUVORK5CYII=\n", 543 | "text/plain": [ 544 | "
" 545 | ] 546 | }, 547 | "metadata": { 548 | "needs_background": "light" 549 | }, 550 | "output_type": "display_data" 551 | } 552 | ], 553 | "source": [ 554 | "plt.figure(figsize=(12,4),dpi=150)\n", 555 | "train['label'].value_counts().sort_index().plot(kind = 'bar')" 556 | ] 557 | }, 558 | { 559 | "cell_type": "code", 560 | "execution_count": 17, 561 | "metadata": {}, 562 | "outputs": [ 563 | { 564 | "data": { 565 | "text/plain": [ 566 | "" 567 | ] 568 | }, 569 | "execution_count": 17, 570 | "metadata": {}, 571 | "output_type": "execute_result" 572 | }, 573 | { 574 | "data": { 575 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAf8AAAHjCAYAAAAt5RZeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAABcSAAAXEgFnn9JSAAA0s0lEQVR4nO3deZhcVYH38d/pTtIJIWSBACGQXAhZQAgJARISFHx1FKgRBHFFLARlwNFR2byvyrzozDuW+row7uPo6OiMOqiE5SKLYRUUZEm4IBACFLsgSXens3S6u+q8f9yKNKGT9HJvnbt8P8/TTyddVbd+5NH+1bn33HOMtVYAAKA4WlwHAAAAzUX5AwBQMJQ/AAAFQ/kDAFAwlD8AAAVD+QMAUDCUPwAABUP5AwBQMJQ/AAAFQ/kDAFAwlD8AAAVD+QMAUDCUPwAABUP5AwBQMJQ/AAAFQ/kDAFAwlD8AAAVD+QMAUDCUPwAABUP5AwBQMJQ/AAAFQ/kDAFAwlD8AAAVD+QMAUDCUPwAABUP5AwBQMJQ/AAAFQ/kDAFAwlD8AAAVD+QMAUDCUPwAABUP5AwBQMJQ/AAAFQ/kDAFAwlD8AAAVD+QMAUDCUPwAABUP5AwBQMJQ/AAAFQ/kDAFAwlD8AAAVD+QMAUDCUPwAABUP5AwBQMJQ/AAAFQ/kDAFAwlD8AAAVD+QMAUDCUPwAABUP5AwBQMJQ/AAAFQ/kDAFAwlD8AAAVD+QMAUDCUPwAABUP5AwBQMJQ/AAAFQ/kDAFAwlD8AAAUzynUAAM3n+cEukiZK2m0HX/0fnyBptKSapHrje/8/90javJ2vdZKe3/pVrZQ2NOO/EcD2GWut6wwAYuT5gZE0TdIB23zNkrS/pKly+8F/g6IPAi/olQ8F2/75uWqltNFZQiDnKH8ggzw/GCNpjl5b8AdI8iSNcxYuPi9IeqDfVyjp4Wql1OM0FZADlD+Qco2R/GxJR0la3Pi+QNIYh7Fc6ZX0qF79oeCBaqX0nNNUQMZQ/kDKeH4wVa+U/GJJR0qa7DRU+q1VdGZglaTbJN1crZTa3UYC0ovyBxzy/GC0onJf3O/Lc5kpJ+qSVkpa0fi6vVopbXKaCEgRyh9oMs8PZko6vvH1vxTNpkeyeiTdpeiDwE2S/lCtlHrdRgLcofyBhDUm5x0r6URFhT/PbSJI2ijpdkUfBFZIWlmtlOpuIwHNQ/kDCfD8YKKisj9Z0glidJ92L0r6taRfSrq1WinVHOcBEkX5AzHx/GAfSacqKvxjFS2Kg+x5SdIVij4I3MwHAeQR5Q+MQOOU/kmSzpL0FkmtbhMhZi8r+hDw02qldIfrMEBcKH9gGDw/mK+o8E+XtIfjOGiOJyT9t6IPAo+6DgOMBOUPDJLnB5MlvU9
Xpkz+izVmqus8wAisdR2g2UZc/tbaS2PIkWeF+0SJYrjfHjhnlFXXGzZv/tOtu+xyrOs8wAgUrvzZ2Cd5jPyRS1YtLS9o99UXr+2YJWsLdZsUcofyR+wY+SO3bqkdtmlGX9++U2u1+1xnAUaA8o+LMeYYY8yXjTHLjTErjDE3DfC1Iqn3T4tqpdSnAs4kRTEsry2bKkl/397JyB9ZVrjyT2JjHyPpB5LKemV2v9WrZ/pv/XtRJgp2SZrkOgQQt3vs3LnWav3bN2xc9Pk9prxQN2aa60zAMBSu/JMY+Z8r6UxJ90r6G0m/bvx8rqQTFN0aWJf0ZUkHJPD+afS86wBAEupqaX1Rk1e3Sq1v3rR5tes8wDC97DpAsyVR/mdK2ijpBGvtCkWjXllrH7PWXm+tPUvRtr4XSlqQwPun0bOuAwBJubU2f4MkXbi2fY6s5RIXsoiRfwwOknSntXbrP6aVJGNM69YnWGt/qejMwIUJvH8aUf7IrSvr0XX/abXatGl9tXtc5wGGgfKP6Zj9/yE3Nb5P3uZ5j0k6NIH3TyO2P0Vu3VU/aK612iBJ/9De0bqz5wMpRPnH4DlJ+/T7+9Z1/bdd/3uOpL4E3j+NGPkjt2pqHfUXTXxUkk7cuOnwUdbyYRdZQ/nH4D5JB/c7zX+Dopn9XzLGzDPGTDDGXCRpkaT7E3j/NKL8kWu31+d3SVKL1HLCho2Pu84DDBHlH4OrJO0hqSRJ1tpVkn4u6TBJD0nqkFRRNOr/TALvn0aMhJBry2vLdt/650+2d7xO1va6zAMMwcawHBZuMbbYy99a+zNJ4yQF/X5clvRpSX+UtEbStZLeZK29O+73TylG/si139cPnmdtNL9naq0+dUZfHxP/kBVPuA7gQiIr/Flrt9h+t/xYa3uttRVr7RJr7Vxr7dustbcn8d5pVK2UOtW45RHIoz6NGr1Wuz269e+fXNcx1mUeYAgKuT4Fa/s3z3OuAwBJuqN+SOfWP79p0+YFo6190mUeYJAofySK6/7IteW1ZX+9nddI5u1dG552mQcYpEd3/pT8GfHa/saYkVwvsdbaWSPNkBFc90eu3VE/ZJ616jZGYyXpY+2dh14+YdduGcMlAKRZIUf+cWzs48VwjCKg/JFrPRrd1q4JK6eoa4EkTa7Xp8zq7b3j8TFjljmOBuxIIct/xKf9rbUtI/mK4z8iIwo5oxTFcmf94I7+f79gXcdujqIAg7E2LIeFu8df4pp/M61yHQBI2pW1ZZP6//31m7sPbavXH3MUB9iZQo76Jcq/mR6SxMInyLXb6vPnWast/X/2zq4NbGmNtKL8kaxqpdQj6U+ucwBJ2qIxYzs0/lWzp8/r6Fwgazdt7zWAQ5Q/mmKl6wBA0u6qH7yu/993q9uJB/X03ucqD7ADhbzNT6L8m60oGxmhwJbXlr5mkt9F69qnuMgC7AQjfzTFStcBgKTdWl8wz1r19P/Zkd1bDh5Xrz/sKhMwACupsJNRKf/mWqnof3BAbm1W2y7rtcsj2/789PVdL7vIA2zHI0XczW8ryr+JGhv8VF3nAJJ2d33eum1/9qGO9QtlLRtcIS3udB3AJcq/+bjuj9y7srZswrY/G2/trvO39Kx0EAcYCOWPplrpOgCQtJvqC+dZq75tf+6vbd/LRR5gAJQ/moqRP3Jvk8aO79K411z3P7SnZ86u9fqDLjIB/axVgW/zkyh/F1a6DgA0wz31uQNO8Duzc31Hk6MA2/pDWA4LPfma8m+yaqX0rCRmPSP3rqot3XWgn5c7uxYZazuaHAfor9Cn/CXK35U/uA4AJO3G+qK51qq27c/HWjvuiO4tD7jIBDRQ/q4DFNTNrgMASduocRM2auyA11U/tbZ932bnARr6JN3tOoRrlL8blD8K4b767L8M9PO5vb0HTKzV2OYaLqwKy2HhN5qi/N1YKek1i6AAeXNVfem47T324Y71G5uZBWgo/Cl/ifJ3olopWUm3uM4BJO362hFzrVV9oMf
eu77rCGMtk1/RbJS/KH+XbnIdAEhal8ZP3KS2Aa/7j5HGLN3c/VCzM6HwKH9R/i6tcB0AaIaV9QNf2t5jn1rX7snaQt9vjaZaE5bDp12HSAPK35FqpfSIpGdd5wCSdnX96LHbe2z/3r6Zu9fq9zUzDwrtStcB0oLyd+s3rgMASbuuduRca7e/lfW5HZ29zcyDQrvKdYC0oPzdovyRex2aMGmzxqze3uOndW04osXaF5uZCYW0VtIdrkOkBeXv1m8lMepB7j1gZ/15e4+NkkYdt2nzw83Mg0K6NiyHr1lxsqgof4eqlVKX+CSKArimtqRtR49ftK59tqwd8JZAICac8u+H8nePU//IvWtrR83e0XX/fftq0/eq1e5tZiYUyhZJ17kOkSaUv3vXug4AJG2dJu6+RaPX7Og5H23vbFYcFM/NYTnc4DpEmlD+jlUrpQcl/cl1DiBpod3/hR09ftKGjYtarX2uWXlQKJzy3wblnw7/5ToAkLRraktG7+jxFqnlLRs37fDsADBMlP82KP90+G9p+9dDgTwIaksO3NlzLljXMU/W9jUjDwrjvrAcckZpG5R/ClQrpaqY9Y+ce1mTpm6xox/f0XP2qtX22revdk+zMqEQWNVvAJR/enDqH7n3kJ250xHYx9s7xjQjCwqD8h8A5Z8el4sFf5BzQW3JqJ09560bNy0cZe1TzciD3HswLIerXIdII8o/JaqV0lpxHypyLqgtmbWz5xjJvG3DxiebkQe59wPXAdKK8k8XTv0j1/6sKXv12FE7LfaPr+s4RNb2NCMTcqtH0k9ch0gryj9drpLU5ToEkKSH7YydbmW9e72+h9fbx8Q/jMTysByudR0irSj/FKlWSpslXeE6B5Cka2uLB/V754L2jl2SzoJc45T/DlD+6cOpf+TaNbUlBwzmecdt2rxgjLVPJJ0HufSUpBtdh0gzyj99Vkja7vanQNY9p6nTem3roGbzn9q1YaeXCIAB/EdYDlk4bQco/5SpVko1ST93nQNI0qN2v6cH87yPtnfOl7XdSedBrtQl/dB1iLSj/NPpu2K5X+TYb2pHmcE8b2K9Pml2by9b/WIobgzL4TOuQ6Qd5Z9C1UrpUbHVL3Ls6vrR3mCfe9HajokJRkH+MNFvECj/9PqK6wBAUp62e+3ba1sHdT3/6O7uQ8bW648mnQm58LJYzndQKP+UqlZKN0u633UOIClr7PTqYJ/7nvUbXkwwCvLjJ2E5ZHGoQaD80+1rrgMASbmuduSgn/t3HZ0LZe2GBOMg++qSvuc6RFZQ/un2c0nPuw4BJOGq+tKZg33urtZOOKSnhzNh2JErwnLI5aFBovxTrFop9Ur6huscQBKetNP267Mtg/5we9Ha9j2SzIPM+xfXAbKE8k+/70na6DoEkITH7T6D3r3v8C09B+1Sr/8pyTzIrOvDcnif6xBZQvmnXLVSapf0I9c5gCTcUD+iPpTnf6Cza11SWZBpjPqHiPLPhq8rmswC5MqVtaUzhvL8szrXHy5rO5PKg0y6IyyHt7kOkTWUfwZUK6U1kq52nQOI2xq778yaNYPey2Kctbss3LJlVZKZkDlfcB0giyj/7GDRH+TSk3bakHbu+9TajmlJZUHmrArLYeA6RBZR/hlRrZRul3S36xxA3G6sL+obyvNf19Mze0Kt/kBSeZApjPqHifLPls+4DgDE7crasv2G+pqzOtd3JZEFmfKYpMtdh8gqyj9DqpXSbyXd4DoHEKdH7Iz9a9a8NJTXnLF+/SJjLTP/i+2LYTlkIvQwUf7Z8ymx3S9y5im71+NDeX6b1djF3VvCpPIg9Z6R9J+uQ2QZ5Z8x1UpppaT/dp0DiNNv64f3DvU1n1rbPqTbBJErXwzL4ZD/N4NXUP7Z9FlJW1yHAOJyZW3Z9KG+5sDe3v0n12orE4iDdHtYbOAzYpR/BlUrpaqkb7vOAcTlIbv/rLo1Lw/1ded0dG5OIg9S7YKwHA7pDhG8FuWfXf9XEiudITe
esVPXDPU1716/4Qhj7V+SyINUui4sh79xHSIPKP+MqlZKayV90XUOIC4r6ocP+VLWaGn06zd3s9lPMfRJOt91iLyg/LPt65Kecx0CiMOVtWX7DOd1F69tP0DWcstX/n03LIcPuw6RF5R/hlUrpc2S/o/rHEAcVtkDDqxbM+R792f29e03tVZjO9d8axe/62JF+WffjyRx2hM5YMxzdvfHhvPKj3R0MvLPt8+F5ZBFnWJE+WdctVKqSfJd5wDicHN9YfdwXndK18ZFLda+EHcepMIjkr7lOkTeUP45UK2UrpZ0nescwEgtry3beziva5Va37Rp86Nx50EqXMitffGj/PPjPEmbXIcARuJ+e+Bsa4d3C+uF69rnytpa3Jng1PVs2ZsMyj8nGgv/XOo4BjAiVi0tz2v3YY3g9+mrTZtWq90bdyY4w619CaL88+Vrkla5DgGMxK21w4a9at8/rOvgd1p+XBaWQyYzJ4T/o+RItVLqk/RhScx8RmYtry3bc7ivPXHjpsNbrX02zjxw4jFJl7gOkWeUf85UK6U/ipmxyLB77Nw51mr9cF7bIrWcuGHTkLYHRupYSWeH5ZB9GxJE+efTpyVVXYcAhqOultYXNXnYM/c/2d5+sKxlu9fs+nZYDm93HSLvKP8cqlZKGySdregTNJA5t9YOG/adK1Nr9akz+vruiTMPmqYq1i1pCso/p6qV0k1iz2tk1BX1ZXuM5PWfWNcxNq4saKpzwnK4wXWIIqD88+0iSU+5DgEM1R/r8+Zaq2GXwJs3bV4w2ton48yExH07LIc3xnEgY8wtxhi7g6/j43ifLKP8c6zf6X8gU2pqHfUXTXpkuK83kjm5a8PTcWZColYrGqzE7VeSfjzAV+F3QzXWclk47zw/+I6kc13nAIbiK6O/c+s7Wm8/drivb29pWfeGGdPHy5i2OHMhdjVJy8JyeFdcBzTG3CLpWEn7W2urcR03Txj5F8P5kkLXIYChuKJ2zJSRvH5yvT5lVm8vK/6l37/EWfwYHMq/AKqV0mZJp0nqcp0FGKw/1A+aZ+3I9qu4YF3HrnHlQSLulfR51yGKiPIviGqltFpc/0eG9GnU6LXabdjX/SXp9Zu757fV62viyoRYdUk6PeEd+842xnzbGPNNY8w/GGNmJPhemUL5F0i1Urpc0jdc5wAG63f1Q0Z8tuqdXRuejyMLYndmWA6T3ob5s4p2PP17SZdJWmOMYdlgUf5FdKGku12HAAZjee2YySM9xnkdnYfJWra7TpcvhuXw1wke/zZJZ0iaJWkXSXMlfUbRToGfN8Z8PMH3zgRm+xeQ5wczJN0vaUQTqoCkjVZfz+q2D9SM0biRHOed++z9u0faxhwTVy6MyI2STgjLYa3Zb2yMeYuk6yV1SNrHWlvY/QMY+RdQtVJ6WtGnYj75IdV6NWrMOk0Y0XV/SbpoXTsfdNPhKUnvdVH8kmStvUHSPZImSVrsIkNaUP4FVa2UrpX0Bdc5gJ35ff11w9rhr7+jurccPK5eH/GHCIxIt6RTw3K41nGOxxrfpzlN4RjlX2z/KOlm1yGAHbmitmxiHMd53/quv8RxHAzbeWE5vM91CElb55FsdJrCMa75F5znB3spuv5f6E/BSK829XQ/0namMUYjWqlvozEblszc18qYCXFlw6B9NyyH57kOYYyZKulJSeMl7WetfdZxJGcY+RdctVJ6UdJ7FC2xCaTOFo0Z26FdR3zKfry1u87f0rMyhkgYmt9LatrsemPMUmPM240xrdv83JN0haLiv6rIxS9R/pBUrZRuU3QvLJBKf6gf1BHHcT61rn3POI6DQXtR0mlhOexp4nvOUVTyzxpjAmPMfxljfifpYUnLJD0k6cNNzJNKlD8kSdVK6ftimU2k1PLast3iOM78LT1zd63XH4rjWNipPknvCsthsxdZukvSdyQ9L+lISe+SdIiklZIukHSktfalJmdKHa7541U8P/i+pA+5zgH0N1ZbNj/c9sFWYzR
mpMf67qTd7vjW5EnL4siF7bKSPhCWw5+6DoKBMfLHts6VdI3rEEB/3Wobt167xHKr3pmdXYfL2s44joXtOp/iTzfKH69SrZRqkt6t6NQZkBp31w9qj+M4Y60dd0T3llVxHAsD+kJYDr/uOgR2jPLHa1QrpU2S/lbSatdZgK2W15bFtj3vp9a1T4/rWHiVfw/L4addh8DOUf4YULVSelnS8ZL+7DoLIEk31xfMs1axbP86r6d31sRajdF/vK5QdNkQGUD5Y7uqldKTkk5UtO824NQmjR3fpXGxLdF7duf6Qq/wFrNb5HDNfgwd5Y8dqlZK90s6VVKv6yzAPfW5sa0Lf3pn1xHG2pfjOl6B3S/p5LAcbnEdBINH+WOnqpXSbyWdJXYBhGNX1paNj+tYY6QxSzd3PxjX8QpqjaTjw3I44s2X0FyUPwalWin9VNInXedAsf22fvhca+Nbivride37i8VOhusFSW8Jy2HhF8zJIsofg1atlC6T9BFxBgCObNS4CRs19tG4jndAb9/M3ev1++M6XoG0S3prWA6fdB0Ew0P5Y0iqldJ3JJ0pNgKCI/fW58S6Ne+57Z3NXHc+D16UdFxYDkPXQTB8lD+GrFop/aek94pJgHDgqtrSXeI83mldG45osfbFOI+ZY09Len1YDh9wHQQjQ/ljWKqV0uWK7gJghi+a6ob6ojnWqh7X8UZJo47dtDm2WwhzbLWi4n/MdRCMHOWPYatWStdIepukTa6zoDi6NH7iJrXFdt1fki5e136grI3tA0UOrVJU/E+7DoJ4UP4YkWqldKOilQBZCAhNs7J+YKwzzPftq03fq1a7N85j5sjvFV3jZ1Z/jlD+GLFqpXS7pDcpmgEMJO7K+tJxcR/z79vZ6G8AKyT9TVgOO1wHQbwof8SiWin9UdIbJcU6ExsYyPW1I2O97i9JJ2/YuKjV2ufiPGbGXSmpFJZDlkHOIcofsalWSqskvUHS866zIN86teukzRoT68SzFqnlLRs3MZkt8lNJp7Fkb35R/ohVtVJ6RNJSSSybikQ9YGfFfnve+es65snaWHYOzLDvSPpAWA6L/u+Qa5Q/YletlJ5S9AHgWtdZkF9X1Y4eE/cx967V9p7eV9iJfzVJF4bl8CNhOWQVz5yj/JGIaqXUJekkSZe5zoJ8+k3tqDnWxr/U9MfbO0bFfcwMaJd0YlgOv+I6CJrDsKcFkub5wd9J+qakIv5SRYIeaSs/Ntb0zo7zmFayh3v7PdNnzIw4j5tiDynakvdx10HQPIz8kbhqpfQ9RWsBcCsgYhXaA16I+5hGMn+7YWNRNqxZLmkJxV88lD+aoloprZB0hCTWBEdsrq4tGZ3EcT+xruN1sjbPG/5YSZ+XdGpYDje4DoPmo/zRNNVK6QlJR0v6hessyIdra4sPTOK4u9fre3i9fXmd+LdB0jvCcvh/mNhXXJQ/mqpaKW2qVkrvkXSx2BYYI/SyJk3dYkcncsr6/PaOWHcPTIknJB0dlsMrXAeBW5Q/nKhWSl+WdIKkda6zINsesjMTWVTqjZs2HzbG2ieSOLYjKyQdGZZD1uAA5Q93GpsCLZB0k+MoyLBrake3JnXsU7o2PJPUsZuoLumLkt4alkM+bEMSt/ohBTw/MJI+KelfJLU5joOM2UvrXrpr7Ef3TOLYnS0tHcfMmD5WxoxN4vhNUFW0Wt/troMgXRj5w7lqpWSrldJXFd0NsMp1HmTLi5qyZ48dlciteRPr9Umze3uzOvHvR5LmU/wYCOWP1KhWSg9KOkrSl6R4d2xDvj1sZzyb1LEvXNuxW1LHTsjLim7h+2BYDrtch0E6Uf5IlWql1FOtlD6laHvgp1znQTYEtSWJ/S5b2t196Nh6fXVSx49ZIOkQZvNjZyh/pFK1UrpN0nxJP3GdBel3TW3JAUke/91dG/6c5PFjsFHSuWE5/NuwHMa+2yHyhwl/SD3PD06T9D1JU1xnQXo91nbGU6NNbWYSx+4yZv3Smfu2ypj
xSRx/hP4g6YywHK5xHQTZwcgfqVetlH4p6VBJ17vOgvR61O6X2G15E6zd7XU9Pfcldfxh6pP0j5KOofgxVJQ/MqFaKT1frZSOl3SGpNg3c0H2XVtbbJI8/sVr2/dI8vhDdKukw8Ny+E9hOWSlTAwZp/2ROZ4fTJB0iaRPSEpkYxdkz37mpedub/vE9CTfY/HMff+0qaXl4CTfYyeelXRhWA7ZHwMjwsgfmVOtlLqqldLFii4FXOc6D9LhGbvn9F7bmtgtf5L0/s4uVyvkbVG0CNY8ih9xYOSPzPP84CRJX5OU6IxvpN+1Y/w7Dm55ellSx99kzMbFM/ftkzETk3qPAQSSPsF1fcSJkT8yr1opXSXpYEmflbTJcRw4dF3tqERHM7tYO37hli3NWoVyjaS/bdy+R/EjVoz8kSueH+wn6f9JepfrLGg+z7zwzC1tF+yX5Hs8OGbMY++dvvfsBN9io6JT/F8Jy+GWBN8HBUb5I5c8PzhO0r8qmheAAlnT9v7nR5n6Pkm+x9IZ+4ZdrS1J/G/rF4om9CU6dwHgtD9yqVop3SJpoaQPSnrMbRo00+N2n0Q2+envrM7162M8nJV0haSFYTl8D8WPZmDkj9zz/KBV0rslfUbR3ADk2CdHXX77x0dd8fok32OLUfeRM/fbbI2ZPILDWEm/kvRPYTl8IKZowKBQ/igMzw9aJL1D0cTA+Y7jICGzzHNPrWi7KJFlfvv70N573nrXuLHHDuOldUmXKyr9h2KOBQwK5Y/C8fzASDpJ0UJBixzHQQIeb3v/C62mPi3J91gzevSTp+w7bf8hvKQu6eeS/jkshw8nFAsYFMofheb5wQmKPgQc7ToL4nPjmAvvmN3yfGL3+2/1hhnT729vbV24k6fVJP1MUek/mnQmYDCY8IdCq1ZKv6lWSkslvVnReunIgRvqR9Sb8T4f7ljfvYOHeyX9WNJBYTk8g+JHmjDyB/rx/OAYSR+TdIrYNyCz5pqnn7y+zR/KKflh6ZV6F3n7dVhjpvb78fOS/k3Sv4XlkE2okEqUPzAAzw/2lnS2pHMkzXAcB8PweNvpL7Uau2fS7/P3e0295bZdxh2n6MzRtyRdEZbDvqTfFxgJyh/YgcZtgiVJ50l6q6REt41FfG4ac8GdB7S8sDTht+l4dMzo7542fdp/heXwwYTfC4gN5Q8MkucHMyWd2fjyXGbBzv3vUf9129+NCt6Q0OFvl/R9Sb/UpZ2bE3oPIDGUPzBEjVsF36ho9cB3SBrnNhEGcrCpPn5t26dnxXjIP0v6iaR/16Wdq2M8LtB0lD8wAp4fTFS0idA7FH0gGOM2Efp7ou30l1uM3WMEh3hJ0Sp8/yPpNl3a2ZS7CICkUf5ATDw/2E3SCZJOlnSipGbu+Y4B3DLmk7/3Wl4c6hoOLytaa/8Xkm7RpZ21+JMBblH+QAI8Pxgt6VhFHwROlpToNrMY2CWjfnLr2aN+M5gleJ+QdKWk5ZLuoPCRd5Q/0ASeHyzUKx8EFrhNUxzzzeOPXdV2yewBHqpLulfS1ZKW69LOsLnJALcof6DJGncNnCzpLYqWFZ7iNlGeWftE2/vbW4ydIulhSSsk3aTodH6722yAO5Q/4FDjzoG5kpb2+5on1hMYKSvpT5J+99+j//nGpa1/ulOXdrLaHtBA+QMp4/nBZElL9MqHgaMk7eo0VPptkfRHSb9rfN1ZrZQY2QPbQfkDKddYZXC+og8CR0s6WNJsFfMDQa+k1YpG9Q/1+/5YtVLqdRkMyBLKH8gozw+mKfoQMKfxfeufZ0ka6zBaHHokPaao2LctedbNB0aI8gdyxvODFkn76tUfCg6UtLekqY2v8c4CSu2SXlS0gM5L/f78oqJV9FYrxSVvjNlF0WTNt0k6RtJMSTVJaxQtCPRVa+0GdwmBnaP8gQLy/GCcpD0UfRDYQ9IkSbsN8LWrosmHNUW3xw30faCf9Ulap9cW/EtZPz1vjPmQonX9pegOggcV/VstlTRB0iO
SjrXWvuQmIbBzlD8ADIExpqyo6L9urX2438+nSQokLZT0M2vt+xxFBHaK8geAmBhjjpZ0p6K7D3az1vY4jgQMqMV1AADIkVWN722SdncZBNgRyh8A4nNA43uvojkPQCpR/gAQn483vl9nrd3iNAmwA1zzB4AYGGNOlHSNojsdjrTWrtrJSwBnGPkDwAgZY+ZJ+qmi2yIvoviRdpQ/AIyAMWa6pOskTVa0wM9ljiMBO8VpfwAYJmPMFEm3K9pv4T8knW35pYoMoPwBYBiMMbtKWqFo18VfS3qXtbbmNhUwOJz2B4AhMsa0SbpSUfFfL+m9FD+yhPIHgCEwxrRK+pmk/6XolP+prOSHrBnlOgAAZMxHJZ3S+PPLkr5tjBnoeRdaa19uWipgCCh/ABiayf3+fMp2nyVdqujDAZA6TPgDAKBguOYPAEDBUP4AABQM5Q8AQMFQ/gAAFAzlDwBAwVD+AAAUDOUPAEDBUP4AABQM5Q8AQMFQ/gAAFAzlDwBAwVD+AAAUDOUPAEDBUP4AABQM5Q8AQMFQ/gAAFAzlDwBAwVD+AAAUDOUPAEDBUP4AABQM5Q8AQMFQ/gAAFAzlDwBAwVD+AAAUDOUPAEDBUP4AABQM5Q8AQMFQ/gAAFAzlDwBAwVD+AAAUDOUPAEDBUP4AABQM5Q8AQMFQ/gAAFAzlDwBAwVD+AAAUDOUPAEDBUP4AABQM5Q8AQMFQ/gAAFAzlDwBAwVD+AAAUDOUPAEDBUP4AABQM5Q8AQMH8f8TljZcbqcRxAAAAAElFTkSuQmCC\n", 576 | "text/plain": [ 577 | "
" 578 | ] 579 | }, 580 | "metadata": {}, 581 | "output_type": "display_data" 582 | } 583 | ], 584 | "source": [ 585 | "plt.figure(figsize=(4,4),dpi=150)\n", 586 | "train['label'].value_counts().sort_index().plot(kind = 'pie')" 587 | ] 588 | }, 589 | { 590 | "cell_type": "markdown", 591 | "metadata": {}, 592 | "source": [ 593 | "## 2.2 Test set exploration" 594 | ] 595 | }, 596 | { 597 | "cell_type": "markdown", 598 | "metadata": {}, 599 | "source": [ 600 | "### 2.2.1 Data overview" 601 | ] 602 | }, 603 | { 604 | "cell_type": "code", 605 | "execution_count": 18, 606 | "metadata": {}, 607 | "outputs": [ 608 | { 609 | "data": { 610 | "text/html": [ 611 | "
\n", 612 | "\n", 625 | "\n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | " \n", 647 | " \n", 648 | " \n", 649 | " \n", 650 | " \n", 651 | " \n", 652 | " \n", 653 | " \n", 654 | " \n", 655 | " \n", 656 | " \n", 657 | " \n", 658 | " \n", 659 | " \n", 660 | " \n", 661 | " \n", 662 | " \n", 663 | " \n", 664 | " \n", 665 | " \n", 666 | " \n", 667 | " \n", 668 | " \n", 669 | " \n", 670 | " \n", 671 | " \n", 672 | "
   file_id  api             tid     index
0  1        RegOpenKeyExA   2332.0  0.0
1  1        CopyFileA       2332.0  1.0
2  1        OpenSCManagerA  2332.0  2.0
3  1        CreateServiceA  2332.0  3.0
4  1        RegOpenKeyExA   2468.0  0.0
\n", 673 | "
" 674 | ], 675 | "text/plain": [ 676 | " file_id api tid index\n", 677 | "0 1 RegOpenKeyExA 2332.0 0.0\n", 678 | "1 1 CopyFileA 2332.0 1.0\n", 679 | "2 1 OpenSCManagerA 2332.0 2.0\n", 680 | "3 1 CreateServiceA 2332.0 3.0\n", 681 | "4 1 RegOpenKeyExA 2468.0 0.0" 682 | ] 683 | }, 684 | "execution_count": 18, 685 | "metadata": {}, 686 | "output_type": "execute_result" 687 | } 688 | ], 689 | "source": [ 690 | "test.head()" 691 | ] 692 | }, 693 | { 694 | "cell_type": "code", 695 | "execution_count": 19, 696 | "metadata": {}, 697 | "outputs": [ 698 | { 699 | "name": "stdout", 700 | "output_type": "stream", 701 | "text": [ 702 | "\n", 703 | "RangeIndex: 39173 entries, 0 to 39172\n", 704 | "Data columns (total 4 columns):\n", 705 | "file_id 39173 non-null int64\n", 706 | "api 39173 non-null object\n", 707 | "tid 39172 non-null float64\n", 708 | "index 39172 non-null float64\n", 709 | "dtypes: float64(2), int64(1), object(1)\n", 710 | "memory usage: 1.2+ MB\n" 711 | ] 712 | } 713 | ], 714 | "source": [ 715 | "test.info()" 716 | ] 717 | }, 718 | { 719 | "cell_type": "markdown", 720 | "metadata": {}, 721 | "source": [ 722 | "### 2.2.2 Missing value exploration" 723 | ] 724 | }, 725 | { 726 | "cell_type": "code", 727 | "execution_count": 20, 728 | "metadata": {}, 729 | "outputs": [ 730 | { 731 | "data": { 732 | "text/plain": [ 733 | "file_id 0\n", 734 | "api 0\n", 735 | "tid 1\n", 736 | "index 1\n", 737 | "dtype: int64" 738 | ] 739 | }, 740 | "execution_count": 20, 741 | "metadata": {}, 742 | "output_type": "execute_result" 743 | } 744 | ], 745 | "source": [ 746 | "test.isnull().sum()" 747 | ] 748 | }, 749 | { 750 | "cell_type": "markdown", 751 | "metadata": {}, 752 | "source": [ 753 | "### 2.2.3 Data distribution exploration" 754 | ] 755 | }, 756 | { 757 | "cell_type": "code", 758 | "execution_count": 21, 759 | "metadata": {}, 760 | "outputs": [ 761 | { 762 | "data": { 763 | "text/plain": [ 764 | "file_id 10\n", 765 | "api 146\n", 766 | "tid 125\n", 767 | "index 5001\n", 768 | "dtype: int64" 769 | ] 770 | }, 771 | 
"execution_count": 21, 772 | "metadata": {}, 773 | "output_type": "execute_result" 774 | } 775 | ], 776 | "source": [ 777 | "test.nunique()" 778 | ] 779 | }, 780 | { 781 | "cell_type": "markdown", 782 | "metadata": {}, 783 | "source": [ 784 | "### 2.2.4 Outlier exploration" 785 | ] 786 | }, 787 | { 788 | "cell_type": "code", 789 | "execution_count": 22, 790 | "metadata": {}, 791 | "outputs": [ 792 | { 793 | "data": { 794 | "text/plain": [ 795 | "count 39172.000000\n", 796 | "mean 1729.569284\n", 797 | "std 1486.018402\n", 798 | "min 0.000000\n", 799 | "25% 405.750000\n", 800 | "50% 1342.000000\n", 801 | "75% 2876.000000\n", 802 | "max 5000.000000\n", 803 | "Name: index, dtype: float64" 804 | ] 805 | }, 806 | "execution_count": 22, 807 | "metadata": {}, 808 | "output_type": "execute_result" 809 | } 810 | ], 811 | "source": [ 812 | "test['index'].describe()" 813 | ] 814 | }, 815 | { 816 | "cell_type": "code", 817 | "execution_count": 23, 818 | "metadata": { 819 | "scrolled": true 820 | }, 821 | "outputs": [ 822 | { 823 | "data": { 824 | "text/plain": [ 825 | "count 39172.000000\n", 826 | "mean 2158.769938\n", 827 | "std 464.152821\n", 828 | "min 504.000000\n", 829 | "25% 2092.000000\n", 830 | "50% 2224.000000\n", 831 | "75% 2500.000000\n", 832 | "max 2920.000000\n", 833 | "Name: tid, dtype: float64" 834 | ] 835 | }, 836 | "execution_count": 23, 837 | "metadata": {}, 838 | "output_type": "execute_result" 839 | } 840 | ], 841 | "source": [ 842 | "test['tid'].describe()" 843 | ] 844 | }, 845 | { 846 | "cell_type": "markdown", 847 | "metadata": {}, 848 | "source": [ 849 | "## 2.3 Joint analysis of the training and test sets" 850 | ] 851 | }, 852 | { 853 | "cell_type": "markdown", 854 | "metadata": {}, 855 | "source": [ 856 | "### 2.3.1 file_id analysis" 857 | ] 858 | }, 859 | { 860 | "cell_type": "code", 861 | "execution_count": 24, 862 | "metadata": {}, 863 | "outputs": [], 864 | "source": [ 865 | "train_fileids = train['file_id'].unique()\n", 866 | "test_fileids = test['file_id'].unique()" 867 | ] 868 | }, 869 | { 870 | 
"cell_type": "code", 871 | "execution_count": 25, 872 | "metadata": {}, 873 | "outputs": [ 874 | { 875 | "data": { 876 | "text/plain": [ 877 | "0" 878 | ] 879 | }, 880 | "execution_count": 25, 881 | "metadata": {}, 882 | "output_type": "execute_result" 883 | } 884 | ], 885 | "source": [ 886 | "len(set(train_fileids)-set(test_fileids)) " 887 | ] 888 | }, 889 | { 890 | "cell_type": "code", 891 | "execution_count": 26, 892 | "metadata": {}, 893 | "outputs": [ 894 | { 895 | "data": { 896 | "text/plain": [ 897 | "1" 898 | ] 899 | }, 900 | "execution_count": 26, 901 | "metadata": {}, 902 | "output_type": "execute_result" 903 | } 904 | ], 905 | "source": [ 906 | "len(set(test_fileids)-set(train_fileids)) " 907 | ] 908 | }, 909 | { 910 | "cell_type": "markdown", 911 | "metadata": {}, 912 | "source": [ 913 | "### 2.3.2 API analysis" 914 | ] 915 | }, 916 | { 917 | "cell_type": "code", 918 | "execution_count": 27, 919 | "metadata": {}, 920 | "outputs": [], 921 | "source": [ 922 | "train_apis = train['api'].unique()\n", 923 | "test_apis = test['api'].unique()" 924 | ] 925 | }, 926 | { 927 | "cell_type": "code", 928 | "execution_count": 28, 929 | "metadata": { 930 | "scrolled": true 931 | }, 932 | "outputs": [ 933 | { 934 | "data": { 935 | "text/plain": [ 936 | "{'CertCreateCertificateContext',\n", 937 | " 'CertOpenSystemStoreA',\n", 938 | " 'CoInitializeSecurity',\n", 939 | " 'CreateServiceA',\n", 940 | " 'CryptAcquireContextW',\n", 941 | " 'FindWindowA',\n", 942 | " 'FindWindowExW',\n", 943 | " 'GetComputerNameA',\n", 944 | " 'GetFileVersionInfoSizeW',\n", 945 | " 'GetFileVersionInfoW',\n", 946 | " 'IWbemServices_ExecQuery',\n", 947 | " 'LookupAccountSidW',\n", 948 | " 'LookupPrivilegeValueW',\n", 949 | " 'OpenServiceW',\n", 950 | " 'OutputDebugStringA',\n", 951 | " 'R',\n", 952 | " 'SendNotifyMessageW',\n", 953 | " 'SetStdHandle',\n", 954 | " 'StartServiceA',\n", 955 | " 'StartServiceW',\n", 956 | " 'UnhookWindowsHookEx',\n", 957 | " 'connect',\n", 958 | " 'timeGetTime'}" 959 | ] 
960 | }, 961 | "execution_count": 28, 962 | "metadata": {}, 963 | "output_type": "execute_result" 964 | } 965 | ], 966 | "source": [ 967 | "set(test_apis)-set(train_apis)" 968 | ] 969 | }, 970 | { 971 | "cell_type": "code", 972 | "execution_count": 29, 973 | "metadata": {}, 974 | "outputs": [ 975 | { 976 | "data": { 977 | "text/plain": [ 978 | "{'CertControlStore',\n", 979 | " 'CryptAcquireContextA',\n", 980 | " 'CryptCreateHash',\n", 981 | " 'CryptExportKey',\n", 982 | " 'CryptHashData',\n", 983 | " 'DeviceIoControl',\n", 984 | " 'DrawTextExA',\n", 985 | " 'EncryptMessage',\n", 986 | " 'EnumServicesStatusW',\n", 987 | " 'FindResourceExA',\n", 988 | " 'GetAdaptersAddresses',\n", 989 | " 'GetAddrInfoW',\n", 990 | " 'GetAsyncKeyState',\n", 991 | " 'GetBestInterfaceEx',\n", 992 | " 'GetFileInformationByHandle',\n", 993 | " 'GetFileVersionInfoExW',\n", 994 | " 'GetFileVersionInfoSizeExW',\n", 995 | " 'GetUserNameA',\n", 996 | " 'GetVolumePathNameW',\n", 997 | " 'GlobalMemoryStatus',\n", 998 | " 'HttpOpenRequestA',\n", 999 | " 'InternetConnectA',\n", 1000 | " 'InternetOpenA',\n", 1001 | " 'IsDebuggerPresent',\n", 1002 | " 'Module32FirstW',\n", 1003 | " 'Module32NextW',\n", 1004 | " 'NtDeleteValueKey',\n", 1005 | " 'NtReadVirtualMemory',\n", 1006 | " 'OpenServiceA',\n", 1007 | " 'ReadProcessMemory',\n", 1008 | " 'RegEnumKeyExA',\n", 1009 | " 'RegEnumValueA',\n", 1010 | " 'RtlAddVectoredContinueHandler',\n", 1011 | " 'RtlAddVectoredExceptionHandler',\n", 1012 | " 'RtlRemoveVectoredExceptionHandler',\n", 1013 | " 'SetFileAttributesW',\n", 1014 | " 'SetFileTime',\n", 1015 | " 'SetWindowsHookExA',\n", 1016 | " 'Thread32First',\n", 1017 | " 'Thread32Next',\n", 1018 | " 'WriteConsoleA',\n", 1019 | " 'bind',\n", 1020 | " 'listen'}" 1021 | ] 1022 | }, 1023 | "execution_count": 29, 1024 | "metadata": {}, 1025 | "output_type": "execute_result" 1026 | } 1027 | ], 1028 | "source": [ 1029 | "set(train_apis) - set(test_apis)" 1030 | ] 1031 | } 1032 | ], 1033 | "metadata": { 1034 | 
"kernelspec": { 1035 | "display_name": "Python 3", 1036 | "language": "python", 1037 | "name": "python3" 1038 | }, 1039 | "language_info": { 1040 | "codemirror_mode": { 1041 | "name": "ipython", 1042 | "version": 3 1043 | }, 1044 | "file_extension": ".py", 1045 | "mimetype": "text/x-python", 1046 | "name": "python", 1047 | "nbconvert_exporter": "python", 1048 | "pygments_lexer": "ipython3", 1049 | "version": "3.6.2" 1050 | }, 1051 | "latex_envs": { 1052 | "LaTeX_envs_menu_present": true, 1053 | "autoclose": false, 1054 | "autocomplete": true, 1055 | "bibliofile": "biblio.bib", 1056 | "cite_by": "apalike", 1057 | "current_citInitial": 1, 1058 | "eqLabelWithNumbers": true, 1059 | "eqNumInitial": 1, 1060 | "hotkeys": { 1061 | "equation": "Ctrl-E", 1062 | "itemize": "Ctrl-I" 1063 | }, 1064 | "labels_anchors": false, 1065 | "latex_user_defs": false, 1066 | "report_style_numbering": false, 1067 | "user_envs_cfg": false 1068 | }, 1069 | "tianchi_metadata": { 1070 | "competitions": [], 1071 | "datasets": [], 1072 | "description": "", 1073 | "notebookId": "116127", 1074 | "source": "dsw" 1075 | }, 1076 | "toc": { 1077 | "nav_menu": {}, 1078 | "number_sections": true, 1079 | "sideBar": true, 1080 | "skip_h1_title": false, 1081 | "title_cell": "Table of Contents", 1082 | "title_sidebar": "Contents", 1083 | "toc_cell": true, 1084 | "toc_position": { 1085 | "height": "calc(100% - 180px)", 1086 | "left": "10px", 1087 | "top": "150px", 1088 | "width": "384px" 1089 | }, 1090 | "toc_section_display": true, 1091 | "toc_window_display": true 1092 | } 1093 | }, 1094 | "nbformat": 4, 1095 | "nbformat_minor": 4 1096 | } 1097 | -------------------------------------------------------------------------------- /阿里云安全恶意程序检测/阿里云安全恶意程序检测-特征工程与基线模型.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Section 3: Feature engineering and baseline model" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 
 12 | "execution_count": 2, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import numpy as np\n", 17 | "import pandas as pd\n", 18 | "from tqdm import tqdm \n", 19 | "\n", 20 | "class _Data_Preprocess:\n", 21 | " def __init__(self):\n", 22 | " self.int8_max = np.iinfo(np.int8).max\n", 23 | " self.int8_min = np.iinfo(np.int8).min\n", 24 | "\n", 25 | " self.int16_max = np.iinfo(np.int16).max\n", 26 | " self.int16_min = np.iinfo(np.int16).min\n", 27 | "\n", 28 | " self.int32_max = np.iinfo(np.int32).max\n", 29 | " self.int32_min = np.iinfo(np.int32).min\n", 30 | "\n", 31 | " self.int64_max = np.iinfo(np.int64).max\n", 32 | " self.int64_min = np.iinfo(np.int64).min\n", 33 | "\n", 34 | " self.float16_max = np.finfo(np.float16).max\n", 35 | " self.float16_min = np.finfo(np.float16).min\n", 36 | "\n", 37 | " self.float32_max = np.finfo(np.float32).max\n", 38 | " self.float32_min = np.finfo(np.float32).min\n", 39 | "\n", 40 | " self.float64_max = np.finfo(np.float64).max\n", 41 | " self.float64_min = np.finfo(np.float64).min\n", 42 | "\n", 43 | " def _get_type(self, min_val, max_val, types):\n", 44 | " if types == 'int':\n", 45 | " if max_val <= self.int8_max and min_val >= self.int8_min:\n", 46 | " return np.int8\n", 47 | " elif max_val <= self.int16_max and min_val >= self.int16_min:\n", 48 | " return np.int16\n", 49 | " elif max_val <= self.int32_max and min_val >= self.int32_min:\n", 50 | " return np.int32\n", 51 | " return None\n", 52 | "\n", 53 | " elif types == 'float':\n", 54 | " if max_val <= self.float16_max and min_val >= self.float16_min:\n", 55 | " return np.float16\n", 56 | " if max_val <= self.float32_max and min_val >= self.float32_min:\n", 57 | " return np.float32\n", 58 | " if max_val <= self.float64_max and min_val >= self.float64_min:\n", 59 | " return np.float64\n", 60 | " return None\n", 61 | "\n", 62 | " def _memory_process(self, df):\n", 63 | " init_memory = df.memory_usage().sum() / 1024 ** 2 / 1024\n", 64 | " 
print('Original data occupies {} GB memory.'.format(init_memory))\n", 65 | " df_cols = df.columns\n", 66 | "\n", 67 | " \n", 68 | " for col in tqdm(df_cols):\n", 69 | " try:\n", 70 | " if 'float' in str(df[col].dtypes):\n", 71 | " max_val = df[col].max()\n", 72 | " min_val = df[col].min()\n", 73 | " trans_types = self._get_type(min_val, max_val, 'float')\n", 74 | " if trans_types is not None:\n", 75 | " df[col] = df[col].astype(trans_types)\n", 76 | " elif 'int' in str(df[col].dtypes):\n", 77 | " max_val = df[col].max()\n", 78 | " min_val = df[col].min()\n", 79 | " trans_types = self._get_type(min_val, max_val, 'int')\n", 80 | " if trans_types is not None:\n", 81 | " df[col] = df[col].astype(trans_types)\n", 82 | " except Exception:\n", 83 | " print(' Cannot process column {}.'.format(col)) \n", 84 | " afterprocess_memory = df.memory_usage().sum() / 1024 ** 2 / 1024\n", 85 | " print('After processing, the data occupies {} GB memory.'.format(afterprocess_memory))\n", 86 | " return df" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "## 3.3 Baseline model" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "### 3.3.1 Data loading" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 1, 106 | "metadata": {}, 107 | "outputs": [], 108 | "source": [ 109 | "import pandas as pd\n", 110 | "import numpy as np\n", 111 | "import seaborn as sns\n", 112 | "import matplotlib.pyplot as plt\n", 113 | "\n", 114 | "import lightgbm as lgb\n", 115 | "from sklearn.model_selection import train_test_split\n", 116 | "from sklearn.preprocessing import OneHotEncoder\n", 117 | "\n", 118 | "import warnings\n", 119 | "warnings.filterwarnings('ignore')\n", 120 | "%matplotlib inline" 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "metadata": {}, 126 | "source": [ 127 | "### Data loading" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": 3, 133 | 
"metadata": {}, 134 | "outputs": [], 135 | "source": [ 136 | "path = '../security_data/'\n", 137 | "train = pd.read_csv(path + 'security_train.csv')\n", 138 | "test = pd.read_csv(path + 'security_test.csv')" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": 4, 144 | "metadata": {}, 145 | "outputs": [ 146 | { 147 | "data": { 148 | "text/html": [ 149 | "
\n", 150 | "\n", 163 | "\n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | "
   file_id  label  api                     tid   index
0  1        5      LdrLoadDll              2488  0
1  1        5      LdrGetProcedureAddress  2488  1
2  1        5      LdrGetProcedureAddress  2488  2
3  1        5      LdrGetProcedureAddress  2488  3
4  1        5      LdrGetProcedureAddress  2488  4
\n", 217 | "
" 218 | ], 219 | "text/plain": [ 220 | " file_id label api tid index\n", 221 | "0 1 5 LdrLoadDll 2488 0\n", 222 | "1 1 5 LdrGetProcedureAddress 2488 1\n", 223 | "2 1 5 LdrGetProcedureAddress 2488 2\n", 224 | "3 1 5 LdrGetProcedureAddress 2488 3\n", 225 | "4 1 5 LdrGetProcedureAddress 2488 4" 226 | ] 227 | }, 228 | "execution_count": 4, 229 | "metadata": {}, 230 | "output_type": "execute_result" 231 | } 232 | ], 233 | "source": [ 234 | "train.head()" 235 | ] 236 | }, 237 | { 238 | "cell_type": "markdown", 239 | "metadata": {}, 240 | "source": [ 241 | "### 3.3.2 Feature engineering" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": 5, 247 | "metadata": {}, 248 | "outputs": [], 249 | "source": [ 250 | "def simple_sts_features(df):\n", 251 | " simple_fea = pd.DataFrame()\n", 252 | " simple_fea['file_id'] = df['file_id'].unique()\n", 253 | " simple_fea = simple_fea.sort_values('file_id')\n", 254 | " \n", 255 | " df_grp = df.groupby('file_id')\n", 256 | " simple_fea['file_id_api_count'] = df_grp['api'].count().values\n", 257 | " simple_fea['file_id_api_nunique'] = df_grp['api'].nunique().values\n", 258 | " \n", 259 | " simple_fea['file_id_tid_count'] = df_grp['tid'].count().values\n", 260 | " simple_fea['file_id_tid_nunique'] = df_grp['tid'].nunique().values\n", 261 | " \n", 262 | " simple_fea['file_id_index_count'] = df_grp['index'].count().values\n", 263 | " simple_fea['file_id_index_nunique'] = df_grp['index'].nunique().values\n", 264 | " \n", 265 | " return simple_fea" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": 6, 271 | "metadata": {}, 272 | "outputs": [ 273 | { 274 | "name": "stdout", 275 | "output_type": "stream", 276 | "text": [ 277 | "Wall time: 1.4 s\n" 278 | ] 279 | } 280 | ], 281 | "source": [ 282 | "%%time\n", 283 | "simple_train_fea1 = simple_sts_features(train)" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 7, 289 | "metadata": {}, 290 | "outputs": [ 291 | { 292 | "name": 
"stdout", 293 | "output_type": "stream", 294 | "text": [ 295 | "Wall time: 23.9 ms\n" 296 | ] 297 | } 298 | ], 299 | "source": [ 300 | "%%time\n", 301 | "simple_test_fea1 = simple_sts_features(test)" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": 8, 307 | "metadata": {}, 308 | "outputs": [], 309 | "source": [ 310 | "def simple_numerical_sts_features(df):\n", 311 | " simple_numerical_fea = pd.DataFrame()\n", 312 | " simple_numerical_fea['file_id'] = df['file_id'].unique()\n", 313 | " simple_numerical_fea = simple_numerical_fea.sort_values('file_id')\n", 314 | " \n", 315 | " df_grp = df.groupby('file_id')\n", 316 | " \n", 317 | " simple_numerical_fea['file_id_tid_mean'] = df_grp['tid'].mean().values\n", 318 | " simple_numerical_fea['file_id_tid_min'] = df_grp['tid'].min().values\n", 319 | " simple_numerical_fea['file_id_tid_std'] = df_grp['tid'].std().values\n", 320 | " simple_numerical_fea['file_id_tid_max'] = df_grp['tid'].max().values\n", 321 | " \n", 322 | " \n", 323 | " simple_numerical_fea['file_id_index_mean']= df_grp['index'].mean().values\n", 324 | " simple_numerical_fea['file_id_index_min'] = df_grp['index'].min().values\n", 325 | " simple_numerical_fea['file_id_index_std'] = df_grp['index'].std().values\n", 326 | " simple_numerical_fea['file_id_index_max'] = df_grp['index'].max().values\n", 327 | " \n", 328 | " return simple_numerical_fea" 329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": 9, 334 | "metadata": { 335 | "scrolled": true 336 | }, 337 | "outputs": [ 338 | { 339 | "name": "stdout", 340 | "output_type": "stream", 341 | "text": [ 342 | "Wall time: 172 ms\n" 343 | ] 344 | } 345 | ], 346 | "source": [ 347 | "%%time\n", 348 | "simple_train_fea2 = simple_numerical_sts_features(train)" 349 | ] 350 | }, 351 | { 352 | "cell_type": "code", 353 | "execution_count": 10, 354 | "metadata": {}, 355 | "outputs": [ 356 | { 357 | "name": "stdout", 358 | "output_type": "stream", 359 | "text": [ 360 | "Wall 
time: 18 ms\n" 361 | ] 362 | } 363 | ], 364 | "source": [ 365 | "%%time\n", 366 | "simple_test_fea2 = simple_numerical_sts_features(test)" 367 | ] 368 | }, 369 | { 370 | "cell_type": "markdown", 371 | "metadata": {}, 372 | "source": [ 373 | "### 3.3.3 Baseline Construction" 374 | ] 375 | }, 376 | { 377 | "cell_type": "code", 378 | "execution_count": 11, 379 | "metadata": {}, 380 | "outputs": [], 381 | "source": [ 382 | "train_label = train[['file_id','label']].drop_duplicates(subset = ['file_id','label'], keep = 'first')\n", 383 | "test_submit = test[['file_id']].drop_duplicates(subset = ['file_id'], keep = 'first')" 384 | ] 385 | }, 386 | { 387 | "cell_type": "code", 388 | "execution_count": 12, 389 | "metadata": {}, 390 | "outputs": [], 391 | "source": [ 392 | "### Build the training and test sets\n", 393 | "train_data = train_label.merge(simple_train_fea1, on ='file_id', how='left')\n", 394 | "train_data = train_data.merge(simple_train_fea2, on ='file_id', how='left')\n", 395 | "\n", 396 | "test_submit = test_submit.merge(simple_test_fea1, on ='file_id', how='left')\n", 397 | "test_submit = test_submit.merge(simple_test_fea2, on ='file_id', how='left')" 398 | ] 399 | }, 400 | { 401 | "cell_type": "code", 402 | "execution_count": 14, 403 | "metadata": {}, 404 | "outputs": [], 405 | "source": [ 406 | "def lgb_logloss(preds,data):\n", 407 | "    labels_ = data.get_label()\n", 408 | "    classes_ = np.unique(labels_)\n", 409 | "    preds_prob = []\n", 410 | "    for i in range(len(classes_)):\n", 411 | "        preds_prob.append(preds[i*len(labels_):(i+1) * len(labels_)] )\n", 412 | "\n", 413 | "    preds_prob_ = np.vstack(preds_prob)\n", 414 | "\n", 415 | "    loss = []\n", 416 | "    for i in range(preds_prob_.shape[1]): # number of samples\n", 417 | "        sum_ = 0\n", 418 | "        for j in range(preds_prob_.shape[0]): # number of classes\n", 419 | "            pred = preds_prob_[j,i] # predicted probability that sample i belongs to class j\n", 420 | "            if j == labels_[i]:\n", 421 | "                sum_ += np.log(pred)\n", 422 | "            else:\n", 423 | "                sum_ += np.log(1 - pred)\n", 424 | "        loss.append(sum_)\n", 425 | "    return 
'loss is: ',-1 * (np.sum(loss) / preds_prob_.shape[1]),False" 426 | ] 427 | }, 428 | { 429 | "cell_type": "markdown", 430 | "metadata": {}, 431 | "source": [] 432 | }, 433 | { 434 | "cell_type": "code", 435 | "execution_count": 15, 436 | "metadata": {}, 437 | "outputs": [], 438 | "source": [ 439 | "### Model validation\n", 440 | "train_features = [col for col in train_data.columns if col not in ['label','file_id']]\n", 441 | "train_label = 'label'" 442 | ] 443 | }, 444 | { 445 | "cell_type": "code", 446 | "execution_count": 16, 447 | "metadata": { 448 | "scrolled": true 449 | }, 450 | "outputs": [ 451 | { 452 | "name": "stdout", 453 | "output_type": "stream", 454 | "text": [ 455 | "fold n°0\n", 456 | "Training until validation scores don't improve for 100 rounds\n", 457 | "[50]\ttraining's multi_logloss: 1.83717\ttraining's loss is: : 2.41456\tvalid_1's multi_logloss: 1.28536\tvalid_1's loss is: : 0.941228\n", 458 | "[100]\ttraining's multi_logloss: 1.83717\ttraining's loss is: : 2.41456\tvalid_1's multi_logloss: 1.28536\tvalid_1's loss is: : 0.941228\n", 459 | "Early stopping, best iteration is:\n", 460 | "[1]\ttraining's multi_logloss: 1.83717\ttraining's loss is: : 2.41456\tvalid_1's multi_logloss: 1.28536\tvalid_1's loss is: : 0.941228\n", 461 | "fold n°1\n", 462 | "Training until validation scores don't improve for 100 rounds\n", 463 | "[50]\ttraining's multi_logloss: 1.77226\ttraining's loss is: : 2.32838\tvalid_1's multi_logloss: 2.10695\tvalid_1's loss is: : 1.86109\n", 464 | "[100]\ttraining's multi_logloss: 1.77226\ttraining's loss is: : 2.32838\tvalid_1's multi_logloss: 2.10695\tvalid_1's loss is: : 1.86109\n", 465 | "Early stopping, best iteration is:\n", 466 | "[1]\ttraining's multi_logloss: 1.77226\ttraining's loss is: : 2.32838\tvalid_1's multi_logloss: 2.10695\tvalid_1's loss is: : 1.86109\n", 467 | "fold n°2\n", 468 | "Training until validation scores don't improve for 100 rounds\n", 469 | "[50]\ttraining's multi_logloss: 1.79063\ttraining's loss is: : 
2.32093\tvalid_1's multi_logloss: 1.67573\tvalid_1's loss is: : 1.91268\n", 470 | "[100]\ttraining's multi_logloss: 1.79063\ttraining's loss is: : 2.32093\tvalid_1's multi_logloss: 1.67573\tvalid_1's loss is: : 1.91268\n", 471 | "Early stopping, best iteration is:\n", 472 | "[1]\ttraining's multi_logloss: 1.79063\ttraining's loss is: : 2.32093\tvalid_1's multi_logloss: 1.67573\tvalid_1's loss is: : 1.91268\n", 473 | "fold n°3\n", 474 | "Training until validation scores don't improve for 100 rounds\n", 475 | "[50]\ttraining's multi_logloss: 1.79651\ttraining's loss is: : 2.36572\tvalid_1's multi_logloss: 1.92355\tvalid_1's loss is: : 1.42824\n", 476 | "[100]\ttraining's multi_logloss: 1.79651\ttraining's loss is: : 2.36572\tvalid_1's multi_logloss: 1.92355\tvalid_1's loss is: : 1.42824\n", 477 | "Early stopping, best iteration is:\n", 478 | "[1]\ttraining's multi_logloss: 1.79651\ttraining's loss is: : 2.36572\tvalid_1's multi_logloss: 1.92355\tvalid_1's loss is: : 1.42824\n", 479 | "fold n°4\n", 480 | "Training until validation scores don't improve for 100 rounds\n", 481 | "[50]\ttraining's multi_logloss: 1.70379\ttraining's loss is: : 2.27265\tvalid_1's multi_logloss: 2.91788\tvalid_1's loss is: : 3.32694\n", 482 | "[100]\ttraining's multi_logloss: 1.70379\ttraining's loss is: : 2.27265\tvalid_1's multi_logloss: 2.91788\tvalid_1's loss is: : 3.32694\n", 483 | "Early stopping, best iteration is:\n", 484 | "[1]\ttraining's multi_logloss: 1.70379\ttraining's loss is: : 2.27265\tvalid_1's multi_logloss: 2.91788\tvalid_1's loss is: : 3.32694\n", 485 | "Wall time: 9.94 s\n" 486 | ] 487 | } 488 | ], 489 | "source": [ 490 | "%%time\n", 491 | "from sklearn.model_selection import StratifiedKFold,KFold\n", 492 | "params = {\n", 493 | " 'task':'train', \n", 494 | " 'num_leaves': 255,\n", 495 | " 'objective': 'multiclass',\n", 496 | " 'num_class': 8,\n", 497 | " 'min_data_in_leaf': 50,\n", 498 | " 'learning_rate': 0.05,\n", 499 | " 'feature_fraction': 0.85,\n", 500 | " 
'bagging_fraction': 0.85,\n", 501 | " 'bagging_freq': 5, \n", 502 | " 'max_bin':128,\n", 503 | " 'random_state':100\n", 504 | " } \n", 505 | "\n", 506 | "folds = KFold(n_splits=5, shuffle=True, random_state=15)\n", 507 | "oof = np.zeros(len(train))\n", 508 | "\n", 509 | "predict_res = 0\n", 510 | "models = []\n", 511 | "for fold_, (trn_idx, val_idx) in enumerate(folds.split(train_data)):\n", 512 | "    print(\"fold n°{}\".format(fold_))\n", 513 | "    trn_data = lgb.Dataset(train_data.iloc[trn_idx][train_features], label=train_data.iloc[trn_idx][train_label].values)\n", 514 | "    val_data = lgb.Dataset(train_data.iloc[val_idx][train_features], label=train_data.iloc[val_idx][train_label].values)\n", 515 | "\n", 516 | "    clf = lgb.train(params, trn_data, num_boost_round=2000,valid_sets=[trn_data,val_data], verbose_eval=50, early_stopping_rounds=100, feval=lgb_logloss)\n", 517 | "    models.append(clf)" 518 | ] 519 | }, 596 | { 597 | "cell_type": "markdown", 598 | "metadata": {}, 599 | "source": [ 600 | "### 3.3.4 Feature Importance Analysis" 601 | ] 602 | }, 603 | { 604 | "cell_type": "code", 605 | "execution_count": 18, 606 | "metadata": {}, 607 | "outputs": [], 608 | "source": [ 609 | "feature_importance = pd.DataFrame()\n", 610 | "feature_importance['fea_name'] = train_features\n", 611 | "feature_importance['fea_imp'] = clf.feature_importance()\n", 612 | "feature_importance = feature_importance.sort_values('fea_imp',ascending = False)" 613 | ] 614 | }, 615 | { 616 | "cell_type": "code", 617 | "execution_count": 19, 618 | "metadata": {}, 
619 | "outputs": [ 620 | { 621 | "data": { 622 | "text/plain": [ 623 | "" 624 | ] 625 | }, 626 | "execution_count": 19, 627 | "metadata": {}, 628 | "output_type": "execute_result" 629 | }, 630 | { 631 | "data": { 632 | "image/png": "[base64-encoded PNG omitted: bar plot of feature importance, fea_name on the x-axis, fea_imp on the y-axis]", 633 | "text/plain": [ 634 | "
" 635 | ] 636 | }, 637 | "metadata": { 638 | "needs_background": "light" 639 | }, 640 | "output_type": "display_data" 641 | } 642 | ], 643 | "source": [ 644 | "plt.figure(figsize=(20, 10))\n", 645 | "sns.barplot(x = feature_importance['fea_name'], y = feature_importance['fea_imp'])\n", 646 | "#sns.barplot(x=\"fea_name\",y=\"fea_imp\",data=feature_importance)" 647 | ] 648 | }, 649 | { 650 | "cell_type": "markdown", 651 | "metadata": {}, 652 | "source": [ 653 | "### 3.3.5 Model Testing" 654 | ] 655 | }, 656 | { 657 | "cell_type": "code", 658 | "execution_count": 20, 659 | "metadata": {}, 660 | "outputs": [], 661 | "source": [ 662 | "pred_res = 0\n", 663 | "fold = 5\n", 664 | "for model in models:\n", 665 | "    pred_res += model.predict(test_submit[train_features]) * 1.0 / fold" 666 | ] 667 | }, 668 | { 669 | "cell_type": "code", 670 | "execution_count": 21, 671 | "metadata": {}, 672 | "outputs": [], 673 | "source": [ 674 | "test_submit['prob0'] = 0\n", 675 | "test_submit['prob1'] = 0\n", 676 | "test_submit['prob2'] = 0\n", 677 | "test_submit['prob3'] = 0\n", 678 | "test_submit['prob4'] = 0\n", 679 | "test_submit['prob5'] = 0\n", 680 | "test_submit['prob6'] = 0\n", 681 | "test_submit['prob7'] = 0" 682 | ] 683 | }, 684 | { 685 | "cell_type": "code", 686 | "execution_count": 22, 687 | "metadata": { 688 | "scrolled": true 689 | }, 690 | "outputs": [], 691 | "source": [ 692 | "test_submit[['prob0','prob1','prob2','prob3','prob4','prob5','prob6','prob7']] = pred_res\n", 693 | "test_submit[['file_id','prob0','prob1','prob2','prob3','prob4','prob5','prob6','prob7']].to_csv('baseline.csv',index=False)" 694 | ] 695 | } 696 | ], 697 | "metadata": { 698 | "kernelspec": { 699 | "display_name": "Python 3", 700 | "language": "python", 701 | "name": "python3" 702 | }, 703 | "language_info": { 704 | "codemirror_mode": { 705 | "name": "ipython", 706 | "version": 3 707 | }, 708 | "file_extension": ".py", 709 | "mimetype": "text/x-python", 710 | "name": "python", 711 | "nbconvert_exporter": 
"python", 712 | "pygments_lexer": "ipython3", 713 | "version": "3.7.1" 714 | }, 715 | "latex_envs": { 716 | "LaTeX_envs_menu_present": true, 717 | "autoclose": false, 718 | "autocomplete": true, 719 | "bibliofile": "biblio.bib", 720 | "cite_by": "apalike", 721 | "current_citInitial": 1, 722 | "eqLabelWithNumbers": true, 723 | "eqNumInitial": 1, 724 | "hotkeys": { 725 | "equation": "Ctrl-E", 726 | "itemize": "Ctrl-I" 727 | }, 728 | "labels_anchors": false, 729 | "latex_user_defs": false, 730 | "report_style_numbering": false, 731 | "user_envs_cfg": false 732 | }, 733 | "toc": { 734 | "nav_menu": {}, 735 | "number_sections": true, 736 | "sideBar": true, 737 | "skip_h1_title": false, 738 | "title_cell": "Table of Contents", 739 | "title_sidebar": "Contents", 740 | "toc_cell": true, 741 | "toc_position": { 742 | "height": "calc(100% - 180px)", 743 | "left": "10px", 744 | "top": "150px", 745 | "width": "384px" 746 | }, 747 | "toc_section_display": true, 748 | "toc_window_display": true 749 | } 750 | }, 751 | "nbformat": 4, 752 | "nbformat_minor": 2 753 | } 754 | --------------------------------------------------------------------------------