├── 图片资源
    ├── 图片1.png
    ├── 图片2.png
    ├── 图片3.png
    ├── 图片4.png
    ├── 图片5.png
    ├── 图片6.png
    ├── 图片7.png
    ├── 图片8.png
    ├── 图片9.png
    ├── .DS_Store
    ├── 图片10.png
    ├── 图片11.png
    ├── 图片12.png
    ├── 图片13.png
    ├── 图片14.png
    └── 图片15.jpg
├── .ipynb_checkpoints
    ├── 特征工程-checkpoint.ipynb
    ├── 测试数据特征处理与填充-checkpoint.ipynb
    ├── 数据探索-checkpoint.ipynb
    └── 预测建模-checkpoint.ipynb
├── README.md
├── 测试数据特征处理与填充.ipynb
├── 预测建模.ipynb
└── 特征工程.ipynb

/图片资源/图片1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/2017403603/Data_mining/HEAD/图片资源/图片1.png
--------------------------------------------------------------------------------
/图片资源/图片2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/2017403603/Data_mining/HEAD/图片资源/图片2.png
--------------------------------------------------------------------------------
/图片资源/图片3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/2017403603/Data_mining/HEAD/图片资源/图片3.png
--------------------------------------------------------------------------------
/图片资源/图片4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/2017403603/Data_mining/HEAD/图片资源/图片4.png
--------------------------------------------------------------------------------
/图片资源/图片5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/2017403603/Data_mining/HEAD/图片资源/图片5.png
--------------------------------------------------------------------------------
/图片资源/图片6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/2017403603/Data_mining/HEAD/图片资源/图片6.png
--------------------------------------------------------------------------------
/图片资源/图片7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/2017403603/Data_mining/HEAD/图片资源/图片7.png
--------------------------------------------------------------------------------
/图片资源/图片8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/2017403603/Data_mining/HEAD/图片资源/图片8.png
--------------------------------------------------------------------------------
/图片资源/图片9.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/2017403603/Data_mining/HEAD/图片资源/图片9.png
--------------------------------------------------------------------------------
/图片资源/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/2017403603/Data_mining/HEAD/图片资源/.DS_Store
--------------------------------------------------------------------------------
/图片资源/图片10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/2017403603/Data_mining/HEAD/图片资源/图片10.png
--------------------------------------------------------------------------------
/图片资源/图片11.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/2017403603/Data_mining/HEAD/图片资源/图片11.png
--------------------------------------------------------------------------------
/图片资源/图片12.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/2017403603/Data_mining/HEAD/图片资源/图片12.png
--------------------------------------------------------------------------------
/图片资源/图片13.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/2017403603/Data_mining/HEAD/图片资源/图片13.png
-------------------------------------------------------------------------------- /图片资源/图片14.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/2017403603/Data_mining/HEAD/图片资源/图片14.png -------------------------------------------------------------------------------- /图片资源/图片15.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/2017403603/Data_mining/HEAD/图片资源/图片15.jpg -------------------------------------------------------------------------------- /.ipynb_checkpoints/特征工程-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [], 3 | "metadata": {}, 4 | "nbformat": 4, 5 | "nbformat_minor": 5 6 | } 7 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/测试数据特征处理与填充-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [], 3 | "metadata": {}, 4 | "nbformat": 4, 5 | "nbformat_minor": 5 6 | } 7 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |

Tmall Repeat-Purchase Prediction Competition: Technical Report

Team members: 李航程, 姚远舟, 黄建辉, 刘杭达

## 1. Problem Description

### 1.1 Background

Merchants sometimes run large promotions or hand out coupons on particular dates, such as Boxing Day, Black Friday, or Double 11 (November 11), to attract shoppers. Many of the buyers attracted this way are one-time shoppers, so these promotions may do little for long-term sales growth. To address this, merchants need to identify which shoppers can be converted into repeat buyers. By targeting these potentially loyal customers, merchants can cut promotion costs substantially and improve their return on investment.

### 1.2 Data

Four data files are provided: the training data, the test data, user profiles, and user activity logs. Each training row gives a user, a merchant, and whether that user is a repeat buyer for that merchant (the label). The user profile file gives the age range and gender for each user id. The activity log records each user's actions of several types in different shops over the past six months, together with timestamps. The test data consists of user-merchant pairs for which we must predict whether the user is a repeat buyer for that merchant.

### 1.3 Task

For each (user id, merchant id) pair in the test data, predict the probability that the user buys from that merchant again, using the four data files.

## 2. Data Exploration

### 2.1 Loading the datasets

```python
import pandas as pd

train_data = pd.read_csv("../DataMining/data_format1/train_format1.csv")
test_data = pd.read_csv("../DataMining/data_format1/test_format1.csv")
user_info = pd.read_csv("../DataMining/data_format1/user_info_format1.csv")
user_log = pd.read_csv("../DataMining/data_format1/user_log_format1.csv")
```

### 2.2 Missing rates of age and gender in the user profiles

```python
# Share of rows whose age_range / gender is NaN
(user_info.shape[0] - user_info["age_range"].count()) / user_info.shape[0]
(user_info.shape[0] - user_info["gender"].count()) / user_info.shape[0]
```

Counting only NaN entries, the missing rate is about 0.52% for age and 1.52% for gender. The exploration notebook additionally treats `age_range == 0` and `gender == 2` as unknown; counting those as well, 95,131 of the 424,170 users have no usable age range and 16,862 have no usable gender. Because the NaN rates are low, we expect them to have little effect on the final classification, and we later feed missing values (encoded as -1) to the model directly as a feature.

### 2.3 Missing values in the user activity log

```python
user_log.isna().sum()
```

[Figure: 图片1]

In the activity log only the brand column (`brand_id`) has missing values (91,015 rows); every other column is complete.

### 2.4 Summary statistics of the user profiles and activity log

```python
user_info.describe()
```

[Figure: 图片2]

The summary statistics give a mean `age_range` code of about 2.93, which we read as an average age of around 30, with a large spread. Most buyers are female (gender code 0).

```python
user_log.describe()
```

[Figure: 图片3]

### 2.5 Label distribution

[Figure: 图片4]

The classes are imbalanced: non-repeat buyers far outnumber repeat buyers, so some measure is needed to deal with the imbalance.

### 2.6 Plotting the top 5 merchants

```python
import matplotlib.pyplot as plt
import seaborn as sns

# The five merchants with the most training samples
train_data.merchant_id.value_counts().head(5)
train_data_merchant = train_data.copy()
train_data_merchant["TOP5"] = train_data_merchant["merchant_id"].map(lambda x: 1 if x in [4044, 3828, 4173, 1102, 4976] else 0)
train_data_merchant = train_data_merchant[train_data_merchant["TOP5"] == 1]
plt.figure(figsize=(8, 6))
plt.title("Merchant VS Label")
ax = sns.countplot(x="merchant_id", hue="label", data=train_data_merchant)
```

[Figure: 图片5]

A count plot of the top five merchants shows that they account for close to half of the data, and that for every one of them repeat purchases are far fewer than non-repeat purchases.

### 2.7 Plotting the merchants' repeat-purchase rates

```python
from scipy import stats

# Repeat-purchase rate per merchant
train_data.groupby(["merchant_id"])["label"].mean()
# Keep only merchants with a non-zero rate
merchant_repeat_buy = [rate for rate in train_data.groupby(["merchant_id"])["label"].mean() if 0 < rate <= 1]
plt.figure(figsize=(8, 4))
ax = plt.subplot(1, 2, 1)
# Note: distplot is deprecated in newer seaborn releases
sns.distplot(merchant_repeat_buy, fit=stats.norm)
ax = plt.subplot(1, 2, 2)
res = stats.probplot(merchant_repeat_buy, plot=plt)
```

[Figure: 图片6]

Because the features are not continuous, interpolation cannot be used to fill the gaps, and the missing rate is small. We therefore treat missing data as a category of its own, filled with and represented by -1.

## 3. Feature Engineering

### 3.1 Merging the datasets

1. Merge the training set df_train with the user profiles from user_info_format1.csv on user_id to obtain df_train.

   ```python
   df_train = pd.merge(df_train,user_info,on="user_id",how="left")
   ```

2. Merge df_train with the user activity log user_log_format1.csv (aggregated per user and merchant as total_logs_temp, with seller_id renamed to merchant_id) on user_id and merchant_id to obtain the new df_train.

   ```python
   df_train = pd.merge(df_train,total_logs_temp,on=["user_id","merchant_id"],how="left")
   ```

### 3.2 Feature generation

1. Features generated by simple aggregation and merging
   + Total number of item interactions of each user with each merchant, regardless of item. ***total_item_id***
   + Number of distinct items each user interacted with at each merchant. ***unique_item_id***
   + Number of distinct categories of the items each user interacted with at each merchant. ***total_cat_id***
   + Number of distinct days on which each user interacted with each merchant. ***total_time_temp***
   + Number of clicks of each user at each merchant. ***clicks***
   + Number of add-to-cart actions of each user at each merchant. ***shopping_cart***
   + Number of purchases of each user at each merchant. ***purchases***
   + Number of add-to-favourites actions of each user at each merchant. ***favourites***

2. Features generated through analysis

   + Monthly activity count per user

     ```python
     # Assumption: user_log has a 'month' column (for example time_stamp // 100,
     # since time_stamp is in mmdd form); its definition is not shown in this report.
     month_temp = user_log.groupby(['user_id','month']).size().reset_index().rename(columns={0:'cnt'})
     month_temp = pd.get_dummies(month_temp, columns=['month'], prefix='user_mcnt')
     # Turn each one-hot month indicator into that month's activity count
     for i in range(5, 12):
         month_temp['user_mcnt_'+str(i)] = month_temp['cnt'] * month_temp['user_mcnt_'+str(i)]
     month_temp = month_temp.groupby(['user_id']).sum().drop(['cnt'], axis=1).reset_index()
     ```

     Rationale: how often a user is active on Tmall each month captures the temporal side of user behaviour. Spending may differ between months: it may be higher around the end of the year, the Spring Festival, or Double 11, and lower in summer and winter. Monthly activity counts reflect these patterns.

   + Merchant features

     ```python
     # Assumption: `groups` is the activity log grouped by merchant_id and `matrix`
     # is the feature table under construction; neither is defined in this report.
     temp = groups.size().reset_index().rename(columns={0:'merchantf1'})
     matrix = matrix.merge(temp, on='merchant_id', how='left')
     # Distinct users, items, categories and brands per merchant
     temp = groups[['user_id', 'item_id', 'cat_id', 'brand_id']].nunique().reset_index().rename(columns={'user_id':'merchantf2', 'item_id':'merchantf3', 'cat_id':'merchantf4', 'brand_id':'merchantf5'})
     matrix = matrix.merge(temp, on='merchant_id', how='left')
     # Number of actions of each type (action_type 0 to 3) per merchant
     temp = groups['action_type'].value_counts().unstack().reset_index().rename(columns={0:'merchantf6', 1:'merchantf7', 2:'merchantf8', 3:'merchantf9'})
     matrix = matrix.merge(temp, on='merchant_id', how='left')
     ```

     The number of items and brands a merchant sells reflects how popular those items or brands are, and to some extent also influences the customer repurchase rate.

   + Combined user and merchant features

     ```python
     matrix['ratiof1'] = matrix['userf9']/matrix['userf7'] # purchase-to-click ratio of the user
     matrix['ratiof2'] = matrix['merchantf8']/matrix['merchantf6'] # purchase-to-click ratio of the merchant
     ```

     The rate at which a user's clicks, or the clicks a merchant receives, convert into purchases is a good indicator of how popular the items are.

## 4. Candidate Models

1. Logistic regression [1] (LR) is a generalized linear model and one of the most common algorithms for binary classification in machine learning.
2. A decision tree [2] (DT) is a basic method for classification and regression. This report deals with classification trees: the model has a tree structure and, for classification, represents the process of classifying data based on its features.
3. A random forest [3] (RF) trains multiple decision trees on the samples and combines their predictions. It can be used for both regression and classification, and is an ensemble learning algorithm built on multiple decision trees.
4. The gradient boosting decision tree [4] (Gradient Boosting Decision Tree, GBDT) is a gradient boosting method that uses CART as the base learner together with an additive model and a forward stagewise algorithm.
5. XGBoost [5] is an open-source machine learning project developed by Tianqi Chen and others. It implements GBDT efficiently with many algorithmic and engineering improvements, and is widely used, with good results, in Kaggle and many other machine learning competitions.

## 5. Comparing the Candidate Models

### 5.1 Loading the training and test data

```python
# Load the training data
df_train = pd.read_csv(r'df_train.csv')
# Load the final test data
test_data = pd.read_csv(r'test_data.csv')
test_data
```

[Figure: 图片7]

### 5.2 Preprocessing the dataset before modelling

```python
# Preprocessing before modelling
from sklearn.model_selection import train_test_split

y = df_train["label"]
X = df_train.drop(["user_id", "merchant_id", "label"], axis=1)
X.head(10)
# Split into a training set and a held-out test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=8)
```

[Figure: 图片8]

### 5.3 Candidate model: logistic regression

```python
# Logistic regression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

Logit = LogisticRegression(solver='liblinear')
Logit.fit(X_train, y_train)
Predict = Logit.predict(X_test)
Predict_proba = Logit.predict_proba(X_test)
print(Predict.shape)
print(Predict[0:20])
print(Predict_proba[:])
Score = accuracy_score(y_test, Predict)
Score
```

[Figure: 图片9]

```python
# Logistic regression: predictions for the submission
Logit_Ans_Predict_proba = Logit.predict_proba(test_data)
df_test['prob'] = Logit_Ans_Predict_proba[:, 1]
# Save the final answer
df_test.to_csv("Logit_Ans.csv", index=False)
```

The submission scored 0.4564939.

### 5.4 Candidate model: decision tree

```python
# Decision tree
from sklearn.tree import DecisionTreeClassifier
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
Predict_proba = tree.predict_proba(X_test)
print(Predict_proba[:])
print("Accuracy on training set: {:.3f}".format(tree.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(tree.score(X_test, y_test)))
```

[Figure: 图片10]

```python
# Decision tree: predictions for the submission
Tree_Ans_Predict_proba = tree.predict_proba(test_data)
df_test['prob'] = Tree_Ans_Predict_proba[:, 1]
# Save the final answer
df_test.to_csv("Tree_Ans.csv", index=False)
```

The submission scored 0.5833852.

### 5.5 Candidate model: random forest

```python
# Random forest
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(n_estimators=50, random_state=90, max_depth=5)
rfc = rfc.fit(X_train, y_train)
Predict_proba = rfc.predict_proba(X_test)
print(Predict_proba[:])
print("Accuracy on training set: {:.3f}".format(rfc.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(rfc.score(X_test, y_test)))
```

[Figure: 图片11]

```python
# Random forest: predictions for the submission
RFC_Ans_Predict_proba = rfc.predict_proba(test_data)
df_test['prob'] = RFC_Ans_Predict_proba[:, 1]
# Save the final answer
df_test.to_csv("RFC_Ans.csv", index=False)
```

The submission scored 0.6252815.

### 5.6 Candidate model: tuning the random forest

```python
import numpy as np
import matplotlib.pyplot as plt

# Tune n_estimators (the parameter with the largest effect on a random forest)
# by plotting a learning curve
score_lt = []
# Fit one random forest every 10 steps and record the score for each n_estimators
for i in range(0, 200, 10):
    print("Progress:", i)
    rfc = RandomForestClassifier(n_estimators=i+1, random_state=90, max_depth=8)
    rfc = rfc.fit(X_train, y_train)
    score = rfc.score(X_test, y_test)
    score_lt.append(score)
score_max = max(score_lt)
print('Best score: {}'.format(score_max), 'number of trees: {}'.format(score_lt.index(score_max)*10+1))
# Plot the learning curve
x = np.arange(1, 201, 10)
plt.subplot(111)
plt.plot(x, score_lt, 'r-')
plt.show()
```

[Figure: image-20211125145343834]

In the figure above the horizontal axis is the value of n_estimators and the vertical axis is the accuracy on the held-out test split. n_estimators grows by 10 per iteration (1, 11, ..., 191), and the line shows the accuracy at each step. The plot indicates that the random forest performs best at around n_estimators=100. After tuning, the submission scored 0.6256826.

### 5.7 Candidate model: XGBoost

```python
import xgboost as xgb

def xgb_train(X_train, y_train, X_valid, y_valid, verbose=True):
    model_xgb = xgb.XGBClassifier(
        max_depth=10,  # raw8
        n_estimators=1000,
        min_child_weight=300,
        colsample_bytree=0.8,
        subsample=0.8,
        eta=0.3,
        seed=42
    )
    # Note: recent xgboost releases expect eval_metric and
    # early_stopping_rounds in the constructor rather than in fit()
    model_xgb.fit(
        X_train,
        y_train,
        eval_metric='auc',
        eval_set=[(X_train, y_train), (X_valid, y_valid)],
        verbose=verbose,
        early_stopping_rounds=10  # early stopping: stop if AUC has not improved for 10 rounds
    )
    print(model_xgb.best_score)
    print("Accuracy on training set: {:.3f}".format(model_xgb.score(X_train, y_train)))
    print("Accuracy on validation set: {:.3f}".format(model_xgb.score(X_valid, y_valid)))
    return model_xgb
```

[Figure: 图片12]

```python
# XGBoost: predictions for the submission
# X_valid, y_valid: presumably the held-out split (X_test, y_test in section 5.2)
model_xgb = xgb_train(X_train, y_train, X_valid, y_valid, verbose=False)
prob = model_xgb.predict_proba(test_data)
submission['prob'] = pd.Series(prob[:,1])
submission.drop(['origin'], axis=1, inplace=True)
submission.to_csv('submission_xgb.csv', index=False)
```

The submission scored 0.6562986.

## 6. Final Score and Ranking

[Figure: 图片15]

## 7. Summary

Our final score and ranking in this competition were not especially high. We think the main cause lies in data preprocessing and feature engineering. In the dataset roughly ninety thousand age and gender values are missing, and this large amount of missing feature data is one of the main reasons the prediction accuracy is not high. The second reason is feature engineering: we extracted features with traditional, relatively simple methods, which also limits the accuracy of the models.

For modelling we used popular models: logistic regression, decision tree, random forest, and XGBoost. After training, these models differed little on the training set; by comparison, XGBoost performed best on the test set. As future work we plan to redo the feature engineering and, on the modelling side, to combine several classification algorithms following the idea of bagging in order to improve prediction accuracy further.

## 8. References

[1] [https://www.cnblogs.com/phyger/p/14188712.html](https://www.cnblogs.com/phyger/p/14188712.html)

[2] [https://blog.csdn.net/qq_34807908/article/details/81539536](https://blog.csdn.net/qq_34807908/article/details/81539536)

[3] [https://blog.csdn.net/lovenankai/article/details/99966142](https://blog.csdn.net/lovenankai/article/details/99966142)

[4] [https://www.jianshu.com/p/d1f696266814](https://www.jianshu.com/p/d1f696266814)

[5] [http://cran.fhcrc.org/web/packages/xgboost/vignettes/xgboost.pdf](http://cran.fhcrc.org/web/packages/xgboost/vignettes/xgboost.pdf)

-------------------------------------------------------------------------------- /.ipynb_checkpoints/数据探索-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "id": "0d97411e", 7 | "metadata": {}, 8 | "outputs": [], 9 | "source": [ 10 | "import numpy as np\n", 11 | "import pandas as pd\n", 12 | "import matplotlib.pyplot as plt\n", 13 | "import seaborn as sns\n", 14 | "from scipy import stats\n", 15 | "import warnings\n", 16 | "\n", 17 | "warnings.filterwarnings(\"ignore\")" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": 2, 23 | "id": "30627d48", 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "#导入数据\n", 28 | "train_data = pd.read_csv(\"../DataMining/data_format1/train_format1.csv\")\n", 29 | "test_data = pd.read_csv(\"../DataMining/data_format1/test_format1.csv\")\n", 30 | "\n", 31 | "user_info = 
pd.read_csv(\"../DataMining/data_format1/user_info_format1.csv\")\n", 32 | "user_log = pd.read_csv(\"../DataMining/data_format1/user_log_format1.csv\")" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 3, 38 | "id": "7005f9dd", 39 | "metadata": {}, 40 | "outputs": [ 41 | { 42 | "data": { 43 | "text/plain": [ 44 | "(424170, 3)" 45 | ] 46 | }, 47 | "execution_count": 3, 48 | "metadata": {}, 49 | "output_type": "execute_result" 50 | } 51 | ], 52 | "source": [ 53 | "#1.查看用户信息缺失值-年龄值\n", 54 | "#shape大小:\n", 55 | "user_info.shape" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 4, 61 | "id": "ee333a9d", 62 | "metadata": {}, 63 | "outputs": [ 64 | { 65 | "data": { 66 | "text/plain": [ 67 | "421953" 68 | ] 69 | }, 70 | "execution_count": 4, 71 | "metadata": {}, 72 | "output_type": "execute_result" 73 | } 74 | ], 75 | "source": [ 76 | "#年龄数据总个数:\n", 77 | "user_info[\"age_range\"].count()" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 5, 83 | "id": "c2488c3c", 84 | "metadata": {}, 85 | "outputs": [ 86 | { 87 | "data": { 88 | "text/plain": [ 89 | "0.005226677982884221" 90 | ] 91 | }, 92 | "execution_count": 5, 93 | "metadata": {}, 94 | "output_type": "execute_result" 95 | } 96 | ], 97 | "source": [ 98 | "#缺失率查看:\n", 99 | "(user_info.shape[0]-user_info[\"age_range\"].count())/user_info.shape[0]" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 6, 105 | "id": "8bbe97c9", 106 | "metadata": {}, 107 | "outputs": [ 108 | { 109 | "data": { 110 | "text/plain": [ 111 | "user_id 95131\n", 112 | "age_range 92914\n", 113 | "gender 90664\n", 114 | "dtype: int64" 115 | ] 116 | }, 117 | "execution_count": 6, 118 | "metadata": {}, 119 | "output_type": "execute_result" 120 | } 121 | ], 122 | "source": [ 123 | "##当年龄为空或者等于0时默认为缺失\n", 124 | "#缺失值查看:\n", 125 | "user_info[user_info['age_range'].isna()|(user_info['age_range']==0)].count()" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | 
"execution_count": 7, 131 | "id": "34af97b9", 132 | "metadata": {}, 133 | "outputs": [ 134 | { 135 | "data": { 136 | "text/html": [ 137 | "
" 202 | ], 203 | "text/plain": [ 204 | " user_id\n", 205 | "age_range \n", 206 | "0.0 92914\n", 207 | "1.0 24\n", 208 | "2.0 52871\n", 209 | "3.0 111654\n", 210 | "4.0 79991\n", 211 | "5.0 40777\n", 212 | "6.0 35464\n", 213 | "7.0 6992\n", 214 | "8.0 1266" 215 | ] 216 | }, 217 | "execution_count": 7, 218 | "metadata": {}, 219 | "output_type": "execute_result" 220 | } 221 | ], 222 | "source": [ 223 | "#数据分组查看:\n", 224 | "user_info.groupby(['age_range'])[['user_id']].count()" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 8, 230 | "id": "02c03a0f", 231 | "metadata": {}, 232 | "outputs": [ 233 | { 234 | "data": { 235 | "text/plain": [ 236 | "2217" 237 | ] 238 | }, 239 | "execution_count": 8, 240 | "metadata": {}, 241 | "output_type": "execute_result" 242 | } 243 | ], 244 | "source": [ 245 | "#空值查看:\n", 246 | "user_info.shape[0]-user_info[\"age_range\"].count()" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": 9, 252 | "id": "b25292e3", 253 | "metadata": {}, 254 | "outputs": [ 255 | { 256 | "data": { 257 | "text/plain": [ 258 | "0.01517316170403376" 259 | ] 260 | }, 261 | "execution_count": 9, 262 | "metadata": {}, 263 | "output_type": "execute_result" 264 | } 265 | ], 266 | "source": [ 267 | "##2.查看用户信息数据的缺失——性别值\n", 268 | "#缺失率查看:\n", 269 | "(user_info.shape[0] - user_info[\"gender\"].count()) / user_info.shape[0]" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": 10, 275 | "id": "7d86b971", 276 | "metadata": {}, 277 | "outputs": [ 278 | { 279 | "data": { 280 | "text/plain": [ 281 | "user_id 16862\n", 282 | "age_range 14664\n", 283 | "gender 10426\n", 284 | "dtype: int64" 285 | ] 286 | }, 287 | "execution_count": 10, 288 | "metadata": {}, 289 | "output_type": "execute_result" 290 | } 291 | ], 292 | "source": [ 293 | "# 当性别为空或者等于2时默认为缺失\n", 294 | "# 缺失值查看:\n", 295 | "user_info[user_info['gender'].isna() | (user_info['gender'] == 2)].count()" 296 | ] 297 | }, 298 | { 299 | "cell_type": 
"code", 300 | "execution_count": 11, 301 | "id": "f294b0d4", 302 | "metadata": {}, 303 | "outputs": [ 304 | { 305 | "data": { 306 | "text/html": [ 307 | "
" 348 | ], 349 | "text/plain": [ 350 | " user_id\n", 351 | "gender \n", 352 | "0.0 285638\n", 353 | "1.0 121670\n", 354 | "2.0 10426" 355 | ] 356 | }, 357 | "execution_count": 11, 358 | "metadata": {}, 359 | "output_type": "execute_result" 360 | } 361 | ], 362 | "source": [ 363 | "#数据分组查看:\n", 364 | "user_info.groupby(['gender'])[['user_id']].count()" 365 | ] 366 | }, 367 | { 368 | "cell_type": "code", 369 | "execution_count": 12, 370 | "id": "7cbe3a5f", 371 | "metadata": {}, 372 | "outputs": [ 373 | { 374 | "data": { 375 | "text/plain": [ 376 | "6436" 377 | ] 378 | }, 379 | "execution_count": 12, 380 | "metadata": {}, 381 | "output_type": "execute_result" 382 | } 383 | ], 384 | "source": [ 385 | "#空值查看:\n", 386 | "user_info.shape[0] - user_info[\"gender\"].count()" 387 | ] 388 | }, 389 | { 390 | "cell_type": "code", 391 | "execution_count": 13, 392 | "id": "c6b8e6da", 393 | "metadata": {}, 394 | "outputs": [ 395 | { 396 | "data": { 397 | "text/plain": [ 398 | "user_id 106330\n", 399 | "age_range 104113\n", 400 | "gender 99894\n", 401 | "dtype: int64" 402 | ] 403 | }, 404 | "execution_count": 13, 405 | "metadata": {}, 406 | "output_type": "execute_result" 407 | } 408 | ], 409 | "source": [ 410 | "# 查看用户信息数据的缺失——年龄或性别:\n", 411 | "user_info[user_info['age_range'].isna() | (user_info['age_range'] == 0) | user_info['gender'].isna() | (user_info['gender'] == 2)].count()" 412 | ] 413 | }, 414 | { 415 | "cell_type": "code", 416 | "execution_count": 14, 417 | "id": "e1b779ef", 418 | "metadata": {}, 419 | "outputs": [ 420 | { 421 | "data": { 422 | "text/plain": [ 423 | "user_id 0\n", 424 | "item_id 0\n", 425 | "cat_id 0\n", 426 | "seller_id 0\n", 427 | "brand_id 91015\n", 428 | "time_stamp 0\n", 429 | "action_type 0\n", 430 | "dtype: int64" 431 | ] 432 | }, 433 | "execution_count": 14, 434 | "metadata": {}, 435 | "output_type": "execute_result" 436 | } 437 | ], 438 | "source": [ 439 | "#3.查看用户信息数据的缺失——用户行为日志数据缺失\n", 440 | "user_log.isna().sum()" 441 | ] 442 | }, 443 | { 444 
| "cell_type": "code", 445 | "execution_count": 15, 446 | "id": "88a6aa39", 447 | "metadata": {}, 448 | "outputs": [ 449 | { 450 | "data": { 451 | "text/html": [ 452 | "
" 527 | ], 528 | "text/plain": [ 529 | " user_id age_range gender\n", 530 | "count 424170.000000 421953.000000 417734.000000\n", 531 | "mean 212085.500000 2.930262 0.341179\n", 532 | "std 122447.476178 1.942978 0.524112\n", 533 | "min 1.000000 0.000000 0.000000\n", 534 | "25% 106043.250000 2.000000 0.000000\n", 535 | "50% 212085.500000 3.000000 0.000000\n", 536 | "75% 318127.750000 4.000000 1.000000\n", 537 | "max 424170.000000 8.000000 2.000000" 538 | ] 539 | }, 540 | "execution_count": 15, 541 | "metadata": {}, 542 | "output_type": "execute_result" 543 | } 544 | ], 545 | "source": [ 546 | "#查看user_info基本数据描述:\n", 547 | "user_info.describe()" 548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": 16, 553 | "id": "3ff86b41", 554 | "metadata": {}, 555 | "outputs": [ 556 | { 557 | "data": { 558 | "text/html": [ 559 | "
" 670 | ], 671 | "text/plain": [ 672 | " user_id item_id cat_id seller_id brand_id \\\n", 673 | "count 5.492533e+07 5.492533e+07 5.492533e+07 5.492533e+07 5.483432e+07 \n", 674 | "mean 2.121568e+05 5.538613e+05 8.770308e+02 2.470941e+03 4.153348e+03 \n", 675 | "std 1.222872e+05 3.221459e+05 4.486269e+02 1.473310e+03 2.397679e+03 \n", 676 | "min 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 \n", 677 | "25% 1.063360e+05 2.731680e+05 5.550000e+02 1.151000e+03 2.027000e+03 \n", 678 | "50% 2.126540e+05 5.555290e+05 8.210000e+02 2.459000e+03 4.065000e+03 \n", 679 | "75% 3.177500e+05 8.306890e+05 1.252000e+03 3.760000e+03 6.196000e+03 \n", 680 | "max 4.241700e+05 1.113166e+06 1.671000e+03 4.995000e+03 8.477000e+03 \n", 681 | "\n", 682 | " time_stamp action_type \n", 683 | "count 5.492533e+07 5.492533e+07 \n", 684 | "mean 9.230953e+02 2.854458e-01 \n", 685 | "std 1.954305e+02 8.075806e-01 \n", 686 | "min 5.110000e+02 0.000000e+00 \n", 687 | "25% 7.300000e+02 0.000000e+00 \n", 688 | "50% 1.010000e+03 0.000000e+00 \n", 689 | "75% 1.109000e+03 0.000000e+00 \n", 690 | "max 1.112000e+03 3.000000e+00 " 691 | ] 692 | }, 693 | "execution_count": 16, 694 | "metadata": {}, 695 | "output_type": "execute_result" 696 | } 697 | ], 698 | "source": [ 699 | "#查看user_log基本数据描述:\n", 700 | "user_log.describe()" 701 | ] 702 | }, 703 | { 704 | "cell_type": "code", 705 | "execution_count": null, 706 | "id": "d48b1f40", 707 | "metadata": {}, 708 | "outputs": [], 709 | "source": [] 710 | } 711 | ], 712 | "metadata": { 713 | "kernelspec": { 714 | "display_name": "Python 3", 715 | "language": "python", 716 | "name": "python3" 717 | }, 718 | "language_info": { 719 | "codemirror_mode": { 720 | "name": "ipython", 721 | "version": 3 722 | }, 723 | "file_extension": ".py", 724 | "mimetype": "text/x-python", 725 | "name": "python", 726 | "nbconvert_exporter": "python", 727 | "pygments_lexer": "ipython3", 728 | "version": "3.8.8" 729 | } 730 | }, 731 | "nbformat": 4, 732 | 
"nbformat_minor": 5 733 | } 734 | -------------------------------------------------------------------------------- /测试数据特征处理与填充.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "id": "6c451add", 7 | "metadata": {}, 8 | "outputs": [], 9 | "source": [ 10 | "#导包\n", 11 | "import numpy as np\n", 12 | "import pandas as pd\n", 13 | "import matplotlib.pyplot as plt\n", 14 | "plt.rcParams[\"font.sans-serif\"] = \"SimHei\" #解决中文乱码问题\n", 15 | "import seaborn as sns\n", 16 | "import random\n", 17 | "from sklearn.model_selection import train_test_split\n", 18 | "from sklearn.linear_model import LogisticRegression\n", 19 | "from sklearn.preprocessing import LabelEncoder\n", 20 | "from sklearn.metrics import accuracy_score\n", 21 | "from sklearn import model_selection\n", 22 | "from sklearn.neighbors import KNeighborsRegressor" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 2, 28 | "id": "33a0082f", 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "df_test = pd.read_csv(r'../DataMining/data_format1\\test_format1.csv')\n", 33 | "user_info = pd.read_csv(r'../DataMining/data_format1\\user_info_format1.csv')\n", 34 | "user_log = pd.read_csv(r'../DataMining/data_format1\\user_log_format1.csv')" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 3, 40 | "id": "d2c315d6", 41 | "metadata": {}, 42 | "outputs": [ 43 | { 44 | "name": "stdout", 45 | "output_type": "stream", 46 | "text": [ 47 | "\n", 48 | "RangeIndex: 424170 entries, 0 to 424169\n", 49 | "Data columns (total 3 columns):\n", 50 | " # Column Non-Null Count Dtype \n", 51 | "--- ------ -------------- ----- \n", 52 | " 0 user_id 424170 non-null int64 \n", 53 | " 1 age_range 329039 non-null float64\n", 54 | " 2 gender 407308 non-null float64\n", 55 | "dtypes: float64(2), int64(1)\n", 56 | "memory usage: 9.7 MB\n" 57 | ] 58 | } 59 | ], 60 | "source": [ 
61 | "# Replace placeholder values with NaN (age_range 0.0, gender 2.0)\n", 62 | "user_info['age_range'] = user_info['age_range'].replace(0.0, np.nan)\n", 63 | "user_info['gender'] = user_info['gender'].replace(2.0, np.nan)\n", 64 | "user_info.info()" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 4, 70 | "id": "d5b34bee", 71 | "metadata": {}, 72 | "outputs": [], 73 | "source": [ 74 | "user_info['age_range'] = user_info['age_range'].replace(np.nan, -1)\n", 75 | "user_info['gender'] = user_info['gender'].replace(np.nan, -1)\n", 76 | "# user_info['age_range'].replace(np.nan,1,inplace=True)\n", 77 | "# user_info['gender'].replace(np.nan,0,inplace=True)" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 5, 83 | "id": "dc6e724d", 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "# Merge features\n", 88 | "\n", 89 | "df_test = pd.merge(df_test,user_info,on=\"user_id\",how=\"left\")\n", 90 | " \n", 91 | "total_logs_temp = user_log.groupby([user_log[\"user_id\"],user_log[\"seller_id\"]])[\"item_id\"].count().reset_index()\n", 92 | " \n", 93 | "total_logs_temp.rename(columns={\"seller_id\":\"merchant_id\",\"item_id\":\"total_item_id\"},inplace=True)\n", 94 | " \n", 95 | "df_test = pd.merge(df_test,total_logs_temp,on=[\"user_id\",\"merchant_id\"],how=\"left\")\n", 96 | " \n", 97 | "unique_item_id = user_log.groupby([\"user_id\",\"seller_id\",\"item_id\"]).count().reset_index()[[\"user_id\",\"seller_id\",\"item_id\"]]\n", 98 | " \n", 99 | "unique_item_id_cnt = unique_item_id.groupby([\"user_id\",\"seller_id\"]).count().reset_index()\n", 100 | " \n", 101 | "unique_item_id_cnt.rename(columns={\"seller_id\":\"merchant_id\",\"item_id\":\"unique_item_id\"},inplace=True)\n", 102 | " \n", 103 | "df_test = pd.merge(df_test, unique_item_id_cnt, on=[\"user_id\", \"merchant_id\"], how=\"left\")\n", 104 | " \n", 105 | "cat_id_temp = user_log.groupby([\"user_id\", \"seller_id\", \"cat_id\"]).count().reset_index()[[\"user_id\", \"seller_id\", \"cat_id\"]]\n", 106 | " \n", 107 | "cat_id_temp_cnt = cat_id_temp.groupby([\"user_id\", 
\"seller_id\"]).count().reset_index()\n", 108 | " \n", 109 | "cat_id_temp_cnt.rename(columns={\"seller_id\":\"merchant_id\",\"cat_id\":\"total_cat_id\"},inplace=True)\n", 110 | " \n", 111 | "df_test = pd.merge(df_test, cat_id_temp_cnt, on=[\"user_id\", \"merchant_id\"], how=\"left\")\n", 112 | " \n", 113 | "time_temp = user_log.groupby([\"user_id\", \"seller_id\", \"time_stamp\"]).count().reset_index()[[\"user_id\", \"seller_id\", \"time_stamp\"]]\n", 114 | " \n", 115 | "time_temp_cnt = time_temp.groupby([\"user_id\", \"seller_id\"]).count().reset_index()\n", 116 | " \n", 117 | "time_temp_cnt.rename(columns={\"seller_id\":\"merchant_id\",\"time_stamp\":\"total_time_temp\"},inplace=True)\n", 118 | " \n", 119 | "df_test = pd.merge(df_test, time_temp_cnt, on=[\"user_id\", \"merchant_id\"], how=\"left\")\n", 120 | " \n", 121 | "click_temp = user_log.groupby([\"user_id\", \"seller_id\", \"action_type\"])[\"item_id\"].count().reset_index()\n", 122 | " \n", 123 | "click_temp.rename(columns={\"seller_id\":\"merchant_id\",\"item_id\":\"times\"},inplace=True)\n", 124 | " \n", 125 | "click_temp[\"clicks\"] = click_temp[\"action_type\"] == 0\n", 126 | " \n", 127 | "click_temp[\"clicks\"] = click_temp[\"clicks\"] * click_temp[\"times\"]\n", 128 | " \n", 129 | "click_temp[\"shopping_cart\"] = click_temp[\"action_type\"] == 1\n", 130 | "click_temp[\"shopping_cart\"] = click_temp[\"shopping_cart\"] * click_temp[\"times\"]\n", 131 | " \n", 132 | "click_temp[\"purchases\"] = click_temp[\"action_type\"] == 2\n", 133 | "click_temp[\"purchases\"] = click_temp[\"purchases\"] * click_temp[\"times\"]\n", 134 | " \n", 135 | "click_temp[\"favourites\"] = click_temp[\"action_type\"] == 3\n", 136 | "click_temp[\"favourites\"] = click_temp[\"favourites\"] * click_temp[\"times\"]\n", 137 | " \n", 138 | "four_features = click_temp.groupby([\"user_id\", \"merchant_id\"]).sum().reset_index()\n", 139 | " \n", 140 | "# Drop the helper columns\n", 141 | "four_features = four_features.drop([\"action_type\", \"times\"], 
axis=1)\n", 142 | " \n", 143 | "# Merge\n", 144 | "df_test = pd.merge(df_test, four_features, on=[\"user_id\", \"merchant_id\"], how=\"left\")\n", 145 | " \n", 146 | "# Forward-fill missing values\n", 147 | "df_test = df_test.ffill()\n", 148 | " " 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": 6, 154 | "id": "43c3bf05", 155 | "metadata": {}, 156 | "outputs": [ 157 | { 158 | "data": { 159 | "text/html": [ 160 | "
" 374 | ], 375 | "text/plain": [ 376 | " user_id merchant_id prob age_range gender total_item_id \\\n", 377 | "0 163968 4605 NaN -1.0 0.0 2 \n", 378 | "1 360576 1581 NaN 2.0 -1.0 10 \n", 379 | "2 98688 1964 NaN 6.0 0.0 6 \n", 380 | "3 98688 3645 NaN 6.0 0.0 11 \n", 381 | "4 295296 3361 NaN 2.0 1.0 50 \n", 382 | "... ... ... ... ... ... ... \n", 383 | "261472 228479 3111 NaN 6.0 0.0 5 \n", 384 | "261473 97919 2341 NaN 8.0 1.0 2 \n", 385 | "261474 97919 3971 NaN 8.0 1.0 16 \n", 386 | "261475 32639 3536 NaN -1.0 0.0 3 \n", 387 | "261476 32639 3319 NaN -1.0 0.0 11 \n", 388 | "\n", 389 | " unique_item_id total_cat_id total_time_temp clicks shopping_cart \\\n", 390 | "0 1 1 1 1 0 \n", 391 | "1 9 4 1 5 0 \n", 392 | "2 1 1 1 5 0 \n", 393 | "3 1 1 1 10 0 \n", 394 | "4 8 4 5 47 0 \n", 395 | "... ... ... ... ... ... \n", 396 | "261472 2 1 2 4 0 \n", 397 | "261473 1 1 1 1 0 \n", 398 | "261474 5 2 3 12 0 \n", 399 | "261475 2 1 1 2 0 \n", 400 | "261476 1 1 2 10 0 \n", 401 | "\n", 402 | " purchases favourites \n", 403 | "0 1 0 \n", 404 | "1 5 0 \n", 405 | "2 1 0 \n", 406 | "3 1 0 \n", 407 | "4 1 2 \n", 408 | "... ... ... \n", 409 | "261472 1 0 \n", 410 | "261473 1 0 \n", 411 | "261474 4 0 \n", 412 | "261475 1 0 \n", 413 | "261476 1 0 \n", 414 | "\n", 415 | "[261477 rows x 13 columns]" 416 | ] 417 | }, 418 | "execution_count": 6, 419 | "metadata": {}, 420 | "output_type": "execute_result" 421 | } 422 | ], 423 | "source": [ 424 | "df_test" 425 | ] 426 | }, 427 | { 428 | "cell_type": "code", 429 | "execution_count": 7, 430 | "id": "fa6f95a9", 431 | "metadata": {}, 432 | "outputs": [ 433 | { 434 | "data": { 435 | "text/html": [ 436 | "
" 614 | ], 615 | "text/plain": [ 616 | " age_range gender total_item_id unique_item_id total_cat_id \\\n", 617 | "0 -1.0 0.0 2 1 1 \n", 618 | "1 2.0 -1.0 10 9 4 \n", 619 | "2 6.0 0.0 6 1 1 \n", 620 | "3 6.0 0.0 11 1 1 \n", 621 | "4 2.0 1.0 50 8 4 \n", 622 | "... ... ... ... ... ... \n", 623 | "261472 6.0 0.0 5 2 1 \n", 624 | "261473 8.0 1.0 2 1 1 \n", 625 | "261474 8.0 1.0 16 5 2 \n", 626 | "261475 -1.0 0.0 3 2 1 \n", 627 | "261476 -1.0 0.0 11 1 1 \n", 628 | "\n", 629 | " total_time_temp clicks shopping_cart purchases favourites \n", 630 | "0 1 1 0 1 0 \n", 631 | "1 1 5 0 5 0 \n", 632 | "2 1 5 0 1 0 \n", 633 | "3 1 10 0 1 0 \n", 634 | "4 5 47 0 1 2 \n", 635 | "... ... ... ... ... ... \n", 636 | "261472 2 4 0 1 0 \n", 637 | "261473 1 1 0 1 0 \n", 638 | "261474 3 12 0 4 0 \n", 639 | "261475 1 2 0 1 0 \n", 640 | "261476 2 10 0 1 0 \n", 641 | "\n", 642 | "[261477 rows x 10 columns]" 643 | ] 644 | }, 645 | "execution_count": 7, 646 | "metadata": {}, 647 | "output_type": "execute_result" 648 | } 649 | ], 650 | "source": [ 651 | "# Preprocess the test data\n", 652 | "# y = df_train[\"label\"]\n", 653 | "X = df_test.drop([\"user_id\", \"merchant_id\", \"prob\"], axis=1)\n", 654 | "# X['age_range'].replace(-1,3,inplace=True)\n", 655 | "# X['gender'].replace(-1,0,inplace=True)\n", 656 | "X" 657 | ] 658 | }, 659 | { 660 | "cell_type": "code", 661 | "execution_count": 8, 662 | "id": "0d5814e4", 663 | "metadata": {}, 664 | "outputs": [], 665 | "source": [ 666 | "# Save the constructed features\n", 667 | "X.to_csv(\"test_data.csv\",index=False)" 668 | ] 669 | }, 670 | { 671 | "cell_type": "code", 672 | "execution_count": null, 673 | "id": "d54567e6", 674 | "metadata": {}, 675 | "outputs": [], 676 | "source": [] 677 | } 678 | ], 679 | "metadata": { 680 | "kernelspec": { 681 | "display_name": "Python 3", 682 | "language": "python", 683 | "name": "python3" 684 | }, 685 | "language_info": { 686 | "codemirror_mode": { 687 | "name": "ipython", 688 | "version": 3 689 | }, 690 | "file_extension": ".py", 691 | "mimetype": 
"text/x-python", 692 | "name": "python", 693 | "nbconvert_exporter": "python", 694 | "pygments_lexer": "ipython3", 695 | "version": "3.8.8" 696 | } 697 | }, 698 | "nbformat": 4, 699 | "nbformat_minor": 5 700 | } 701 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/预测建模-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 10, 6 | "id": "8035b6a2", 7 | "metadata": {}, 8 | "outputs": [], 9 | "source": [ 10 | "# Import packages\n", 11 | "import numpy as np\n", 12 | "import pandas as pd\n", 13 | "import matplotlib.pyplot as plt\n", 14 | "plt.rcParams[\"font.sans-serif\"] = \"SimHei\" # Use SimHei so Chinese labels render correctly\n", 15 | "import seaborn as sns\n", 16 | "import random\n", 17 | "from sklearn.model_selection import train_test_split\n", 18 | "from sklearn.linear_model import LogisticRegression\n", 19 | "from sklearn.preprocessing import LabelEncoder\n", 20 | "from sklearn.metrics import accuracy_score\n", 21 | "from sklearn import model_selection\n", 22 | "from sklearn.neighbors import KNeighborsRegressor" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 11, 28 | "id": "4e929396", 29 | "metadata": {}, 30 | "outputs": [ 31 | { 32 | "data": { 33 | "text/html": [ 34 | "
" 128 | ], 129 | "text/plain": [ 130 | " user_id merchant_id prob\n", 131 | "0 163968 4605 NaN\n", 132 | "1 360576 1581 NaN\n", 133 | "2 98688 1964 NaN\n", 134 | "3 98688 3645 NaN\n", 135 | "4 295296 3361 NaN\n", 136 | "... ... ... ...\n", 137 | "261472 228479 3111 NaN\n", 138 | "261473 97919 2341 NaN\n", 139 | "261474 97919 3971 NaN\n", 140 | "261475 32639 3536 NaN\n", 141 | "261476 32639 3319 NaN\n", 142 | "\n", 143 | "[261477 rows x 3 columns]" 144 | ] 145 | }, 146 | "execution_count": 11, 147 | "metadata": {}, 148 | "output_type": "execute_result" 149 | } 150 | ], 151 | "source": [ 152 | "# Load data\n", 153 | "df_train = pd.read_csv(r'df_train.csv')\n", 154 | "df_test = pd.read_csv(r'../DataMining/data_format1/test_format1.csv')\n", 155 | "df_test\n" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 12, 161 | "id": "16970677", 162 | "metadata": {}, 163 | "outputs": [ 164 | { 165 | "data": { 166 | "text/html": [ 167 | "
" 331 | ], 332 | "text/plain": [ 333 | " age_range gender total_item_id unique_item_id total_cat_id \\\n", 334 | "0 6.0 0.0 39 20 6 \n", 335 | "1 6.0 0.0 14 1 1 \n", 336 | "2 6.0 0.0 18 2 1 \n", 337 | "3 6.0 0.0 2 1 1 \n", 338 | "4 -1.0 0.0 8 1 1 \n", 339 | "5 4.0 1.0 1 1 1 \n", 340 | "6 5.0 0.0 3 2 1 \n", 341 | "7 5.0 0.0 83 48 15 \n", 342 | "8 5.0 0.0 7 4 1 \n", 343 | "9 4.0 1.0 4 1 1 \n", 344 | "\n", 345 | " total_time_temp clicks shopping_cart purchases favourites \n", 346 | "0 9 36 0 1 2 \n", 347 | "1 3 13 0 1 0 \n", 348 | "2 2 12 0 6 0 \n", 349 | "3 1 1 0 1 0 \n", 350 | "4 3 7 0 1 0 \n", 351 | "5 1 0 0 1 0 \n", 352 | "6 1 2 0 1 0 \n", 353 | "7 3 78 0 5 0 \n", 354 | "8 1 6 0 1 0 \n", 355 | "9 2 2 0 1 1 " 356 | ] 357 | }, 358 | "execution_count": 12, 359 | "metadata": {}, 360 | "output_type": "execute_result" 361 | } 362 | ], 363 | "source": [ 364 | "# Preprocessing before modeling\n", 365 | "y = df_train[\"label\"]\n", 366 | "X = df_train.drop([\"user_id\", \"merchant_id\", \"label\"], axis=1)\n", 367 | "X.head(10)\n" 368 | ] 369 | }, 370 | { 371 | "cell_type": "code", 372 | "execution_count": 13, 373 | "id": "889e9034", 374 | "metadata": {}, 375 | "outputs": [], 376 | "source": [ 377 | "# Split the data\n", 378 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=8)" 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": 14, 384 | "id": "b66e524a", 385 | "metadata": {}, 386 | "outputs": [ 387 | { 388 | "data": { 389 | "text/html": [ 390 | "
" 568 | ], 569 | "text/plain": [ 570 | " age_range gender total_item_id unique_item_id total_cat_id \\\n", 571 | "0 -1.0 0.0 2 1 1 \n", 572 | "1 2.0 -1.0 10 9 4 \n", 573 | "2 6.0 0.0 6 1 1 \n", 574 | "3 6.0 0.0 11 1 1 \n", 575 | "4 2.0 1.0 50 8 4 \n", 576 | "... ... ... ... ... ... \n", 577 | "261472 6.0 0.0 5 2 1 \n", 578 | "261473 8.0 1.0 2 1 1 \n", 579 | "261474 8.0 1.0 16 5 2 \n", 580 | "261475 -1.0 0.0 3 2 1 \n", 581 | "261476 -1.0 0.0 11 1 1 \n", 582 | "\n", 583 | " total_time_temp clicks shopping_cart purchases favourites \n", 584 | "0 1 1 0 1 0 \n", 585 | "1 1 5 0 5 0 \n", 586 | "2 1 5 0 1 0 \n", 587 | "3 1 10 0 1 0 \n", 588 | "4 5 47 0 1 2 \n", 589 | "... ... ... ... ... ... \n", 590 | "261472 2 4 0 1 0 \n", 591 | "261473 1 1 0 1 0 \n", 592 | "261474 3 12 0 4 0 \n", 593 | "261475 1 2 0 1 0 \n", 594 | "261476 2 10 0 1 0 \n", 595 | "\n", 596 | "[261477 rows x 10 columns]" 597 | ] 598 | }, 599 | "execution_count": 14, 600 | "metadata": {}, 601 | "output_type": "execute_result" 602 | } 603 | ], 604 | "source": [ 605 | "# Load the final test data\n", 606 | "test_data= pd.read_csv(r'test_data.csv')\n", 607 | "test_data\n" 608 | ] 609 | }, 610 | { 611 | "cell_type": "code", 612 | "execution_count": 20, 613 | "id": "bede42d0", 614 | "metadata": {}, 615 | "outputs": [ 616 | { 617 | "name": "stdout", 618 | "output_type": "stream", 619 | "text": [ 620 | "(52173,)\n", 621 | "[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]\n", 622 | "[[0.92242829 0.07757171]\n", 623 | " [0.95384761 0.04615239]\n", 624 | " [0.93995785 0.06004215]\n", 625 | " ...\n", 626 | " [0.94603563 0.05396437]\n", 627 | " [0.86838486 0.13161514]\n", 628 | " [0.95512153 0.04487847]]\n" 629 | ] 630 | }, 631 | { 632 | "data": { 633 | "text/plain": [ 634 | "0.9391831023709581" 635 | ] 636 | }, 637 | "execution_count": 20, 638 | "metadata": {}, 639 | "output_type": "execute_result" 640 | } 641 | ], 642 | "source": [ 643 | "# Logistic regression\n", 644 | "Logit = LogisticRegression(solver='liblinear')\n", 645 | "Logit.fit(X_train, 
y_train)\n", 646 | "Predict = Logit.predict(X_test)\n", 647 | "Predict_proba = Logit.predict_proba(X_test)\n", 648 | "print(Predict.shape)\n", 649 | "print(Predict[0:20])\n", 650 | "print(Predict_proba[:])\n", 651 | "print(\"Accuracy on training set: {:.3f}\".format(Logit.score(X_train, y_train)))\n", 652 | "print(\"Accuracy on test set: {:.3f}\".format(Logit.score(X_test, y_test)))\n", 653 | "Score = accuracy_score(y_test, Predict)\n", 654 | "Score" 655 | ] 656 | }, 657 | { 658 | "cell_type": "code", 659 | "execution_count": 21, 660 | "id": "65787397", 661 | "metadata": {}, 662 | "outputs": [], 663 | "source": [ 664 | "# Final logistic regression predictions\n", 665 | "Logit_Ans_Predict_proba = Logit.predict_proba(test_data)\n", 666 | "df_test['prob']=Logit_Ans_Predict_proba[:,1]\n", 667 | "# Save the final answer\n", 668 | "df_test.to_csv(\"Logit_Ans.csv\",index=False)" 669 | ] 670 | }, 671 | { 672 | "cell_type": "code", 673 | "execution_count": 22, 674 | "id": "a37fd1e5", 675 | "metadata": {}, 676 | "outputs": [ 677 | { 678 | "name": "stdout", 679 | "output_type": "stream", 680 | "text": [ 681 | "[[0.89765569 0.10234431]\n", 682 | " [0.9609094 0.0390906 ]\n", 683 | " [0.93901148 0.06098852]\n", 684 | " ...\n", 685 | " [0.92812445 0.07187555]\n", 686 | " [0.89765569 0.10234431]\n", 687 | " [0.9609094 0.0390906 ]]\n", 688 | "Accuracy on training set: 0.939\n", 689 | "Accuracy on test set: 0.939\n" 690 | ] 691 | } 692 | ], 693 | "source": [ 694 | "# Decision tree\n", 695 | "from sklearn.tree import DecisionTreeClassifier\n", 696 | "tree = DecisionTreeClassifier(max_depth=4,random_state=0) \n", 697 | "tree.fit(X_train, y_train)\n", 698 | "Predict_proba = tree.predict_proba(X_test)\n", 699 | "print(Predict_proba[:])\n", 700 | "print(\"Accuracy on training set: {:.3f}\".format(tree.score(X_train, y_train)))\n", 701 | "print(\"Accuracy on test set: {:.3f}\".format(tree.score(X_test, y_test)))" 702 | ] 703 | }, 704 | { 705 | "cell_type": "code", 706 | "execution_count": 23, 707 | "id": "5ed0c662", 708 | "metadata": {}, 709 | 
"outputs": [], 710 | "source": [ 711 | "# Final decision tree predictions\n", 712 | "Tree_Ans_Predict_proba = tree.predict_proba(test_data)\n", 713 | "df_test['prob']=Tree_Ans_Predict_proba[:,1]\n", 714 | "# Save the final answer\n", 715 | "df_test.to_csv(\"Tree_Ans.csv\",index=False)" 716 | ] 717 | }, 718 | { 719 | "cell_type": "code", 720 | "execution_count": 28, 721 | "id": "9c002987", 722 | "metadata": {}, 723 | "outputs": [ 724 | { 725 | "name": "stdout", 726 | "output_type": "stream", 727 | "text": [ 728 | "[[0.90345203 0.09654797]\n", 729 | " [0.96242055 0.03757945]\n", 730 | " [0.92398178 0.07601822]\n", 731 | " ...\n", 732 | " [0.91943483 0.08056517]\n", 733 | " [0.86844252 0.13155748]\n", 734 | " [0.9607207 0.0392793 ]]\n", 735 | "Accuracy on training set: 0.939\n", 736 | "Accuracy on test set: 0.939\n" 737 | ] 738 | } 739 | ], 740 | "source": [ 741 | "# Random forest\n", 742 | "from sklearn.ensemble import RandomForestClassifier\n", 743 | "rfc = RandomForestClassifier(n_estimators=100,random_state=90,max_depth=8)\n", 744 | "rfc = rfc.fit(X_train, y_train)\n", 745 | "Predict_proba = rfc.predict_proba(X_test)\n", 746 | "print(Predict_proba[:])\n", 747 | "print(\"Accuracy on training set: {:.3f}\".format(rfc.score(X_train, y_train))) \n", 748 | "print(\"Accuracy on test set: {:.3f}\".format(rfc.score(X_test, y_test)))" 749 | ] 750 | }, 751 | { 752 | "cell_type": "code", 753 | "execution_count": 29, 754 | "id": "55703385", 755 | "metadata": {}, 756 | "outputs": [], 757 | "source": [ 758 | "# Final random forest predictions\n", 759 | "RFC_Ans_Predict_proba = rfc.predict_proba(test_data)\n", 760 | "df_test['prob']=RFC_Ans_Predict_proba[:,1]\n", 761 | "# Save the final answer\n", 762 | "df_test.to_csv(\"RFC_Ans.csv\",index=False)" 763 | ] 764 | }, 765 | { 766 | "cell_type": "code", 767 | "execution_count": 27, 768 | "id": "54978d26", 769 | "metadata": {}, 770 | "outputs": [ 771 | { 772 | "name": "stdout", 773 | "output_type": "stream", 774 | "text": [ 775 | "进度: 0\n", 776 | "进度: 10\n", 777 | "进度: 20\n", 778 | "进度: 30\n", 779 | "进度: 40\n", 780 | 
"进度: 50\n", 781 | "进度: 60\n", 782 | "进度: 70\n", 783 | "进度: 80\n", 784 | "进度: 90\n", 785 | "进度: 100\n", 786 | "进度: 110\n", 787 | "进度: 120\n", 788 | "进度: 130\n", 789 | "进度: 140\n", 790 | "进度: 150\n", 791 | "进度: 160\n", 792 | "进度: 170\n", 793 | "进度: 180\n", 794 | "进度: 190\n", 795 | "最大得分:0.9394897744043854 子树数量为:101\n" 796 | ] 797 | }, 798 | { 799 | "data": { 800 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAD2CAYAAADMHBAjAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAcVUlEQVR4nO3df5BV9X3/8efLVQyCwiIbMo3ijxlo00SxdL8ojdLNN5JgRki+TCtJ/cLU6jh+S5p2nFjMxHQbwaQ4X3Y64zSOfL8k5etoJtSkalErhEEg0yXJQhA12h8z1ThOmbvIIlkwBvX9/eOcZdnLuXvP7t2798J5PWaYPffc9zn3c89e7ms/58fnKCIwMzMrd1ajG2BmZs3JAWFmZpkcEGZmlskBYWZmmRwQZmaW6exGN2Ckpk+fHpdeemmjm2FmdlrZs2fPwYhoG8kyp11AXHrppfT09DS6GWZmpxVJr410Ge9iMjOzTA4IMzPL5IAwM7NMdQ0ISdMkLZQ0vZ6vY2ZmYy9XQEjaIKlb0j0Vnr9M0lOSdklal85rBTYD84Dtktoq1J0t6ReSnkv/XTFG783MzGpQ9SwmSUuBloiYL+nbkmZFxL+Xla0FVkfEbknfk9QBBHBnOq8VmAvcmlF3BPhuRKwau7dlZma1ytOD6AA2pdNbgGszamYDe9PpEjAlInakQbCApBfRnVUHXAPcKOknaU/llNCSdLukHkk9vb29Od+amZnVIs91EJOAN9LpQyQ9gXKPAZ2SdgOLgK8ASBKwDOgDjleo+whwfUT8l6T/B3wGePLklUfEemA9QHt7u8cnt9NLBOzZAz/8IRw7Nvr1tLTANdfAJz4BEyaMXfvMKsgTEP3AxHR6Mhm9johYI+la4C5gY0T0p/MDWClpNbAkq07S/oh4J11VDzCrtrdk1gTeeQeeew6eeAKefBLeSP/Gkka/zoF7t5x/PtxwA3z2s/CZz8DUqbW21ixTnl1MexjcrTQHeLVC3T5gJtAFIGmVpBXpc1OBw1l1wMOS5khqAT4HPJ+z7WbN5fBhePRRWLYM2tpg0SLYuBHmzUt+HjwI778/+n/HjsE//VOy/h074Oabk9e5/np44AH4xS8avQXsDKNqd5STdAGwC9gG3AB8HvjDiLinrO7rwH9ExMPp41aSYxfnAi8CKyMiMuo+BjwKCHgyIr46XHva29vDQ21Y03jttaSX8MQTsHMnvPsuzJgBixcnf+F/8pMwcWL19YzU++/Dj388+NqvvJLMv+qq5HU/+9lkupYei51RJO2JiPYRLZPnlqPpl/1CYGdEHBhl+8aEA8IaKgJ+9rPBL+bn0w7vb/1W8qX8uc8lPYazxvka1H/7t8E2/cu/JO2cOROWLEnatWCBj1sUXN0CopmclgFx7Bhs3QovvAB//Mdw0UXj34b//E946CF4++3xf+0zxdGjsGULvP568pf57/3e4F/rs2c3unWDSiXYvDkJi61bk9/5lCnw6U/Dhz7U6NZZLT75yST0R8EB0Uyy/pMCTJ4Mq1fDF78IZ4/DYLq//jV0dcG998Lx48nr2+i0tMDHP54Ewo03wgc/2OgWVTfwx8kTTyThdvRoo1tktfiLv4DOzlEt6oBotKxu/
sUXD/6VefHFyS/4n/852T/80EPJ7oh62bUL7rgDfv5zWLoU/vZvkzaYWeGMJiBOu/tBNJXhDhT+1V9lHyh8+mn4/vfhz/88Oaf9jjvgG98Y21MVDx6Ev/xL+M534JJLkjNfbrxx7NZvZoXgHsRIvf12csHTE08kX7ylUrKr6Pd/PwmEJUuSL+VqjhxJQuSBB5JTFbu64AtfqO2sk/ffh7//e7jrrmT9X/4yfO1rcN55o1+nmZ0RRtOD8HDfeZVK8Ad/ANOnJyGwaRN0dMAjj0BvbxIaf/Zn+cIB4IILkl0+P/1pcrbJzTfDwoXJbqrRePHFJKRuvRV++7dh3z745jcdDmY2ag6IvLZtS3YN3XRTcgyhtxe+9z34oz+qbffQ3LnQ3Q1/93dJWFxxBfz1X8OvfpVv+aNH4e674Xd+B15+Gb797eQiqo9+dPRtMjPDAZFfqZT8XLcuOV3w3HPHbt0tLfCnfwr/+q9JL+XrX0+CYuvW4ZfbvDkJgrVrYcWK5BjILbeM/zn4ZnZG8jdJXgPHGuo57s2HPpTsstq6NTkW8alPJT2UA2XXJr7+enJW0uLFyWmrO3fChg3J7i8zszHigMirVEoOJo/HX+fXXw/79ye7mr7/ffjN30x2Qb3zTtKD+chHkt1cf/M3sHcvXHdd/dtkZoXjgMirVBrfC6M+8IHkgpgXXkiulfjiF5OA+vKXk4PjP/85rFrl4RPMrG4cEHn19iZf0ONt9uzkCthHH4Xf/V34wQ+S02svvXT822JmheIL5fIqleDqqxvz2lJyjcQXvtCY1zezQnIPIq/x3sVkZtZgDog83n4bfvlLB4SZFYoDIo/e3uSnA8LMCqSuASFpmqSFkk7vE/QHLpJzQJhZgeQKCEkbJHVLuqfC85dJekrSLknr0nmtwGZgHrBdUltW3UnrmCHpZzW+n/oY6EE04iwmM7MGqRoQkpYCLRExH7hc0qyMsrXA6oi4DrhIUgdwJXBnRNwHPAvMrVA34H8Ddbh57xhwD8LMCihPD6ID2JRObwGuzaiZDexNp0vAlIjYERG7JS0g6UV0Z9UBSPrvwFEg837Xkm6X1COpp3fgr/nx5IAwswLKExCTgDfS6UPAjIyax4BOSYuBRcA2AEkClgF9wPGsOkkTgK8Bd1dqQESsj4j2iGhva8RunlIJJk6ESZPG/7XNzBokT0D0M7jrZ3LWMhGxBngGuA3YGBH96fyIiJXAfmBJhbq7gW9FxOEa30v9DFwDUcvNfMzMTjN5AmIPg7uV5gCvVqjbB8wEugAkrZK0In1uKnA4qw64Hlgp6TngKkn/N2fbx48vkjOzAsoTEI8DyyV1ATcBL0lak1F3F9AVEcfSx+vT5XYCLSTHL06pi4gFEdERER3Avoi4bdTvpl4aNQ6TmVkDVR2LKSKOpGcbLQTuj4gDwPMZdZ1lj/vSZYatK3uuo2qLG6FUSm7gY2ZWILkG60u/7DdVLTwTRXgXk5kVkofaqOaXv0xu1OOAMLOCcUBU42sgzKygHBDVOCDMrKAcENV4HCYzKygHRDXuQZhZQTkgqhkICPcgzKxgHBDVlEowZQqce26jW2JmNq4cENX4GggzKygHRDUOCDMrKAdENaWSjz+YWSE5IKrp7XUPwswKyQExnPffd0CYWWE5IIZz6FASEg4IMysgB8RwfJGcmRWYA2I4DggzKzAHxHB8FbWZFVhdA0LSNEkLJU0fi7pxNzBQn3sQZlZAuQJC0gZJ3ZLuqfD8ZZKekrRL0rp0XiuwGZgHbJfUlrduLN7YmCiVQIILL2x0S8zMxl3VW45KWgq0RMR8Sd+WNCsi/r2sbC2wOiJ2S/peeg/rAO5M57UCc4Fbc9Y9O1ZvsCalEkyfDi0tjW6Jmdm4y9OD6GDwftRbgGszamYDe9PpEjAlInakX/oLSHoH3SOoG0LS7ZJ6JPX0Duz2GQ8eZsPMCixPQEwC3kinDwEzMmoeAzolLQYWAdsAJAlYBvQBx0dQN0REr
I+I9ohobxvPA8YOCDMrsDwB0Q9MTKcnZy0TEWuAZ4DbgI0R0Z/Oj4hYCewHluStq+0tjSGPw2RmBZYnIPYwuFtpDvBqhbp9wEygC0DSKkkr0uemAodHWNd4HmbDzAosT0A8DiyX1AXcBLwkaU1G3V1AV0QcSx+vT5fbCbSQHL8YSV1j/frX0NfngDCzwqp6FlNEHEnPNloI3B8RB4DnM+o6yx73pcuMqq7hDh5MfjogzKygqgYEnPgS31S18EziYTbMrOA81EYlDggzKzgHRCUeh8nMCs4BUYl7EGZWcA6ISnp74ZxzYMqURrfEzKwhHBCVDFxFLTW6JWZmDeGAqMTDbJhZwTkgKnFAmFnBOSAqcUCYWcE5ICrxQH1mVnAOiCxHj8KxY+5BmFmhOSCy+F7UZmYOiEy+SM7MzAGRyQFhZuaAyOSAMDNzQGTyQH1mZg6ITL29MGkSnHdeo1tiZtYwdQ0ISdMkLZQ0vZ6vM+Z8kZyZWb6AkLRBUrekeyo8f5mkpyTtkrQundcKbAbmAdsltVWomyLpGUlbJP2jpAlj9N5GzwFhZlY9ICQtBVoiYj5wuaRZGWVrgdURcR1wUXoP6yuBOyPiPuBZYG6FupuBroj4FHAAWFTrm6qZA8LMLFcPooPB+1FvAa7NqJkN7E2nS8CUiNgREbslLSDpRXRXqPtWRGxN57Wl84eQdLukHkk9vQMXsdWTA8LMLFdATALeSKcPATMyah4DOiUtJukBbAOQJGAZ0Accr1SX1s4HWiNid/nKI2J9RLRHRHtbvc8sivA4TGZm5AuIfmBiOj05a5mIWAM8A9wGbIyI/nR+RMRKYD+wpFKdpGnAA8Cf1PZ2xsDhw/Duu+5BmFnh5QmIPQzuVpoDvFqhbh8wE+gCkLRK0or0uanA4Qp1E4B/AL4SEa+NoO314XGYzMyAfAHxOLBcUhdwE/CSpDUZdXeRHGw+lj5eny63E2ghOX6RVXcryQHsr0p6TtKy0b2VMeKrqM3MADi7WkFEHEnPNloI3B8RB4DnM+o6yx73pctUq3sQeHBEra4nB4SZGZAjIODEl/2mqoVnAgeEmRngoTZONRAQ00+vi7/NzMaaA6JcqQStrXDOOY1uiZlZQzkgyvX2eveSmRkOiFP5KmozM8ABcSoHhJkZ4IA4lQPCzAxwQAz17rvw5pseh8nMDAfEUG++mQzW5x6EmZkDYgiPw2RmdoID4mS+itrM7AQHxMkcEGZmJzggTuaAMDM7wQFxslIJWlqSoTbMzArOAXGyUikZpO8sbxYzM38TnswXyZmZneCAOJkH6jMzO6GuASFpmqSFkk6Pmyu4B2FmdkKugJC0QVK3pHsqPH+ZpKck7ZK0Lp3XCmwG5gHbJbVl1aW1MyTtGoP3UxsHhJnZCVUDQtJSoCUi5gOXS5qVUbYWWB0R1wEXpfewvhK4MyLuA54F5mbVpUGyEZg0Fm9o1H71KzhyxOMwmZml8vQgOhi8H/UW4NqMmtnA3nS6BEyJiB0RsVvSApJeRHdWHfAesAw4UqkBkm6X1COpp3dgOIyx5mE2zMyGyBMQk4A30ulDwIyMmseATkmLgUXANgBJIvny7wOOZ9VFxJGIeGu4BkTE+ohoj4j2tnr9he+L5MzMhsgTEP3AxHR6ctYyEbEGeAa4DdgYEf3p/IiIlcB+YEmluqbgHoSZ2RB5AmIPg7uV5gCvVqjbB8wEugAkrZK0In1uKnA4q65puAdhZjZEnoB4HFguqQu4CXhJ0pqMuruArog4lj5eny63E2ghOX6RVdccHBBmZkOcXa0gIo6kZyUtBO6PiAPA8xl1nWWP+9Jlhq07aX5HrhbXS6kE554Lkyc3tBlmZs2iakDAiS/7TVULT2cD10BIjW6JmVlT8FAbA3yRnJnZEA6IAR6HycxsCAfEAPcgzMyGcEAARDggzMzKOCAA+vuTsZgcEGZmJzggYPAaCA/UZ2Z2ggMCfJGcmVkGBwQ4IMzMMjggwAP1mZllc
ECAj0GYmWVwQEASEBdcAB/4QKNbYmbWNBwQkASEew9mZkM4IMAXyZmZZXBAgAPCzCyDAwI8UJ+ZWYa6BoSkaZIWSppez9epyfvvOyDMzDLkCghJGyR1S7qnwvOXSXpK0i5J69J5rcBmYB6wXVJbVl2e9ddVXx+8954DwsysTNWAkLQUaImI+cDlkmZllK0FVkfEdcBF6S1KrwTujIj7gGeBuVl1OddfP74GwswsU54eRAeDtxvdAlybUTMb2JtOl4ApEbEjInZLWkDSi+jOqsu5/vrxMBtmZpnyBMQk4I10+hAwI6PmMaBT0mJgEbANQJKAZUAfcLxCXdX1S7pdUo+knt6BYTHGigPCzCxTnoDoByam05OzlomINcAzwG3AxojoT+dHRKwE9gNLKtTlWf/6iGiPiPa2sd4V5HGYzMwy5QmIPQzu9pkDvFqhbh8wE+gCkLRK0or0uanA4ay6Eay/PkolkODCC8f1Zc3Mmt3ZOWoeB3ZJ+g3gBuDzktZERPkZR3cBXRFxLH28Htgk6TbgRZLjC1l15eu/ZrRvZlRKpSQczs6zKczMiqPqt2JEHEnPSloI3B8RB4DnM+o6yx73pctUqytf/1v5mz8GPA6TmVmmXH82p1/2m6oWjlK91z8sD7NhZpbJQ204IMzMMjkgPMyGmVmmYgfE8eNw6JADwswsQ7ED4uDB5KcDwszsFMUOCI/DZGZWkQMC3IMwM8vggAAHhJlZBgcEOCDMzDIUOyB6e5MhNqZObXRLzMyaTrEDYuAiOanRLTEzazoOCJ/BZGaWyQHh4w9mZpkcEA4IM7NMDggHhJlZpuIGxLFjcPSoA8LMrILiBoTvRW1mNqy6BoSkaZIWSppez9cZFY/DZGY2rFwBIWmDpG5J5fehHnj+MklPSdolaV06rxXYDMwDtktqk9Qq6WlJPZIeqrTsuPBV1GZmw6oaEJKWAi0RMR+4XNKsjLK1wOqIuA64KL3H9JXAnRFxH/AsMBdYDjwSEe3A+ZLaKyxbfw4IM7Nh5elBdDB4v+gtwLUZNbOBvel0CZgSETsiYrekBSS9iG7gTeBjkqYCFwOvZy1bvnJJt6e9jp7egWMHtXJAmJkNK09ATALeSKcPATMyah4DOiUtBhYB2wAkCVgG9AHHgR8BlwBfAl5O15e57MkiYn1EtEdEe9tYHTPo7YXzzoNJk8ZmfWZmZ5g8AdEPTEynJ2ctExFrgGeA24CNEdGfzo+IWAnsB5YAncAdEXEv8ApwS6Vl687XQJiZDStPQOxhcLfSHODVCnX7gJlAF4CkVZJWpM9NBQ4DrcAVklqAq4HIWnZceBwmM7Nh5QmIx4HlkrqAm4CXJK3JqLsL6IqIY+nj9elyO4EWkuMX30znvwVMA75bYdn6cw/CzGxYZ1criIgj6ZlFC4H7I+IA8HxGXWfZ4750mZP9BPhotWXHRakEV1017i9rZna6qBoQcOLLflPVwtNFhHsQZmZVFHOojbfeguPHHRBmZsMoZkB4HCYzs6qKGRC+SM7MrKpiB4RPczUzq6jYAeEehJlZRcUOCPcgzMwqKm5ATJ0KEyY0uiVmZk2rmAHR2+vdS2ZmVRQzIHyRnJlZVcUNCB9/MDMbVnEDwj0IM7NhFS8g3nsPDh50QJiZVVG8gHjzzWSwPgeEmdmwihcQHofJzCyX4gWEr6I2M8uluAHhs5jMzIZV14CQNE3SQknT6/k6I+IehJlZLrkCQtIGSd2S7qnw/GWSnpK0S9K6dF4rsBmYB2yX1CapVdLTknokPTRQVz6vrkolOOssmDat7i9lZnY6qxoQkpYCLRExH7hc0qyMsrXA6oi4DrgovYf1lcCdEXEf8CwwF1gOPBIR7cD5ktorzKufUgmmT4eWlrq+jJnZ6S5PD6KDwftRbwGuzaiZDexNp0vAlIjYERG7JS0g6UV0A28CH5M0FbgYeL3CvCEk3Z72MHp6B85CGi1fJGdmlkuegJgEvJFOHwJmZNQ8BnRKWgwsA
rYBSBKwDOgDjgM/Ai4BvgS8nK4va94QEbE+Itojor2t1oPLHqjPzCyXPAHRD0xMpydnLRMRa4BngNuAjRHRn86PiFgJ7AeWAJ3AHRFxL/AKcEuFefXjcZjMzHLJExB7GNytNAd4tULdPmAm0AUgaZWkFelzU4HDQCtwhaQW4GogKsyrH+9iMjPLJU9APA4sl9QF3AS8JGlNRt1dQFdEHEsfr0+X2wm0kBy/+GY6/y1gGvDdCvPq45134K23HBBmZjmcXa0gIo6kZyUtBO6PiAPA8xl1nWWP+9JlTvYT4KM55tWHh9kwM8utakDAiS/7TVULm50vkjMzy61YQ224B2FmlluxAsLjMJmZ5VbMgHAPwsysquIFxIQJcMEFjW6JmVnTK15AfPCDIDW6JWZmTa+YAWFmZlUVKyA8DpOZWW7FCgiPw2RmlltxAiLCu5jMzEagOAFx9Ci8/bYDwswsp+IEhK+BMDMbEQeEmZllckCYmVmm4gTEhRfC0qXw4Q83uiVmZqeFXMN9nxE+/vHkn5mZ5VKcHoSZmY1IXQNC0jRJCyVNr+frmJnZ2MsVEJI2SOqWdE+F5y+T9JSkXZLWpfNagc3APGC7pDZJrZKeltQj6aG07n9Jei79t29gvpmZNVbVgJC0FGiJiPnA5ZJmZZStBVZHxHXARek9rK8E7oyI+4BngbnAcuCRiGgHzpfUHhEPRkRHRHQAu4D/Mwbvy8zMapSnB9HB4P2otwDXZtTMBvam0yVgSkTsiIjdkhaQ9CK6gTeBj0maClwMvD6wAkkfBmZERE/5yiXdnvY6enoHbhtqZmZ1lScgJgFvpNOHgBkZNY8BnZIWA4uAbQCSBCwD+oDjwI+AS4AvAS+n6xuwEngwqwERsT4i2iOivc2D7ZmZjYs8AdEPTEynJ2ctExFrgGeA24CNEdGfzo+IWAnsB5YAncAdEXEv8ApwC4Cks4BPAM/V8mbMzGzs5AmIPQzuVpoDvFqhbh8wE+gCkLRK0or0uanAYaAVuEJSC3A1EOnz1wE/jojAzMyagqp9J0u6gOTg8TbgBuDzwB9GxD1ldV8H/iMiHk4ft5IcuzgXeJFkF9J/A75DspupG/gfEdEv6RtAT0T8oGqDpV7gtZG8ydR04OAolhsvbl9t3L7auH21OR3aNykiRrSPvmpAwIkv+4XAzog4MLr2NZaknvTsqabk9tXG7auN21ebM7V9uYbaiIg+Bs9kMjOzAvBQG2ZmlqlIAbG+0Q2owu2rjdtXG7evNmdk+3IdgzAzs+IpUg/CzMxGwAFhZmaZChEQ1UajHW+Spkh6RtIWSf8oaYKkX5w0qu0VDW7f2eXtaaZtmDEC8IZm2X6SZkjaddLjU7ZbI7flye2r8Dk85XffwPZltqWJtt8pI1E3cvtV+H3W9Pk74wMi52i04+1moCsiPgUcAO4Gvjswqm1EvNDY5nHlye0BZtFE2zBjBOCHaILtl14vtJFk/LLMz14jP4/l7ePUz+Eiyn7347ktM9p3SluaaftVGIm6YduPU3+fn6fGz98ZHxDkG412XEXEtyJia/qwDXgXuFHST9J0b/StYK85uT3A9TTZNoTBEYCBdppj+71HMjjlkfRxB6dut6x542VI+zI+hyXKfvfjvC3Lt19WWzpoku03QENHom7Y9sv4ff5Pavz8FSEg8oxG2xCS5pOMT7UVuD4i5gHnAJ9paMPgp2XtuYHm3IYDIwCXt7ch2y8ijkTEWyfNyvrsNezzmNE+YPBzGBG7aeC2zGhfVluabvsxdCTqhn8WT/peeZ0aP39FCIiqo9E2gqRpwAPAnwD7I+K/0qd6SHbpNFJ5e6bTZNtQQ0cAbrbtNyDrs9dUn8eyzyE017bMakuzbb/ykagbuv3Kfp81f/4a/h99HOQdjXbcSJoA/APwlYh4DXhY0hwlo9x+Dni+ke3LaM9KmmwbMnQE4GbbfgOyPntN83nM+BxCc23LrLY0zfZLlY9E3bDtl/H7rPnz1+h93
ePhcWCXpN8g2VVyTWObA8CtJLdg/aqkrwLbgYcBAU9GxA8b2TjgXuDRgfbQnNvw08DOdHpIe5tg+w14nFO3W2TMa5Tyz+GDNNe2PKUtSkeXbpLtB0M/h9DY7Vf++/wOsLyWz18hrqTWGTAabaN5G45O1nbztqyNt19+tX7+ChEQZmY2ckU4BmFmZqPggDAzs0wOCDMzy+SAMDOzTA4IMzPL9P8BuldbC1NjeU0AAAAASUVORK5CYII=\n", 801 | "text/plain": [ 802 | "
" 803 | ] 804 | }, 805 | "metadata": { 806 | "needs_background": "light" 807 | }, 808 | "output_type": "display_data" 809 | } 810 | ], 811 | "source": [ 812 | "# Tuning: plot a learning curve to tune n_estimators (the parameter with the largest effect on the random forest)\n", 813 | "score_lt = []\n", 814 | "\n", 815 | "# Build a random forest every 10 steps to get the score for each n_estimators value\n", 816 | "for i in range(0,200,10):\n", 817 | " print(\"进度:\",i)\n", 818 | " rfc = RandomForestClassifier(n_estimators=i+1,random_state=90,max_depth=8)\n", 819 | " rfc = rfc.fit(X_train, y_train)\n", 820 | " score = rfc.score(X_test, y_test)\n", 821 | " score_lt.append(score)\n", 822 | "score_max = max(score_lt)\n", 823 | "print('最大得分:{}'.format(score_max),'子树数量为:{}'.format(score_lt.index(score_max)*10+1))\n", 824 | "\n", 825 | "# Plot the learning curve\n", 826 | "x = np.arange(1,201,10)\n", 827 | "plt.subplot(111)\n", 828 | "plt.plot(x, score_lt, 'r-')\n", 829 | "plt.show()" 830 | ] 831 | }, 832 | { 833 | "cell_type": "code", 834 | "execution_count": 15, 835 | "id": "b124dbb8", 836 | "metadata": {}, 837 | "outputs": [ 838 | { 839 | "name": "stderr", 840 | "output_type": "stream", 841 | "text": [ 842 | "D:\\anaconda3\\lib\\site-packages\\xgboost\\sklearn.py:1224: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].\n", 843 | " warnings.warn(label_encoder_deprecation_msg, UserWarning)\n" 844 | ] 845 | }, 846 | { 847 | "name": "stdout", 848 | "output_type": "stream", 849 | "text": [ 850 | "[15:05:43] WARNING: C:/Users/Administrator/workspace/xgboost-win64_release_1.5.0/src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. 
Explicitly set eval_metric if you'd like to restore the old behavior.\n", 851 | "[[0.87493783 0.12506217]\n", 852 | " [0.9712213 0.02877864]\n", 853 | " [0.8449106 0.1550894 ]\n", 854 | " ...\n", 855 | " [0.87651736 0.12348264]\n", 856 | " [0.916159 0.08384103]\n", 857 | " [0.9761114 0.02388861]]\n", 858 | "Accuracy on training set: 0.939\n", 859 | "Accuracy on test set: 0.939\n" 860 | ] 861 | } 862 | ], 863 | "source": [ 864 | "# Use XGBoost\n", 865 | "from sklearn.model_selection import train_test_split\n", 866 | "from sklearn.ensemble import RandomForestClassifier\n", 867 | "from sklearn.linear_model import LinearRegression\n", 868 | "from sklearn.metrics import classification_report\n", 869 | "import xgboost as xgb\n", 870 | "\n", 871 | "model = xgb.XGBClassifier(\n", 872 | " max_depth=8,\n", 873 | " n_estimators=2000,\n", 874 | " min_child_weight=300, \n", 875 | " colsample_bytree=0.8, \n", 876 | " subsample=0.8, \n", 877 | " eta=0.3, \n", 878 | " seed=42 \n", 879 | ")\n", 880 | "# model.fit(\n", 881 | "# X_train, y_train,\n", 882 | "# eval_metric='auc', eval_set=[(X_train, y_train), (X_test, y_test)],\n", 883 | "# verbose=True,\n", 884 | "# # Early stopping: stop if AUC has not improved for 30 rounds\n", 885 | "# early_stopping_rounds=30 \n", 886 | "# )\n", 887 | "model.fit(X_train, y_train)\n", 888 | "\n", 889 | "\n", 890 | "Predict_proba = model.predict_proba(X_test)\n", 891 | "print(Predict_proba[:])\n", 892 | "print(\"Accuracy on training set: {:.3f}\".format(model.score(X_train, y_train))) \n", 893 | "print(\"Accuracy on test set: {:.3f}\".format(model.score(X_test, y_test)))\n" 894 | ] 895 | }, 896 | { 897 | "cell_type": "code", 898 | "execution_count": 16, 899 | "id": "b942defe", 900 | "metadata": {}, 901 | "outputs": [], 902 | "source": [ 903 | "# Get the final XGBoost predictions\n", 904 | "xgboost_Ans_Predict_proba = model.predict_proba(test_data)\n", 905 | "df_test['prob']=xgboost_Ans_Predict_proba[:,1]\n", 906 | "# Save the final answers\n", 907 | "df_test.to_csv(\"xgboost_Ans.csv\",index=None)" 908 | ] 909 | }, 910 | { 
911 | "cell_type": "code", 912 | "execution_count": null, 913 | "id": "33f75369", 914 | "metadata": {}, 915 | "outputs": [], 916 | "source": [] 917 | } 918 | ], 919 | "metadata": { 920 | "kernelspec": { 921 | "display_name": "Python 3", 922 | "language": "python", 923 | "name": "python3" 924 | }, 925 | "language_info": { 926 | "codemirror_mode": { 927 | "name": "ipython", 928 | "version": 3 929 | }, 930 | "file_extension": ".py", 931 | "mimetype": "text/x-python", 932 | "name": "python", 933 | "nbconvert_exporter": "python", 934 | "pygments_lexer": "ipython3", 935 | "version": "3.8.8" 936 | } 937 | }, 938 | "nbformat": 4, 939 | "nbformat_minor": 5 940 | } 941 | -------------------------------------------------------------------------------- /预测建模.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "id": "8035b6a2", 7 | "metadata": {}, 8 | "outputs": [], 9 | "source": [ 10 | "# Imports\n", 11 | "import numpy as np\n", 12 | "import pandas as pd\n", 13 | "import matplotlib.pyplot as plt\n", 14 | "plt.rcParams[\"font.sans-serif\"] = \"SimHei\" # Fix garbled Chinese characters in plots\n", 15 | "import seaborn as sns\n", 16 | "import random\n", 17 | "from sklearn.model_selection import train_test_split\n", 18 | "from sklearn.linear_model import LogisticRegression\n", 19 | "from sklearn.preprocessing import LabelEncoder\n", 20 | "from sklearn.metrics import accuracy_score\n", 21 | "from sklearn import model_selection\n", 22 | "from sklearn.neighbors import KNeighborsRegressor" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 2, 28 | "id": "4e929396", 29 | "metadata": {}, 30 | "outputs": [ 31 | { 32 | "data": { 33 | "text/html": [ 34 | "
\n", 35 | "\n", 48 | "\n", 49 | " \n", 50 | " \n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | "
        user_id  merchant_id  prob
0        163968         4605   NaN
1        360576         1581   NaN
2         98688         1964   NaN
3         98688         3645   NaN
4        295296         3361   NaN
...         ...          ...   ...
261472   228479         3111   NaN
261473    97919         2341   NaN
261474    97919         3971   NaN
261475    32639         3536   NaN
261476    32639         3319   NaN
\n", 126 | "

261477 rows × 3 columns

\n", 127 | "
" 128 | ], 129 | "text/plain": [ 130 | " user_id merchant_id prob\n", 131 | "0 163968 4605 NaN\n", 132 | "1 360576 1581 NaN\n", 133 | "2 98688 1964 NaN\n", 134 | "3 98688 3645 NaN\n", 135 | "4 295296 3361 NaN\n", 136 | "... ... ... ...\n", 137 | "261472 228479 3111 NaN\n", 138 | "261473 97919 2341 NaN\n", 139 | "261474 97919 3971 NaN\n", 140 | "261475 32639 3536 NaN\n", 141 | "261476 32639 3319 NaN\n", 142 | "\n", 143 | "[261477 rows x 3 columns]" 144 | ] 145 | }, 146 | "execution_count": 2, 147 | "metadata": {}, 148 | "output_type": "execute_result" 149 | } 150 | ], 151 | "source": [ 152 | "# Load the data\n", 153 | "df_train = pd.read_csv(r'df_train.csv')\n", 154 | "df_test = pd.read_csv(r'../DataMining/data_format1\\test_format1.csv')\n", 155 | "df_test\n" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 3, 161 | "id": "16970677", 162 | "metadata": {}, 163 | "outputs": [ 164 | { 165 | "data": { 166 | "text/html": [ 167 | "
\n", 168 | "\n", 181 | "\n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | "
   age_range  gender  total_item_id  unique_item_id  total_cat_id  total_time_temp  clicks  shopping_cart  purchases  favourites
0        6.0     0.0             39              20             6                9      36              0          1           2
1        6.0     0.0             14               1             1                3      13              0          1           0
2        6.0     0.0             18               2             1                2      12              0          6           0
3        6.0     0.0              2               1             1                1       1              0          1           0
4       -1.0     0.0              8               1             1                3       7              0          1           0
5        4.0     1.0              1               1             1                1       0              0          1           0
6        5.0     0.0              3               2             1                1       2              0          1           0
7        5.0     0.0             83              48            15                3      78              0          5           0
8        5.0     0.0              7               4             1                1       6              0          1           0
9        4.0     1.0              4               1             1                2       2              0          1           1
\n", 330 | "
" 331 | ], 332 | "text/plain": [ 333 | " age_range gender total_item_id unique_item_id total_cat_id \\\n", 334 | "0 6.0 0.0 39 20 6 \n", 335 | "1 6.0 0.0 14 1 1 \n", 336 | "2 6.0 0.0 18 2 1 \n", 337 | "3 6.0 0.0 2 1 1 \n", 338 | "4 -1.0 0.0 8 1 1 \n", 339 | "5 4.0 1.0 1 1 1 \n", 340 | "6 5.0 0.0 3 2 1 \n", 341 | "7 5.0 0.0 83 48 15 \n", 342 | "8 5.0 0.0 7 4 1 \n", 343 | "9 4.0 1.0 4 1 1 \n", 344 | "\n", 345 | " total_time_temp clicks shopping_cart purchases favourites \n", 346 | "0 9 36 0 1 2 \n", 347 | "1 3 13 0 1 0 \n", 348 | "2 2 12 0 6 0 \n", 349 | "3 1 1 0 1 0 \n", 350 | "4 3 7 0 1 0 \n", 351 | "5 1 0 0 1 0 \n", 352 | "6 1 2 0 1 0 \n", 353 | "7 3 78 0 5 0 \n", 354 | "8 1 6 0 1 0 \n", 355 | "9 2 2 0 1 1 " 356 | ] 357 | }, 358 | "execution_count": 3, 359 | "metadata": {}, 360 | "output_type": "execute_result" 361 | } 362 | ], 363 | "source": [ 364 | "# Preprocessing before modeling\n", 365 | "y = df_train[\"label\"]\n", 366 | "X = df_train.drop([\"user_id\", \"merchant_id\", \"label\"], axis=1)\n", 367 | "X.head(10)\n" 368 | ] 369 | }, 370 | { 371 | "cell_type": "code", 372 | "execution_count": 4, 373 | "id": "889e9034", 374 | "metadata": {}, 375 | "outputs": [], 376 | "source": [ 377 | "# Split the data\n", 378 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=8)" 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": 5, 384 | "id": "b66e524a", 385 | "metadata": {}, 386 | "outputs": [ 387 | { 388 | "data": { 389 | "text/html": [ 390 | "
\n", 391 | "\n", 404 | "\n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | " \n", 456 | " \n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | " \n", 506 | " \n", 507 | " \n", 508 | " \n", 509 | " \n", 510 | " \n", 511 | " \n", 512 | " \n", 513 | " \n", 514 | " \n", 515 | " \n", 516 | " \n", 517 | " \n", 518 | " \n", 519 | " \n", 520 | " \n", 521 | " \n", 522 | " \n", 523 | " \n", 524 | " \n", 525 | " \n", 526 | " \n", 527 | " \n", 528 | " \n", 529 | " \n", 530 | " \n", 531 | " \n", 532 | " \n", 533 | " \n", 534 | " \n", 535 | " \n", 536 | " \n", 537 | " \n", 538 | " \n", 539 | " \n", 540 | " \n", 541 | " \n", 542 | " \n", 543 | " \n", 544 | " \n", 545 | " \n", 546 | " \n", 547 | " \n", 548 | " \n", 549 | " \n", 550 | " \n", 551 | " \n", 552 | " \n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " 
\n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | "
        age_range  gender  total_item_id  unique_item_id  total_cat_id  total_time_temp  clicks  shopping_cart  purchases  favourites
0            -1.0     0.0              2               1             1                1       1              0          1           0
1             2.0    -1.0             10               9             4                1       5              0          5           0
2             6.0     0.0              6               1             1                1       5              0          1           0
3             6.0     0.0             11               1             1                1      10              0          1           0
4             2.0     1.0             50               8             4                5      47              0          1           2
...           ...     ...            ...             ...           ...              ...     ...            ...        ...         ...
261472        6.0     0.0              5               2             1                2       4              0          1           0
261473        8.0     1.0              2               1             1                1       1              0          1           0
261474        8.0     1.0             16               5             2                3      12              0          4           0
261475       -1.0     0.0              3               2             1                1       2              0          1           0
261476       -1.0     0.0             11               1             1                2      10              0          1           0
\n", 566 | "

261477 rows × 10 columns

\n", 567 | "
" 568 | ], 569 | "text/plain": [ 570 | " age_range gender total_item_id unique_item_id total_cat_id \\\n", 571 | "0 -1.0 0.0 2 1 1 \n", 572 | "1 2.0 -1.0 10 9 4 \n", 573 | "2 6.0 0.0 6 1 1 \n", 574 | "3 6.0 0.0 11 1 1 \n", 575 | "4 2.0 1.0 50 8 4 \n", 576 | "... ... ... ... ... ... \n", 577 | "261472 6.0 0.0 5 2 1 \n", 578 | "261473 8.0 1.0 2 1 1 \n", 579 | "261474 8.0 1.0 16 5 2 \n", 580 | "261475 -1.0 0.0 3 2 1 \n", 581 | "261476 -1.0 0.0 11 1 1 \n", 582 | "\n", 583 | " total_time_temp clicks shopping_cart purchases favourites \n", 584 | "0 1 1 0 1 0 \n", 585 | "1 1 5 0 5 0 \n", 586 | "2 1 5 0 1 0 \n", 587 | "3 1 10 0 1 0 \n", 588 | "4 5 47 0 1 2 \n", 589 | "... ... ... ... ... ... \n", 590 | "261472 2 4 0 1 0 \n", 591 | "261473 1 1 0 1 0 \n", 592 | "261474 3 12 0 4 0 \n", 593 | "261475 1 2 0 1 0 \n", 594 | "261476 2 10 0 1 0 \n", 595 | "\n", 596 | "[261477 rows x 10 columns]" 597 | ] 598 | }, 599 | "execution_count": 5, 600 | "metadata": {}, 601 | "output_type": "execute_result" 602 | } 603 | ], 604 | "source": [ 605 | "# Load the final test data\n", 606 | "test_data= pd.read_csv(r'test_data.csv')\n", 607 | "test_data\n" 608 | ] 609 | }, 610 | { 611 | "cell_type": "code", 612 | "execution_count": 6, 613 | "id": "bede42d0", 614 | "metadata": {}, 615 | "outputs": [ 616 | { 617 | "name": "stdout", 618 | "output_type": "stream", 619 | "text": [ 620 | "(52173,)\n", 621 | "[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]\n", 622 | "[[0.92242829 0.07757171]\n", 623 | " [0.95384761 0.04615239]\n", 624 | " [0.93995785 0.06004215]\n", 625 | " ...\n", 626 | " [0.94603563 0.05396437]\n", 627 | " [0.86838486 0.13161514]\n", 628 | " [0.95512153 0.04487847]]\n", 629 | "Accuracy on training set: 0.939\n", 630 | "Accuracy on test set: 0.939\n" 631 | ] 632 | }, 633 | { 634 | "data": { 635 | "text/plain": [ 636 | "0.9391831023709581" 637 | ] 638 | }, 639 | "execution_count": 6, 640 | "metadata": {}, 641 | "output_type": "execute_result" 642 | } 643 | ], 644 | "source": [ 645 | "# Logistic regression\n", 646 | 
"Logit = LogisticRegression(solver='liblinear')\n", 647 | "Logit.fit(X_train, y_train)\n", 648 | "Predict = Logit.predict(X_test)\n", 649 | "Predict_proba = Logit.predict_proba(X_test)\n", 650 | "print(Predict.shape)\n", 651 | "print(Predict[0:20])\n", 652 | "print(Predict_proba[:])\n", 653 | "print(\"Accuracy on training set: {:.3f}\".format(Logit.score(X_train, y_train)))\n", 654 | "print(\"Accuracy on test set: {:.3f}\".format(Logit.score(X_test, y_test)))\n", 655 | "Score = accuracy_score(y_test, Predict)\n", 656 | "Score" 657 | ] 658 | }, 659 | { 660 | "cell_type": "code", 661 | "execution_count": 21, 662 | "id": "65787397", 663 | "metadata": {}, 664 | "outputs": [], 665 | "source": [ 666 | "# Get the final logistic regression predictions\n", 667 | "Logit_Ans_Predict_proba = Logit.predict_proba(test_data)\n", 668 | "df_test['prob']=Logit_Ans_Predict_proba[:,1]\n", 669 | "# Save the final answers\n", 670 | "df_test.to_csv(\"Logit_Ans.csv\",index=None)" 671 | ] 672 | }, 673 | { 674 | "cell_type": "code", 675 | "execution_count": 22, 676 | "id": "a37fd1e5", 677 | "metadata": {}, 678 | "outputs": [ 679 | { 680 | "name": "stdout", 681 | "output_type": "stream", 682 | "text": [ 683 | "[[0.89765569 0.10234431]\n", 684 | " [0.9609094 0.0390906 ]\n", 685 | " [0.93901148 0.06098852]\n", 686 | " ...\n", 687 | " [0.92812445 0.07187555]\n", 688 | " [0.89765569 0.10234431]\n", 689 | " [0.9609094 0.0390906 ]]\n", 690 | "Accuracy on training set: 0.939\n", 691 | "Accuracy on test set: 0.939\n" 692 | ] 693 | } 694 | ], 695 | "source": [ 696 | "# Decision tree\n", 697 | "from sklearn.tree import DecisionTreeClassifier\n", 698 | "tree = DecisionTreeClassifier(max_depth=4,random_state=0) \n", 699 | "tree.fit(X_train, y_train)\n", 700 | "Predict_proba = tree.predict_proba(X_test)\n", 701 | "print(Predict_proba[:])\n", 702 | "print(\"Accuracy on training set: {:.3f}\".format(tree.score(X_train, y_train)))\n", 703 | "print(\"Accuracy on test set: {:.3f}\".format(tree.score(X_test, y_test)))" 704 | ] 705 | }, 706 | { 707 | "cell_type": "code", 
708 | "execution_count": 23, 709 | "id": "5ed0c662", 710 | "metadata": {}, 711 | "outputs": [], 712 | "source": [ 713 | "# Get the final decision tree predictions\n", 714 | "Tree_Ans_Predict_proba = tree.predict_proba(test_data)\n", 715 | "df_test['prob']=Tree_Ans_Predict_proba[:,1]\n", 716 | "# Save the final answers\n", 717 | "df_test.to_csv(\"Tree_Ans.csv\",index=None)" 718 | ] 719 | }, 720 | { 721 | "cell_type": "code", 722 | "execution_count": 28, 723 | "id": "9c002987", 724 | "metadata": {}, 725 | "outputs": [ 726 | { 727 | "name": "stdout", 728 | "output_type": "stream", 729 | "text": [ 730 | "[[0.90345203 0.09654797]\n", 731 | " [0.96242055 0.03757945]\n", 732 | " [0.92398178 0.07601822]\n", 733 | " ...\n", 734 | " [0.91943483 0.08056517]\n", 735 | " [0.86844252 0.13155748]\n", 736 | " [0.9607207 0.0392793 ]]\n", 737 | "Accuracy on training set: 0.939\n", 738 | "Accuracy on test set: 0.939\n" 739 | ] 740 | } 741 | ], 742 | "source": [ 743 | "# Random forest\n", 744 | "from sklearn.ensemble import RandomForestClassifier\n", 745 | "rfc = RandomForestClassifier(n_estimators=100,random_state=90,max_depth=8)\n", 746 | "rfc = rfc.fit(X_train, y_train)\n", 747 | "Predict_proba = rfc.predict_proba(X_test)\n", 748 | "print(Predict_proba[:])\n", 749 | "print(\"Accuracy on training set: {:.3f}\".format(rfc.score(X_train, y_train))) \n", 750 | "print(\"Accuracy on test set: {:.3f}\".format(rfc.score(X_test, y_test)))" 751 | ] 752 | }, 753 | { 754 | "cell_type": "code", 755 | "execution_count": 29, 756 | "id": "55703385", 757 | "metadata": {}, 758 | "outputs": [], 759 | "source": [ 760 | "# Get the final random forest predictions\n", 761 | "RFC_Ans_Predict_proba = rfc.predict_proba(test_data)\n", 762 | "df_test['prob']=RFC_Ans_Predict_proba[:,1]\n", 763 | "# Save the final answers\n", 764 | "df_test.to_csv(\"RFC_Ans.csv\",index=None)" 765 | ] 766 | }, 767 | { 768 | "cell_type": "code", 769 | "execution_count": 27, 770 | "id": "54978d26", 771 | "metadata": {}, 772 | "outputs": [ 773 | { 774 | "name": "stdout", 775 | "output_type": "stream", 776 | "text": [ 777 | "进度: 0\n", 
778 | "进度: 10\n", 779 | "进度: 20\n", 780 | "进度: 30\n", 781 | "进度: 40\n", 782 | "进度: 50\n", 783 | "进度: 60\n", 784 | "进度: 70\n", 785 | "进度: 80\n", 786 | "进度: 90\n", 787 | "进度: 100\n", 788 | "进度: 110\n", 789 | "进度: 120\n", 790 | "进度: 130\n", 791 | "进度: 140\n", 792 | "进度: 150\n", 793 | "进度: 160\n", 794 | "进度: 170\n", 795 | "进度: 180\n", 796 | "进度: 190\n", 797 | "最大得分:0.9394897744043854 子树数量为:101\n" 798 | ] 799 | }, 800 | { 801 | "data": { 802 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAD2CAYAAADMHBAjAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAcVUlEQVR4nO3df5BV9X3/8efLVQyCwiIbMo3ijxlo00SxdL8ojdLNN5JgRki+TCtJ/cLU6jh+S5p2nFjMxHQbwaQ4X3Y64zSOfL8k5etoJtSkalErhEEg0yXJQhA12h8z1ThOmbvIIlkwBvX9/eOcZdnLuXvP7t2798J5PWaYPffc9zn3c89e7ms/58fnKCIwMzMrd1ajG2BmZs3JAWFmZpkcEGZmlskBYWZmmRwQZmaW6exGN2Ckpk+fHpdeemmjm2FmdlrZs2fPwYhoG8kyp11AXHrppfT09DS6GWZmpxVJr410Ge9iMjOzTA4IMzPL5IAwM7NMdQ0ISdMkLZQ0vZ6vY2ZmYy9XQEjaIKlb0j0Vnr9M0lOSdklal85rBTYD84Dtktoq1J0t6ReSnkv/XTFG783MzGpQ9SwmSUuBloiYL+nbkmZFxL+Xla0FVkfEbknfk9QBBHBnOq8VmAvcmlF3BPhuRKwau7dlZma1ytOD6AA2pdNbgGszamYDe9PpEjAlInakQbCApBfRnVUHXAPcKOknaU/llNCSdLukHkk9vb29Od+amZnVIs91EJOAN9LpQyQ9gXKPAZ2SdgOLgK8ASBKwDOgDjleo+whwfUT8l6T/B3wGePLklUfEemA9QHt7u8cnt9NLBOzZAz/8IRw7Nvr1tLTANdfAJz4BEyaMXfvMKsgTEP3AxHR6Mhm9johYI+la4C5gY0T0p/MDWClpNbAkq07S/oh4J11VDzCrtrdk1gTeeQeeew6eeAKefBLeSP/Gkka/zoF7t5x/PtxwA3z2s/CZz8DUqbW21ixTnl1MexjcrTQHeLVC3T5gJtAFIGmVpBXpc1OBw1l1wMOS5khqAT4HPJ+z7WbN5fBhePRRWLYM2tpg0SLYuBHmzUt+HjwI778/+n/HjsE//VOy/h074Oabk9e5/np44AH4xS8avQXsDKNqd5STdAGwC9gG3AB8HvjDiLinrO7rwH9ExMPp41aSYxfnAi8CKyMiMuo+BjwKCHgyIr46XHva29vDQ21Y03jttaSX8MQTsHMnvPsuzJgBixcnf+F/8pMwcWL19YzU++/Dj388+NqvvJLMv+qq5HU/+9lkupYei51RJO2JiPYRLZPnlqPpl/1CYGdEHBhl+8aEA8IaKgJ+9rPBL+bn0w7vb/1W8qX8uc8lPYazxvka1H/7t8E2/cu/JO2cOROWLEnatWCBj1sUXN0CopmclgFx7Bhs3QovvAB//Mdw0UXj34b//E946CF4++3xf+0zxdGjsGULvP568pf57/3e4F/rs2c3unWDSiXYvDkJi61bk9/5lCnw6U/Dhz7U6NZZLT75yST0R8EB0Uyy/pMCTJ4Mq1fDF78IZ4/DYLq//jV
0dcG998Lx48nr2+i0tMDHP54Ewo03wgc/2OgWVTfwx8kTTyThdvRoo1tktfiLv4DOzlEt6oBotKxu/sUXD/6VefHFyS/4n/852T/80EPJ7oh62bUL7rgDfv5zWLoU/vZvkzaYWeGMJiBOu/tBNJXhDhT+1V9lHyh8+mn4/vfhz/88Oaf9jjvgG98Y21MVDx6Ev/xL+M534JJLkjNfbrxx7NZvZoXgHsRIvf12csHTE08kX7ylUrKr6Pd/PwmEJUuSL+VqjhxJQuSBB5JTFbu64AtfqO2sk/ffh7//e7jrrmT9X/4yfO1rcN55o1+nmZ0RRtOD8HDfeZVK8Ad/ANOnJyGwaRN0dMAjj0BvbxIaf/Zn+cIB4IILkl0+P/1pcrbJzTfDwoXJbqrRePHFJKRuvRV++7dh3z745jcdDmY2ag6IvLZtS3YN3XRTcgyhtxe+9z34oz+qbffQ3LnQ3Q1/93dJWFxxBfz1X8OvfpVv+aNH4e674Xd+B15+Gb797eQiqo9+dPRtMjPDAZFfqZT8XLcuOV3w3HPHbt0tLfCnfwr/+q9JL+XrX0+CYuvW4ZfbvDkJgrVrYcWK5BjILbeM/zn4ZnZG8jdJXgPHGuo57s2HPpTsstq6NTkW8alPJT2UA2XXJr7+enJW0uLFyWmrO3fChg3J7i8zszHigMirVEoOJo/HX+fXXw/79ye7mr7/ffjN30x2Qb3zTtKD+chHkt1cf/M3sHcvXHdd/dtkZoXjgMirVBrfC6M+8IHkgpgXXkiulfjiF5OA+vKXk4PjP/85rFrl4RPMrG4cEHn19iZf0ONt9uzkCthHH4Xf/V34wQ+S02svvXT822JmheIL5fIqleDqqxvz2lJyjcQXvtCY1zezQnIPIq/x3sVkZtZgDog83n4bfvlLB4SZFYoDIo/e3uSnA8LMCqSuASFpmqSFkk7vE/QHLpJzQJhZgeQKCEkbJHVLuqfC85dJekrSLknr0nmtwGZgHrBdUltW3UnrmCHpZzW+n/oY6EE04iwmM7MGqRoQkpYCLRExH7hc0qyMsrXA6oi4DrhIUgdwJXBnRNwHPAvMrVA34H8Ddbh57xhwD8LMCihPD6ID2JRObwGuzaiZDexNp0vAlIjYERG7JS0g6UV0Z9UBSPrvwFEg837Xkm6X1COpp3fgr/nx5IAwswLKExCTgDfS6UPAjIyax4BOSYuBRcA2AEkClgF9wPGsOkkTgK8Bd1dqQESsj4j2iGhva8RunlIJJk6ESZPG/7XNzBokT0D0M7jrZ3LWMhGxBngGuA3YGBH96fyIiJXAfmBJhbq7gW9FxOEa30v9DFwDUcvNfMzMTjN5AmIPg7uV5gCvVqjbB8wEugAkrZK0In1uKnA4qw64Hlgp6TngKkn/N2fbx48vkjOzAsoTEI8DyyV1ATcBL0lak1F3F9AVEcfSx+vT5XYCLSTHL06pi4gFEdERER3Avoi4bdTvpl4aNQ6TmVkDVR2LKSKOpGcbLQTuj4gDwPMZdZ1lj/vSZYatK3uuo2qLG6FUSm7gY2ZWILkG60u/7DdVLTwTRXgXk5kVkofaqOaXv0xu1OOAMLOCcUBU42sgzKygHBDVOCDMrKAcENV4HCYzKygHRDXuQZhZQTkgqhkICPcgzKxgHBDVlEowZQqce26jW2JmNq4cENX4GggzKygHRDUOCDMrKAdENaWSjz+YWSE5IKrp7XUPwswKyQExnPffd0CYWWE5IIZz6FASEg4IMysgB8RwfJGcmRWYA2I4DggzKzAHxHB8FbWZFVhdA0LSNEkLJU0fi7pxNzBQn3sQZlZAuQJC0gZJ3ZLuqfD8ZZKekrRL0rp0XiuwGZgHbJfUlrduLN7YmCiVQIILL2x0S8zMxl3VW45KWgq0RMR8Sd+WNCsi/r2sbC2wOiJ2S/peeg/rAO5M57UCc4Fbc9Y9O1ZvsCalEkyfDi0tjW6Jmdm4y9OD6GDwftRbgGszamYDe9PpEjAlInakX/oLSHo
[remainder of base64 PNG data omitted: learning-curve plot of test accuracy vs. n_estimators]\n", 803 | "text/plain": [ 804 | "
" 805 | ] 806 | }, 807 | "metadata": { 808 | "needs_background": "light" 809 | }, 810 | "output_type": "display_data" 811 | } 812 | ], 813 | "source": [ 814 | "# Hyperparameter tuning: plot a learning curve for n_estimators (the parameter with the largest effect on a random forest)\n", 815 | "score_lt = []\n", 816 | "\n", 817 | "# Fit one random forest every 10 steps and record the test score for each n_estimators value\n", 818 | "for i in range(0,200,10):\n", 819 | " print(\"进度:\",i)\n", 820 | " rfc = RandomForestClassifier(n_estimators=i+1,random_state=90,max_depth=8)\n", 821 | " rfc = rfc.fit(X_train, y_train)\n", 822 | " score = rfc.score(X_test, y_test)\n", 823 | " score_lt.append(score)\n", 824 | "score_max = max(score_lt)\n", 825 | "print('最大得分:{}'.format(score_max),'子树数量为:{}'.format(score_lt.index(score_max)*10+1))\n", 826 | "\n", 827 | "# Plot the learning curve\n", 828 | "x = np.arange(1,201,10)\n", 829 | "plt.subplot(111)\n", 830 | "plt.plot(x, score_lt, 'r-')\n", 831 | "plt.show()" 832 | ] 833 | }, 834 | { 835 | "cell_type": "code", 836 | "execution_count": 15, 837 | "id": "b124dbb8", 838 | "metadata": {}, 839 | "outputs": [ 840 | { 841 | "name": "stderr", 842 | "output_type": "stream", 843 | "text": [ 844 | "D:\\anaconda3\\lib\\site-packages\\xgboost\\sklearn.py:1224: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].\n", 845 | " warnings.warn(label_encoder_deprecation_msg, UserWarning)\n" 846 | ] 847 | }, 848 | { 849 | "name": "stdout", 850 | "output_type": "stream", 851 | "text": [ 852 | "[15:05:43] WARNING: C:/Users/Administrator/workspace/xgboost-win64_release_1.5.0/src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. 
Explicitly set eval_metric if you'd like to restore the old behavior.\n", 853 | "[[0.87493783 0.12506217]\n", 854 | " [0.9712213 0.02877864]\n", 855 | " [0.8449106 0.1550894 ]\n", 856 | " ...\n", 857 | " [0.87651736 0.12348264]\n", 858 | " [0.916159 0.08384103]\n", 859 | " [0.9761114 0.02388861]]\n", 860 | "Accuracy on training set: 0.939\n", 861 | "Accuracy on test set: 0.939\n" 862 | ] 863 | } 864 | ], 865 | "source": [ 866 | "# Train an XGBoost classifier\n", 867 | "from sklearn.model_selection import train_test_split\n", 868 | "from sklearn.ensemble import RandomForestClassifier\n", 869 | "from sklearn.linear_model import LinearRegression\n", 870 | "from sklearn.metrics import classification_report\n", 871 | "import xgboost as xgb\n", 872 | "\n", 873 | "model = xgb.XGBClassifier(\n", 874 | " max_depth=8,\n", 875 | " n_estimators=2000,\n", 876 | " min_child_weight=300, \n", 877 | " colsample_bytree=0.8, \n", 878 | " subsample=0.8, \n", 879 | " eta=0.3, \n", 880 | " seed=42 \n", 881 | ")\n", 882 | "# model.fit(\n", 883 | "# X_train, y_train,\n", 884 | "# eval_metric='auc', eval_set=[(X_train, y_train), (X_test, y_test)],\n", 885 | "# verbose=True,\n", 886 | "# # Early stopping: stop if AUC has not improved for 30 consecutive rounds\n", 887 | "# early_stopping_rounds=30 \n", 888 | "# )\n", 889 | "model.fit(X_train, y_train)\n", 890 | "\n", 891 | "\n", 892 | "Predict_proba = model.predict_proba(X_test)\n", 893 | "print(Predict_proba[:])\n", 894 | "print(\"Accuracy on training set: {:.3f}\".format(model.score(X_train, y_train))) \n", 895 | "print(\"Accuracy on test set: {:.3f}\".format(model.score(X_test, y_test)))\n" 896 | ] 897 | }, 898 | { 899 | "cell_type": "code", 900 | "execution_count": 16, 901 | "id": "b942defe", 902 | "metadata": {}, 903 | "outputs": [], 904 | "source": [ 905 | "# Compute the final XGBoost predictions on the test data\n", 906 | "xgboost_Ans_Predict_proba = model.predict_proba(test_data)\n", 907 | "df_test['prob']=xgboost_Ans_Predict_proba[:,1]\n", 908 | "# Save the final answer (positive-class probability per row)\n", 909 | "df_test.to_csv(\"xgboost_Ans.csv\",index=None)" 910 | ] 911 | }, 912 | { 
913 | "cell_type": "code", 914 | "execution_count": null, 915 | "id": "33f75369", 916 | "metadata": {}, 917 | "outputs": [], 918 | "source": [] 919 | } 920 | ], 921 | "metadata": { 922 | "kernelspec": { 923 | "display_name": "Python 3", 924 | "language": "python", 925 | "name": "python3" 926 | }, 927 | "language_info": { 928 | "codemirror_mode": { 929 | "name": "ipython", 930 | "version": 3 931 | }, 932 | "file_extension": ".py", 933 | "mimetype": "text/x-python", 934 | "name": "python", 935 | "nbconvert_exporter": "python", 936 | "pygments_lexer": "ipython3", 937 | "version": "3.8.8" 938 | } 939 | }, 940 | "nbformat": 4, 941 | "nbformat_minor": 5 942 | } 943 | -------------------------------------------------------------------------------- /特征工程.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "id": "24b1b8d2", 7 | "metadata": {}, 8 | "outputs": [], 9 | "source": [ 10 | "# Import packages\n", 11 | "import numpy as np\n", 12 | "import pandas as pd\n", 13 | "import matplotlib.pyplot as plt\n", 14 | "plt.rcParams[\"font.sans-serif\"] = \"SimHei\" # Use a CJK-capable font so Chinese labels render correctly\n", 15 | "import seaborn as sns\n", 16 | "import random\n", 17 | "from sklearn.model_selection import train_test_split\n", 18 | "from sklearn.linear_model import LogisticRegression\n", 19 | "from sklearn.preprocessing import LabelEncoder\n", 20 | "from sklearn.metrics import accuracy_score\n", 21 | "from sklearn import model_selection\n", 22 | "from sklearn.neighbors import KNeighborsRegressor" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 2, 28 | "id": "51717a7b", 29 | "metadata": {}, 30 | "outputs": [ 31 | { 32 | "name": "stdout", 33 | "output_type": "stream", 34 | "text": [ 35 | "(261477, 3) (260864, 3)\n", 36 | "(424170, 3) (54925330, 7)\n" 37 | ] 38 | } 39 | ], 40 | "source": [ 41 | "# Load the data\n", 42 | "\n", 43 | "df_train = 
pd.read_csv(r'../DataMining/data_format1\\train_format1.csv')\n", 44 | "df_test = pd.read_csv(r'../DataMining/data_format1\\test_format1.csv')\n", 45 | "user_info = pd.read_csv(r'../DataMining/data_format1\\user_info_format1.csv')\n", 46 | "user_log = pd.read_csv(r'../DataMining/data_format1\\user_log_format1.csv')\n", 47 | "\n", 48 | "print(df_test.shape,df_train.shape)\n", 49 | "print(user_info.shape,user_log.shape)" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 3, 55 | "id": "e833f4c8", 56 | "metadata": {}, 57 | "outputs": [ 58 | { 59 | "name": "stdout", 60 | "output_type": "stream", 61 | "text": [ 62 | "\n", 63 | "RangeIndex: 424170 entries, 0 to 424169\n", 64 | "Data columns (total 3 columns):\n", 65 | " # Column Non-Null Count Dtype \n", 66 | "--- ------ -------------- ----- \n", 67 | " 0 user_id 424170 non-null int64 \n", 68 | " 1 age_range 329039 non-null float64\n", 69 | " 2 gender 407308 non-null float64\n", 70 | "dtypes: float64(2), int64(1)\n", 71 | "memory usage: 9.7 MB\n" 72 | ] 73 | } 74 | ], 75 | "source": [ 76 | "# Treat the placeholder codes (age_range 0.0, gender 2.0) as missing values (NaN)\n", 77 | "user_info['age_range'] = user_info['age_range'].replace(0.0, np.nan)\n", 78 | "user_info['gender'] = user_info['gender'].replace(2.0, np.nan)\n", 79 | "\n", 80 | "user_info.info()" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": 4, 86 | "id": "81f6042b", 87 | "metadata": {}, 88 | "outputs": [], 89 | "source": [ 90 | "# Encode missing values as a separate category, -1\n", 91 | "user_info['age_range'] = user_info['age_range'].fillna(-1)\n", 92 | "user_info['gender'] = user_info['gender'].fillna(-1)\n", 93 | "# user_info['age_range'].replace(np.nan,1,inplace=True)\n", 94 | "# user_info['gender'].replace(np.nan,0,inplace=True)" 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": 5, 100 | "id": "cfe56eb8", 101 | "metadata": {}, 102 | "outputs": [ 103 | { 104 | "data": { 105 | "text/plain": [ 106 | "Text(0.5, 1.0, '用户年龄分布')" 107 | ] 108 | }, 109 | "execution_count": 5, 110 | "metadata": {}, 111 | "output_type": "execute_result" 112 | }, 113 | { 
113 | "data": { 114 | "image/png": "[base64 PNG data omitted: bar chart titled '用户年龄分布' (user age distribution), with a legend entry '人数' (number of users)]", 115 | "text/plain": [ 116 | "
" 117 | ] 118 | }, 119 | "metadata": { 120 | "needs_background": "light" 121 | }, 122 | "output_type": "display_data" 123 | } 124 | ], 125 | "source": [ 126 | "# Visualize the age distribution\n", 127 | "fig = plt.figure(figsize = (10, 6))\n", 128 | "x = np.array([\"NULL\",\"<18\",\"18-24\",\"25-29\",\"30-34\",\"35-39\",\"40-49\",\">=50\"])\n", 129 | "# age_range codes: 1 = under 18; 2 = [18,24]; 3 = [25,29]; 4 = [30,34]; 5 = [35,39]; 6 = [40,49]; 7 and 8 = 50 or older\n", 130 | "y = np.array([user_info[user_info['age_range'] == -1]['age_range'].count(),\n", 131 | " user_info[user_info['age_range'] == 1]['age_range'].count(),\n", 132 | " user_info[user_info['age_range'] == 2]['age_range'].count(),\n", 133 | " user_info[user_info['age_range'] == 3]['age_range'].count(),\n", 134 | " user_info[user_info['age_range'] == 4]['age_range'].count(),\n", 135 | " user_info[user_info['age_range'] == 5]['age_range'].count(),\n", 136 | " user_info[user_info['age_range'] == 6]['age_range'].count(),\n", 137 | " user_info[user_info['age_range'] == 7]['age_range'].count() + \n", 138 | " user_info[user_info['age_range'] == 8]['age_range'].count()])\n", 139 | "plt.bar(x,y,label='人数')\n", 140 | "plt.legend()\n", 141 | "plt.title('用户年龄分布')" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 6, 147 | "id": "21ae9565", 148 | "metadata": {}, 149 | "outputs": [ 150 | { 151 | "data": { 152 | "text/plain": [ 153 | "Text(0.5, 1.0, '用户性别分布')" 154 | ] 155 | }, 156 | "execution_count": 6, 157 | "metadata": {}, 158 | "output_type": "execute_result" 159 | }, 160 | { 161 | "data": { 162 | "image/png": 
"[base64 PNG data omitted: count plot titled '用户性别分布' (user gender distribution), x axis 'gender' with categories -1, 0, 1, y axis 'count']", 163 | "text/plain": [ 164 | "
" 165 | ] 166 | }, 167 | "metadata": { 168 | "needs_background": "light" 169 | }, 170 | "output_type": "display_data" 171 | } 172 | ], 173 | "source": [ 174 | "sns.countplot(x='gender',order=[-1,0,1],data=user_info)\n", 175 | "plt.title('用户性别分布')" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": 7, 181 | "id": "a9a9558f", 182 | "metadata": {}, 183 | "outputs": [ 184 | { 185 | "data": { 186 | "text/plain": [ 187 | "'\\n1.年龄空值的比较多,性别空值的少\\n2.年龄主要在18-39之间\\n3.大多数是女性\\n'" 188 | ] 189 | }, 190 | "execution_count": 7, 191 | "metadata": {}, 192 | "output_type": "execute_result" 193 | }, 194 | { 195 | "data": { 196 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYkAAAESCAYAAAAIfCk9AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAcxklEQVR4nO3de5RV5Znn8e/PEgVBFLCCClYQgxoWSNTSgAJd5XgJE5MoE6NZiXgFNY5JViadaLw0Y4zpdtK9SDvRWBEz0cQodtS2TURcIyVgU6aLRPHSumQUFVQkQCCgKNFn/ti74FDULk7VuVbV77NWrTr7PXvv99lo7ee8l/0eRQRmZmYd2aPSAZiZWfVykjAzs0xOEmZmlslJwszMMjlJmJlZJicJ6/Uk7V2GOgZI6tLfk6RhkvYsVUxmxeAkYb2apC8A9+Sx33+XdHEXztsiad+coq8Cd3QxvAXAYe3Oe62kK7t4HiQdLumpjPeOTH+PlnRSHufaQ9ITkj7e1Tis93GSsKonaV9J70tqTX/ekLQ6Z/tPks7OOPxR4GOS9unk/COBG4FXO3hvz9yWiKSLJB0AbAPel9QvfasBOEXS4C5c2vvAB+l5fyDp9NyydnHcKOktSS+mP9vaXdMHaUztj/ss8C+SBATQ1Nm/RWoasE9EvNaFa7FeyknCeoIPgLcioj4i6oEfA7fmbD9Mzg1S0j+33UyBp4FhwB9ybrAP5Oy7J3AXsAa4VlKzpM1pS6EZeAK4JSeWk4F9gI+A3wEzJQ0FGoE7SZJNp5TYOz3HZyXtDxybxvAR8FHOPm22AddGxJERcSSwGvhA0lcl3UIH0u6vvwOujsSracxz2u03SdJKSc9Jehq4Dxgq6emcn2ckPS+pYXfXZ72L+0OtJwjgQElL0u0RwB6SPpNujwEezNl/OHBpRDS3P1F6k7spfS3gNuBAoD4iNqblTwPnRMTKdscOBvYC+pN8wPpvwF+B/wnMTX//X0nfjogfdXI9dcDt6etxwPnA0cCtaewBnAu8AZzRVj1wsKRx6Xa/tGwrHbQ8Ut8GNkbEv+aUfQ9okfSPwLfT5LEUGJVe45eBCyPiFEm3A1+PiHc7uRbr5ZwkrCf4CHg7IiYDSPo20D8ibki3/0+7/f+ax/kA9gPWAjcA8yW9n5Z/ArhX0l9JWg2XRUQLcALwOjALeBaYndZ1KnARSYvjc8B9kpZHxIKM+o8E/hM4hqQVczHQGhGfTa/trxExp90xLwFnAl9Oj11KJz0Bkv4L8E3gxNzyiNicjks8BiySdHFEvJQeMwr4PnBKunsjScKyPsxJwnqCmi7u32837wsgIv4MXJm2SF6Ki
[base64-encoded PNG data omitted: count plot of age_range by gender]\n", 197 | "text/plain": [ 198 | "
" 199 | ] 200 | }, 201 | "metadata": { 202 | "needs_background": "light" 203 | }, 204 | "output_type": "display_data" 205 | } 206 | ], 207 | "source": [ 208 | "sns.countplot(x='age_range',order=[-1,1,2,3,4,5,6,7,8],hue='gender',data=user_info)\n", 209 | "plt.title('User age-gender distribution')\n", 210 | "'''\n", 211 | "1. Age has many missing values; gender has few\n", 212 | "2. Ages are mostly between 18 and 39\n", 213 | "3. Most users are female\n", 214 | "'''" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 8, 220 | "id": "dc6ddb76", 221 | "metadata": {}, 222 | "outputs": [], 223 | "source": [ 224 | "# Merge features\n", 225 | "\n", 226 | "df_train = pd.merge(df_train,user_info,on=\"user_id\",how=\"left\")\n", 227 | " \n", 228 | "total_logs_temp = user_log.groupby([user_log[\"user_id\"],user_log[\"seller_id\"]])[\"item_id\"].count().reset_index()\n", 229 | " \n", 230 | "total_logs_temp.rename(columns={\"seller_id\":\"merchant_id\",\"item_id\":\"total_item_id\"},inplace=True)\n", 231 | " \n", 232 | "df_train = pd.merge(df_train,total_logs_temp,on=[\"user_id\",\"merchant_id\"],how=\"left\")\n", 233 | " \n", 234 | "unique_item_id = user_log.groupby([\"user_id\",\"seller_id\",\"item_id\"]).count().reset_index()[[\"user_id\",\"seller_id\",\"item_id\"]]\n", 235 | " \n", 236 | "unique_item_id_cnt = unique_item_id.groupby([\"user_id\",\"seller_id\"]).count().reset_index()\n", 237 | " \n", 238 | "unique_item_id_cnt.rename(columns={\"seller_id\":\"merchant_id\",\"item_id\":\"unique_item_id\"},inplace=True)\n", 239 | " \n", 240 | "df_train = pd.merge(df_train, unique_item_id_cnt, on=[\"user_id\", \"merchant_id\"], how=\"left\")\n", 241 | " \n", 242 | "cat_id_temp = user_log.groupby([\"user_id\", \"seller_id\", \"cat_id\"]).count().reset_index()[[\"user_id\", \"seller_id\", \"cat_id\"]]\n", 243 | " \n", 244 | "cat_id_temp_cnt = cat_id_temp.groupby([\"user_id\", \"seller_id\"]).count().reset_index()\n", 245 | " \n", 246 | "cat_id_temp_cnt.rename(columns={\"seller_id\":\"merchant_id\",\"cat_id\":\"total_cat_id\"},inplace=True)\n", 247 | " 
\n", 248 | "df_train = pd.merge(df_train, cat_id_temp_cnt, on=[\"user_id\", \"merchant_id\"], how=\"left\")\n", 249 | " \n", 250 | "time_temp = user_log.groupby([\"user_id\", \"seller_id\", \"time_stamp\"]).count().reset_index()[[\"user_id\", \"seller_id\", \"time_stamp\"]]\n", 251 | " \n", 252 | "time_temp_cnt = time_temp.groupby([\"user_id\", \"seller_id\"]).count().reset_index()\n", 253 | " \n", 254 | "time_temp_cnt.rename(columns={\"seller_id\":\"merchant_id\",\"time_stamp\":\"total_time_temp\"},inplace=True)\n", 255 | " \n", 256 | "df_train = pd.merge(df_train, time_temp_cnt, on=[\"user_id\", \"merchant_id\"], how=\"left\")\n", 257 | " \n", 258 | "click_temp = user_log.groupby([\"user_id\", \"seller_id\", \"action_type\"])[\"item_id\"].count().reset_index()\n", 259 | " \n", 260 | "click_temp.rename(columns={\"seller_id\":\"merchant_id\",\"item_id\":\"times\"},inplace=True)\n", 261 | " \n", 262 | "click_temp[\"clicks\"] = click_temp[\"action_type\"] == 0\n", 263 | " \n", 264 | "click_temp[\"clicks\"] = click_temp[\"clicks\"] * click_temp[\"times\"]\n", 265 | " \n", 266 | "click_temp[\"shopping_cart\"] = click_temp[\"action_type\"] == 1\n", 267 | "click_temp[\"shopping_cart\"] = click_temp[\"shopping_cart\"] * click_temp[\"times\"]\n", 268 | " \n", 269 | "click_temp[\"purchases\"] = click_temp[\"action_type\"] == 2\n", 270 | "click_temp[\"purchases\"] = click_temp[\"purchases\"] * click_temp[\"times\"]\n", 271 | " \n", 272 | "click_temp[\"favourites\"] = click_temp[\"action_type\"] == 3\n", 273 | "click_temp[\"favourites\"] = click_temp[\"favourites\"] * click_temp[\"times\"]\n", 274 | " \n", 275 | "four_features = click_temp.groupby([\"user_id\", \"merchant_id\"]).sum().reset_index()\n", 276 | " \n", 277 | "# Drop the helper columns\n", 278 | "four_features = four_features.drop([\"action_type\", \"times\"], axis=1)\n", 279 | " \n", 280 | "# Merge\n", 281 | "df_train = pd.merge(df_train, four_features, on=[\"user_id\", \"merchant_id\"], how=\"left\")\n", 282 | " \n", 283 | 
"# Forward-fill missing values\n", 284 | "df_train = df_train.ffill()\n", 285 | " " 286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "execution_count": 9, 291 | "id": "c0467512", 292 | "metadata": {}, 293 | "outputs": [], 294 | "source": [ 295 | "# user_info['age_range'].replace(np.nan,1,inplace=True)\n", 296 | "# user_info['gender'].replace(np.nan,0,inplace=True)\n", 297 | "# df_train['age_range'].replace(-1,np.nan,inplace=True)\n", 298 | "# df_train['gender'].replace(-1,np.nan,inplace=True)" 299 | ] 300 | }, 301 | { 302 | "cell_type": "code", 303 | "execution_count": 10, 304 | "id": "91ed1463", 305 | "metadata": {}, 306 | "outputs": [ 307 | { 308 | "data": { 309 | "text/html": [ 310 | "
" 524 | ], 525 | "text/plain": [ 526 | " user_id merchant_id label age_range gender total_item_id \\\n", 527 | "0 34176 3906 0 6.0 0.0 39 \n", 528 | "1 34176 121 0 6.0 0.0 14 \n", 529 | "2 34176 4356 1 6.0 0.0 18 \n", 530 | "3 34176 2217 0 6.0 0.0 2 \n", 531 | "4 230784 4818 0 -1.0 0.0 8 \n", 532 | "... ... ... ... ... ... ... \n", 533 | "260859 359807 4325 0 4.0 1.0 20 \n", 534 | "260860 294527 3971 0 -1.0 1.0 17 \n", 535 | "260861 294527 152 0 -1.0 1.0 9 \n", 536 | "260862 294527 2537 0 -1.0 1.0 1 \n", 537 | "260863 229247 4140 0 4.0 -1.0 24 \n", 538 | "\n", 539 | " unique_item_id total_cat_id total_time_temp clicks shopping_cart \\\n", 540 | "0 20 6 9 36 0 \n", 541 | "1 1 1 3 13 0 \n", 542 | "2 2 1 2 12 0 \n", 543 | "3 1 1 1 1 0 \n", 544 | "4 1 1 3 7 0 \n", 545 | "... ... ... ... ... ... \n", 546 | "260859 6 2 1 18 0 \n", 547 | "260860 3 1 2 13 0 \n", 548 | "260861 1 1 1 7 0 \n", 549 | "260862 1 1 1 0 0 \n", 550 | "260863 15 1 2 23 0 \n", 551 | "\n", 552 | " purchases favourites \n", 553 | "0 1 2 \n", 554 | "1 1 0 \n", 555 | "2 6 0 \n", 556 | "3 1 0 \n", 557 | "4 1 0 \n", 558 | "... ... ... 
\n", 559 | "260859 2 0 \n", 560 | "260860 1 3 \n", 561 | "260861 1 1 \n", 562 | "260862 1 0 \n", 563 | "260863 1 0 \n", 564 | "\n", 565 | "[260864 rows x 13 columns]" 566 | ] 567 | }, 568 | "execution_count": 10, 569 | "metadata": {}, 570 | "output_type": "execute_result" 571 | } 572 | ], 573 | "source": [ 574 | "# print(df_train.shape)\n", 575 | "# df_train_dropnan=df_train.dropna(axis=0,how='any')\n", 576 | "# df_train_dropnan.shape\n", 577 | "df_train" 578 | ] 579 | }, 580 | { 581 | "cell_type": "code", 582 | "execution_count": 11, 583 | "id": "becb8596", 584 | "metadata": {}, 585 | "outputs": [], 586 | "source": [ 587 | "# Save the engineered features\n", 588 | "df_train.to_csv(\"df_train.csv\",index=False)" 589 | ] 590 | }, 591 | { 592 | "cell_type": "code", 593 | "execution_count": null, 594 | "id": "9aeeb576", 595 | "metadata": {}, 596 | "outputs": [], 597 | "source": [] 598 | } 599 | ], 600 | "metadata": { 601 | "kernelspec": { 602 | "display_name": "Python 3", 603 | "language": "python", 604 | "name": "python3" 605 | }, 606 | "language_info": { 607 | "codemirror_mode": { 608 | "name": "ipython", 609 | "version": 3 610 | }, 611 | "file_extension": ".py", 612 | "mimetype": "text/x-python", 613 | "name": "python", 614 | "nbconvert_exporter": "python", 615 | "pygments_lexer": "ipython3", 616 | "version": "3.8.8" 617 | } 618 | }, 619 | "nbformat": 4, 620 | "nbformat_minor": 5 621 | } 622 | --------------------------------------------------------------------------------