├── .gitignore
├── README.md
├── _config.yml
├── model_evaluation
│   ├── rnn_v1
│   │   ├── rnn_v1_sku_level_test_roc.png
│   │   ├── rnn_v1_sku_level_train_roc.png
│   │   ├── rnn_v1_user_level_test_roc.png
│   │   └── rnn_v1_user_level_train_roc.png
│   ├── rnn_v2
│   │   ├── rnn_v2_sku_level_test_roc.png
│   │   ├── rnn_v2_sku_level_train_roc.png
│   │   ├── rnn_v2_user_level_test_roc.png
│   │   ├── rnn_v2_user_level_train_roc.png
│   │   ├── score.csv
│   │   └── upload.csv
│   ├── rnn_v3
│   │   ├── sku_step_result.png
│   │   ├── sku_step_result_head300.png
│   │   ├── user_step_result.png
│   │   └── user_step_result_head300.png
│   └── rule_v01
│       ├── model_sku_id_score_v01.py
│       ├── model_sku_id_train_v01.py
│       ├── rule_user_id_score_v01.py
│       ├── rule_user_id_train_v01.py
│       └── submit_sw_20170513_rule_D2.csv
├── prof
│   ├── prof_action.txt
│   ├── prof_comment.txt
│   ├── prof_product.txt
│   └── prof_user.txt
├── rnn.py
├── run.py
├── signal_generation
│   ├── MASTER_creation_v1.sql
│   ├── SKU_signal_creation_v2.sql
│   ├── SKU_signal_creation_v3.sql
│   ├── SKU_signal_creation_v3_0201_0415.sql
│   ├── SKU_signal_creation_v4_0201_0415.sql
│   ├── USER_SKU_signal_creation_v1.sql
│   ├── USER_SKU_signal_creation_v2_0201_0408.sql
│   ├── create_tbl_server.sql
│   ├── create_tbl_server_train_action_x_v2.sql
│   ├── create_tbl_server_train_action_x_v3.sql
│   ├── file_merging_master_for_application.py
│   ├── gbdtmodel_for_application.py
│   ├── master_x_for_application.py
│   └── model_master.py
└── submission
    ├── submit_20170503_sw_0.02069.csv
    ├── submit_20170504__sw_0.03168.csv
    ├── submit_20170512_gny_0.04036.csv
    ├── submit_20170512_sw_0.00549.csv
    ├── submit_20170513_sw_0.05148.csv
    ├── submit_20170518_sw_0.079.csv
    ├── submit_20170518_sw_0.09218.csv
    ├── submit_20170519_sw_0.09512.csv
    └── 提交结果
/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | *.swp
3 | data
4 | temp
5 | model
6 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Competition Overview
2 | 
3 | 
This competition is based on real user, product, and behavior data (anonymized) from the JD.com online mall. Participating teams need to apply data mining techniques and machine learning algorithms to build a model of users' product purchases and output matches between high-potential users and target products, providing high-quality target audiences for precision marketing. The organizers also hope teams will use the competition to uncover the latent meaning behind the data and help give e-commerce users a simpler, faster, and more worry-free shopping experience.
4 | 
5 | [Details](http://www.datafountain.cn/projects/jdata/)
6 | 
7 | # Data Description
8 | 
9 | ## Symbol Definitions
10 | S: the full set of products
11 | P: the candidate subset of products (JData_Product.csv); P is a subset of S
12 | U: the set of users
13 | A: the behavior data of users in U on products in S
14 | C: the comment data on products in S
15 | 
16 | ## Training Data
17 | Behavior, comment, and user data from 2016-02-01 to 2016-04-15 are provided for the users in U on a subset of the products in S, together with data on some candidate products in P.
18 | Participants build their own features and data formats from this data and are free to choose the split between training and test sets.
19 | 
20 | ## Prediction Data
21 | The task is to predict whether each user orders a product in P between 2016-04-16 and 2016-04-20; each user orders only one product.
22 | A sample of the ordering users is drawn; leaderboard A is scored on 50% of the test data.
23 | Leaderboard B is scored on the other 50% (when computing accuracy, user_ids in the submission that overlap with leaderboard A are excluded).
24 | 
25 | To protect user privacy and data security, all data have been sampled and anonymized.
26 | Some columns contain empty or NULL values; participants should handle these themselves.
27 | 
28 | ## Data Structure
29 | 
30 | 1. User data
31 | 
32 | | Field | Meaning | Notes |
33 | | ----- | ----- | ----- |
34 | |user_id | user ID | anonymized |
35 | |age | age group | -1 = unknown |
36 | |sex | gender | 0 = male, 1 = female, 2 = undisclosed |
37 | |user_lv_cd | user level | ordered enumeration; higher levels have larger values |
38 | |user_reg_tm| registration date | day granularity |
39 | 
40 | 2. Product data
41 | 
42 | | Field | Meaning | Notes |
43 | | ----- | ----- | ----- |
44 | |sku_id | SKU ID | anonymized |
45 | |a1 | attribute 1 | enum; -1 = unknown |
46 | |a2 | attribute 2 | enum; -1 = unknown |
47 | |a3 | attribute 3 | enum; -1 = unknown |
48 | |cate | category ID | anonymized |
49 | |brand | brand ID | anonymized |
50 | 
51 | 3. Comment data
52 | 
53 | | Field | Meaning | Notes |
54 | | ----- | ----- | ----- |
55 | |dt | as-of date | day granularity |
56 | |sku_id | SKU ID | anonymized |
57 | |comment_num | cumulative comment count band | 0 = none, 1 = exactly 1, 2 = 2-10, 3 = 11-50, 4 = more than 50 |
58 | |has_bad_comment | has negative comments | 0 = no, 1 = yes |
59 | |bad_comment_rate| negative comment rate | share of negative comments among all comments |
60 | 
61 | 4. 
Behavior data
62 | 
63 | | Field | Meaning | Notes |
64 | | ----- | ----- | ----- |
65 | |user_id| user ID | anonymized |
66 | |sku_id | SKU ID | anonymized |
67 | |time | action timestamp | |
68 | |model_id| ID of the clicked module (click actions only) | anonymized |
69 | |type | action type | 1 = browse (view product detail page); 2 = add to cart; 3 = remove from cart; 4 = place order; 5 = follow; 6 = click |
70 | |cate | category ID | anonymized |
71 | |brand | brand ID | anonymized |
72 | 
73 | # Scoring
74 | 
75 | Participants use historical sales data from multiple JD.com product categories to build a model that predicts each user's purchase intent for products in the target category over the next 5 days. For every user appearing in the training set, the model must predict whether that user buys a product in the target category within the next 5 days and, if so, the SKU_ID of the product bought. The evaluation computes a weighted score over the submitted predictions.
76 | 
77 | The submitted result file contains purchase-intent predictions for all users. Each user's prediction has two parts:
78 | 1. Whether the user orders a product in P between 2016-04-16 and 2016-04-20. The result file should contain only the users predicted to order; users predicted not to order must not appear. If the user-level prediction is correct, the evaluation sets label=1, otherwise label=0;
79 | 2. If an order is predicted, the sku_id ordered (submit exactly one sku_id). If the sku_id is correct, the evaluation sets pred=1, otherwise pred=0.
80 | 
81 | The score of a submission is computed as:
82 | 
83 | ```math
84 | Score = 0.4*F11 + 0.6*F12
85 | ```
86 | 
87 | where the two F1 values are defined as:
88 | ```math
89 | F11 = 6*Recall*Precise/(5*Recall+Precise) // note: weights Precise more heavily
90 | F12 = 5*Recall*Precise/(2*Recall+3*Precise) // note: weights Recall more heavily
91 | ```
92 | where Precise is the precision and Recall is the recall.
93 | F11 is the F1 score on the user-level label (label=1/0) and F12 is the F1 score on the SKU-level prediction (pred=1/0).
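The scoring rule above can be sketched in Python as follows. This is a minimal illustration, not the official evaluator; `jdata_score`, `pred_pairs`, and `true_pairs` are my own names, and it targets Python 3, unlike the Python 2 scripts elsewhere in this repo.

```python
def jdata_score(pred_pairs, true_pairs):
    """Sketch of the competition metric.

    pred_pairs / true_pairs: sets of (user_id, sku_id) tuples,
    with each user appearing at most once per set.
    """
    pred_users = {u for u, _ in pred_pairs}
    true_users = {u for u, _ in true_pairs}

    # User-level precision/recall: label = 1 iff the predicted user truly orders.
    hit_u = len(pred_users & true_users)
    p1 = hit_u / len(pred_users) if pred_users else 0.0
    r1 = hit_u / len(true_users) if true_users else 0.0
    f11 = 6 * r1 * p1 / (5 * r1 + p1) if (5 * r1 + p1) > 0 else 0.0

    # Pair-level precision/recall: pred = 1 iff both user and sku match.
    hit_p = len(pred_pairs & true_pairs)
    p2 = hit_p / len(pred_pairs) if pred_pairs else 0.0
    r2 = hit_p / len(true_pairs) if true_pairs else 0.0
    f12 = 5 * r2 * p2 / (2 * r2 + 3 * p2) if (2 * r2 + 3 * p2) > 0 else 0.0

    return 0.4 * f11 + 0.6 * f12
```

For example, predicting {(1, 10), (2, 20)} against ground truth {(1, 10), (2, 21), (3, 30)} gives F11 = 12/13, F12 = 5/13, and Score = 0.6.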
94 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | theme: jekyll-theme-slate -------------------------------------------------------------------------------- /model_evaluation/rnn_v1/rnn_v1_sku_level_test_roc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/guangningyu/JD-prediction/836dadf02ecfff08abe00a4ec74abfb8b3898019/model_evaluation/rnn_v1/rnn_v1_sku_level_test_roc.png -------------------------------------------------------------------------------- /model_evaluation/rnn_v1/rnn_v1_sku_level_train_roc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/guangningyu/JD-prediction/836dadf02ecfff08abe00a4ec74abfb8b3898019/model_evaluation/rnn_v1/rnn_v1_sku_level_train_roc.png -------------------------------------------------------------------------------- /model_evaluation/rnn_v1/rnn_v1_user_level_test_roc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/guangningyu/JD-prediction/836dadf02ecfff08abe00a4ec74abfb8b3898019/model_evaluation/rnn_v1/rnn_v1_user_level_test_roc.png -------------------------------------------------------------------------------- /model_evaluation/rnn_v1/rnn_v1_user_level_train_roc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/guangningyu/JD-prediction/836dadf02ecfff08abe00a4ec74abfb8b3898019/model_evaluation/rnn_v1/rnn_v1_user_level_train_roc.png -------------------------------------------------------------------------------- /model_evaluation/rnn_v2/rnn_v2_sku_level_test_roc.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/guangningyu/JD-prediction/836dadf02ecfff08abe00a4ec74abfb8b3898019/model_evaluation/rnn_v2/rnn_v2_sku_level_test_roc.png -------------------------------------------------------------------------------- /model_evaluation/rnn_v2/rnn_v2_sku_level_train_roc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/guangningyu/JD-prediction/836dadf02ecfff08abe00a4ec74abfb8b3898019/model_evaluation/rnn_v2/rnn_v2_sku_level_train_roc.png -------------------------------------------------------------------------------- /model_evaluation/rnn_v2/rnn_v2_user_level_test_roc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/guangningyu/JD-prediction/836dadf02ecfff08abe00a4ec74abfb8b3898019/model_evaluation/rnn_v2/rnn_v2_user_level_test_roc.png -------------------------------------------------------------------------------- /model_evaluation/rnn_v2/rnn_v2_user_level_train_roc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/guangningyu/JD-prediction/836dadf02ecfff08abe00a4ec74abfb8b3898019/model_evaluation/rnn_v2/rnn_v2_user_level_train_roc.png -------------------------------------------------------------------------------- /model_evaluation/rnn_v2/upload.csv: -------------------------------------------------------------------------------- 1 | user_id,sku_id 2 | 294887,9702 3 | 231770,154636 4 | 241651,154636 5 | 248451,154636 6 | 255953,44854 7 | 265610,79520 8 | 246664,79520 9 | 251262,79520 10 | 207174,31662 11 | 225855,31662 12 | 289558,9702 13 | 296877,9702 14 | 231803,154636 15 | 270311,154636 16 | 301809,14433 17 | 220224,154636 18 | 223578,154636 19 | 258197,14433 20 | 303874,14433 21 | 235372,140807 22 | 279040,14433 23 | 213485,68767 24 | 246036,103652 25 | 284432,154636 26 | 244431,14433 27 | 207336,126146 28 | 231980,103652 29 | 
238004,59175 30 | 218616,149641 31 | 277618,103652 32 | 250305,103652 33 | 290259,149641 34 | 206879,149641 35 | 201646,145946 36 | 230571,103652 37 | 243466,103652 38 | 283601,103652 39 | 263598,154636 40 | 200288,154636 41 | 230471,154636 42 | 276682,154636 43 | 260329,154636 44 | 245993,166707 45 | 297250,154636 46 | 262513,59175 47 | 268564,154636 48 | 274553,57018 49 | 304141,75877 50 | 227840,152478 51 | 304478,149641 52 | 208048,149641 53 | 243771,154636 54 | 257134,149641 55 | 207552,56792 56 | 218664,152478 57 | 210325,149641 58 | 298513,154636 59 | 207534,149641 60 | 302166,154636 61 | 302853,149641 62 | 257305,103652 63 | 290074,103652 64 | 236598,154636 65 | 230285,166354 66 | 294696,57018 67 | 243238,57018 68 | 202340,57018 69 | 297264,57018 70 | 236190,166354 71 | 215604,57018 72 | 246349,57018 73 | 282091,123773 74 | 273530,12564 75 | 299869,154636 76 | 279436,154636 77 | 207519,79520 78 | 267625,116489 79 | 207511,149641 80 | 297622,149641 81 | 250690,126146 82 | 303799,79520 83 | 251755,116489 84 | 284242,57018 85 | 270664,5505 86 | 205066,56792 87 | 281649,149641 88 | 299117,152478 89 | 245036,79520 90 | 279943,56792 91 | 207554,152478 92 | 271234,152478 93 | 207545,149641 94 | 275530,56792 95 | 200808,152478 96 | 270787,152478 97 | 244552,56792 98 | 287249,56792 99 | 248490,152478 100 | 215003,152478 101 | 295210,152478 102 | 229671,56792 103 | 284663,56792 104 | 252238,152478 105 | 223125,152478 106 | 286741,152478 107 | 277966,152478 108 | 217924,152478 109 | -------------------------------------------------------------------------------- /model_evaluation/rnn_v3/sku_step_result.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/guangningyu/JD-prediction/836dadf02ecfff08abe00a4ec74abfb8b3898019/model_evaluation/rnn_v3/sku_step_result.png -------------------------------------------------------------------------------- /model_evaluation/rnn_v3/sku_step_result_head300.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/guangningyu/JD-prediction/836dadf02ecfff08abe00a4ec74abfb8b3898019/model_evaluation/rnn_v3/sku_step_result_head300.png -------------------------------------------------------------------------------- /model_evaluation/rnn_v3/user_step_result.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/guangningyu/JD-prediction/836dadf02ecfff08abe00a4ec74abfb8b3898019/model_evaluation/rnn_v3/user_step_result.png -------------------------------------------------------------------------------- /model_evaluation/rnn_v3/user_step_result_head300.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/guangningyu/JD-prediction/836dadf02ecfff08abe00a4ec74abfb8b3898019/model_evaluation/rnn_v3/user_step_result_head300.png -------------------------------------------------------------------------------- /model_evaluation/rule_v01/model_sku_id_score_v01.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # @Author: shu.wen 4 | # @Date: 2016-05-30 15:31:15 5 | # @Last Modified by: shu.wen 6 | # @Last Modified time: 2017-05-12 00:37:06 7 | # 打分数据 /root/data/rule/RULE_USER_SKU_ACTION_SCORE_TBL.csv 8 | __author__ = 'shu.wen' 9 | 10 | import os 11 | import sys 12 | import csv 13 | import time 14 | import glob 15 | import pandas 16 | import math 17 | import json 18 | import numpy as np 19 | # import pylab as pl 20 | from sklearn import svm 21 | from sklearn.metrics import precision_score 22 | from sklearn.metrics import recall_score 23 | from sklearn.metrics import f1_score 24 | from sklearn.metrics import classification_report 25 | from sklearn.externals import joblib 26 | from sklearn.cross_validation import train_test_split 27 | from sklearn.cross_validation import KFold 28 | 
from sklearn.metrics import confusion_matrix 29 | from sklearn.metrics import accuracy_score 30 | from sklearn.metrics import mean_squared_error 31 | from sklearn.metrics import roc_curve 32 | from sklearn.metrics import auc 33 | from sklearn.metrics import mean_squared_error 34 | from sklearn.feature_extraction import DictVectorizer 35 | from sklearn.feature_selection import f_regression 36 | import re 37 | import pandas 38 | from sklearn import tree 39 | from sklearn import ensemble 40 | import random 41 | from sklearn.externals.six import StringIO 42 | # import matplotlib.pyplot as plt 43 | from sklearn.linear_model import LogisticRegression 44 | from collections import Counter 45 | from sklearn.externals import joblib 46 | from sklearn.tree import DecisionTreeClassifier 47 | from math import isnan 48 | from sklearn.preprocessing import StandardScaler 49 | from sklearn.svm import SVC 50 | from sklearn.ensemble import GradientBoostingClassifier 51 | from sklearn.ensemble import RandomForestClassifier 52 | from sklearn.ensemble import RandomForestRegressor 53 | from sklearn.cross_validation import KFold 54 | from sklearn.ensemble import GradientBoostingClassifier 55 | import matplotlib.pyplot as plt 56 | #import feature_trt_v5 57 | 58 | 59 | if __name__ == '__main__': 60 | 61 | # step_01 读入数据集 62 | master_table_address = r'C:\shu.wen\opera\00_projects\99_competition\01_JData\data\rule\RULE_USER_SKU_ACTION_SCORE_TBL.csv' 63 | master_df = pandas.read_csv(master_table_address, sep='|', header=0) 64 | 65 | 66 | print 'row_cnt: %d'%(master_df.shape[0]) 67 | print 68 | 69 | # step_02 fill na 70 | 71 | master_df = master_df.fillna(-1) 72 | 73 | # step_02 load model 74 | lr = joblib.load(r'C:\shu.wen\opera\00_projects\99_competition\01_JData\data\rule\lr.model') 75 | # step_03 dummy coding 76 | 77 | feature_set = [ 78 | "USER_CATE_8_SKU_TYPE_5_CNT_15", 79 | #"USER_CATE_8_SKU_TYPE_5_RATE_15", 80 | "USER_CATE_8_SKU_SKU_CNT_3", 81 | "USER_CATE_8_SKU_HOUR_13_18_SECTION_3", 82 
| "USER_CATE_8_SKU_WEEK_05_RATE_7", 83 | "USER_CATE_8_SKU_HOUR_13_18_CNT_3", 84 | "USER_CATE_8_SKU_HOUR_13_18_SECTION_CNT_7", 85 | #"USER_CATE_8_SKU_WEEK_03_CNT_15", 86 | #"USER_CATE_8_SKU_TYPE_6_RATE_7", 87 | #"USER_CATE_8_SKU_SKU_TYPE_1_RATE_3", 88 | 89 | ] 90 | 91 | master_df['Y_pre'] = lr.predict_proba(master_df[feature_set].values)[:,1] 92 | master_df_cut = master_df[['USER_ID', 'SKU_ID','Y_pre']].sort(columns =['USER_ID', 'Y_pre', 'SKU_ID', ], ascending=[1,0,1]) 93 | 94 | master_df_cut.drop_duplicates(subset=['USER_ID'], keep = 'first', inplace = True ) 95 | master_df_cut[['USER_ID', 'SKU_ID']].to_csv(r'C:\shu.wen\opera\00_projects\99_competition\01_JData\data\rule\rule_submit.csv', sep=',', index=False ) 96 | print 'done' 97 | -------------------------------------------------------------------------------- /model_evaluation/rule_v01/model_sku_id_train_v01.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # @Author: shu.wen 4 | # @Date: 2016-05-30 15:31:15 5 | # @Last Modified by: shu.wen 6 | # @Last Modified time: 2017-05-12 00:37:06 7 | # 训练数据 /root/data/rule/RULE_USER_SKU_ACTION_MST_TBL_v2.csv 8 | __author__ = 'shu.wen' 9 | 10 | import os 11 | import sys 12 | import csv 13 | import time 14 | import glob 15 | import pandas 16 | import math 17 | import json 18 | import numpy as np 19 | # import pylab as pl 20 | from sklearn import svm 21 | from sklearn.metrics import precision_score 22 | from sklearn.metrics import recall_score 23 | from sklearn.metrics import f1_score 24 | from sklearn.metrics import classification_report 25 | from sklearn.externals import joblib 26 | from sklearn.cross_validation import train_test_split 27 | from sklearn.cross_validation import KFold 28 | from sklearn.metrics import confusion_matrix 29 | from sklearn.metrics import accuracy_score 30 | from sklearn.metrics import mean_squared_error 31 | from sklearn.metrics import roc_curve 32 | from sklearn.metrics 
import auc 33 | from sklearn.metrics import mean_squared_error 34 | from sklearn.feature_extraction import DictVectorizer 35 | from sklearn.feature_selection import f_regression 36 | import re 37 | import pandas 38 | from sklearn import tree 39 | from sklearn import ensemble 40 | import random 41 | from sklearn.externals.six import StringIO 42 | # import matplotlib.pyplot as plt 43 | from sklearn.linear_model import LogisticRegression 44 | from collections import Counter 45 | from sklearn.externals import joblib 46 | from sklearn.tree import DecisionTreeClassifier 47 | from math import isnan 48 | from sklearn.preprocessing import StandardScaler 49 | from sklearn.svm import SVC 50 | from sklearn.ensemble import GradientBoostingClassifier 51 | from sklearn.ensemble import RandomForestClassifier 52 | from sklearn.ensemble import RandomForestRegressor 53 | from sklearn.cross_validation import KFold 54 | from sklearn.ensemble import GradientBoostingClassifier 55 | import matplotlib.pyplot as plt 56 | #import feature_trt_v5 57 | 58 | 59 | if __name__ == '__main__': 60 | 61 | # step_01 读入数据集 62 | master_table_address = r'C:\shu.wen\opera\00_projects\99_competition\01_JData\data\rule\RULE_USER_SKU_ACTION_MST_TBL_v2.csv' 63 | master_df = pandas.read_csv(master_table_address, sep=',', header=0) 64 | 65 | 66 | print 'row_cnt: %d, trt_rt: %.4f'%(master_df.shape[0], master_df.Y.sum()*1.0/master_df.shape[0]) 67 | print 68 | 69 | # step_02 fill na 70 | 71 | master_df['SKU_A1'] = master_df['SKU_A1'].apply(int).apply(str) 72 | master_df['SKU_A2'] = master_df['SKU_A2'].apply(int).apply(str) 73 | master_df['SKU_A3'] = master_df['SKU_A3'].apply(int).apply(str) 74 | master_df = master_df.fillna(-1) 75 | 76 | 77 | # step_03 dummy coding 78 | 79 | feature_set = [ 80 | "USER_CATE_8_SKU_TYPE_5_CNT_15", 81 | #"USER_CATE_8_SKU_TYPE_5_RATE_15", 82 | "USER_CATE_8_SKU_SKU_CNT_3", 83 | "USER_CATE_8_SKU_HOUR_13_18_SECTION_3", 84 | "USER_CATE_8_SKU_WEEK_05_RATE_7", 85 | 
"USER_CATE_8_SKU_HOUR_13_18_CNT_3", 86 | "USER_CATE_8_SKU_HOUR_13_18_SECTION_CNT_7", 87 | #"USER_CATE_8_SKU_WEEK_03_CNT_15", 88 | #"USER_CATE_8_SKU_TYPE_6_RATE_7", 89 | #"USER_CATE_8_SKU_SKU_TYPE_1_RATE_3", 90 | 91 | ] 92 | 93 | vec = DictVectorizer(sparse = False) 94 | master_df_vec = vec.fit_transform(master_df[feature_set].to_dict(orient = 'record')) 95 | 96 | 97 | # vec.get_feature_names() 98 | 99 | # step_02 划分数据集 100 | 101 | X_train, X_test, y_train, y_test = train_test_split(master_df_vec, master_df['Y'], test_size=0.5, random_state=int(time.time())) 102 | 103 | # scaler = StandardScaler().fit(X_train_raw) 104 | # X_train = scaler.transform(X_train_raw) 105 | # X_test = scaler.transform(X_test_raw) 106 | 107 | 108 | print u'训练集行数:%d, target rate: %f'%(X_train.shape[0], sum(y_train)*1.0/X_train.shape[0]) 109 | print u'测试集行数:%d, target rate: %f'%(X_test.shape[0], sum(y_test)*1.0/X_test.shape[0]) 110 | 111 | # step_02 model 112 | 113 | #clf = GradientBoostingClassifier(loss='exponential', learning_rate = 0.001, n_estimators = 2, subsample = 1.0, criterion='friedman_mse', max_depth= 3, max_features="log2", presort='auto') 114 | clf = LogisticRegression(penalty = 'l1', tol = 0.001, max_iter = 1000, C = 1.0, class_weight = {1:30, 0:1},) 115 | 116 | clf = clf.fit(X_train, y_train) 117 | joblib.dump(clf, r'C:\shu.wen\opera\00_projects\99_competition\01_JData\data\rule\lr.model') 118 | 119 | 120 | #feature_importances = zip(vec.get_feature_names(), clf.feature_importances_.tolist()) 121 | #feature_importances = [ item for item in feature_importances] 122 | #feature_importances.sort(key=lambda x:x[1], reverse = True) 123 | #feature_importances_df = pandas.DataFrame(feature_importances) 124 | #feature_importances_df.columns = ['feature', 'importances'] 125 | #feature_importances_df.to_csv(r'C:\shu.wen\opera\00_projects\99_competition\01_JData\data\rule\feature_importances_part_2.csv', sep=',', index=False ) 126 | 127 | 128 | print 129 | y_train_pre = 
clf.predict_proba(X_train) 130 | fpr_train, tpr_train, thres_train = roc_curve( y_train, y_train_pre[:,1]) 131 | 132 | y_test_pre = clf.predict_proba(X_test) 133 | fpr_test, tpr_test, thres_test = roc_curve( y_test, y_test_pre[:,1]) 134 | 135 | print auc(fpr_train, tpr_train) 136 | print auc(fpr_test, tpr_test) 137 | print 138 | 139 | 140 | 141 | 142 | for item in range(10): 143 | thres = item*1.0/10 144 | P1_1_train = precision_score(y_train, y_train_pre[:,1]>=thres ) 145 | R1_1_train = recall_score(y_train, y_train_pre[:,1]>=thres) 146 | F11_train = 5*R1_1_train*P1_1_train/(2*R1_1_train+3*P1_1_train) 147 | P1_1_test = precision_score(y_test, y_test_pre[:,1]>=thres ) 148 | R1_1_test = recall_score(y_test, y_test_pre[:,1]>=thres) 149 | F11_test = 5*R1_1_test*P1_1_test/(2*R1_1_test+3*P1_1_test) 150 | 151 | print "Thres %s"%(thres) 152 | print 'Train F11: %.4f, P: %.4f, R: %.4f'%(F11_train, P1_1_train, R1_1_train) 153 | print 'Test F11: %.4f, P: %.4f, R: %.4f'%(F11_test, P1_1_test, R1_1_test) 154 | print 155 | 156 | -------------------------------------------------------------------------------- /model_evaluation/rule_v01/rule_user_id_score_v01.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # @Author: shu.wen 4 | # @Date: 2016-05-30 15:31:15 5 | # @Last Modified by: shu.wen 6 | # @Last Modified time: 2017-05-12 00:37:06 7 | # 打分数据 /root/data/rule/RULE_USER_CATE_ACTION_SCORE_TBL.csv 8 | __author__ = 'shu.wen' 9 | 10 | import os 11 | import sys 12 | import csv 13 | import time 14 | import glob 15 | import pandas 16 | import math 17 | import json 18 | import numpy as np 19 | # import pylab as pl 20 | from sklearn import svm 21 | from sklearn.metrics import precision_score 22 | from sklearn.metrics import recall_score 23 | from sklearn.metrics import f1_score 24 | from sklearn.metrics import classification_report 25 | from sklearn.externals import joblib 26 | from sklearn.cross_validation import 
train_test_split 27 | from sklearn.cross_validation import KFold 28 | from sklearn.metrics import confusion_matrix 29 | from sklearn.metrics import accuracy_score 30 | from sklearn.metrics import mean_squared_error 31 | from sklearn.metrics import roc_curve 32 | from sklearn.metrics import auc 33 | from sklearn.metrics import mean_squared_error 34 | from sklearn.feature_extraction import DictVectorizer 35 | from sklearn.feature_selection import f_regression 36 | import re 37 | import pandas 38 | from sklearn import tree 39 | from sklearn import ensemble 40 | import random 41 | from sklearn.externals.six import StringIO 42 | # import matplotlib.pyplot as plt 43 | from sklearn.linear_model import LogisticRegression 44 | from collections import Counter 45 | from sklearn.externals import joblib 46 | from sklearn.tree import DecisionTreeClassifier 47 | from math import isnan 48 | from sklearn.preprocessing import StandardScaler 49 | from sklearn.svm import SVC 50 | from sklearn.ensemble import GradientBoostingClassifier 51 | from sklearn.ensemble import RandomForestClassifier 52 | from sklearn.ensemble import RandomForestRegressor 53 | from sklearn.cross_validation import KFold 54 | from sklearn.ensemble import GradientBoostingClassifier 55 | import matplotlib.pyplot as plt 56 | #import feature_trt_v5 57 | 58 | 59 | if __name__ == '__main__': 60 | 61 | # step_01 读入数据集 62 | master_table_address = r'C:\shu.wen\opera\00_projects\99_competition\01_JData\data\rule\RULE_USER_CATE_ACTION_SCORE_TBL.csv' 63 | master_df = pandas.read_csv(master_table_address, sep='|', header=0) 64 | 65 | 66 | print 'row_cnt: %d'%(master_df.shape[0]) 67 | print 68 | 69 | master_df_filter = master_df[ 70 | ((master_df.USER_CATE_8_TYPE_2_CNT_06.isin([0,1,2,3,4]) & (master_df.USER_CATE_8_TYPE_5_CNT_06 == 2)) 71 | | (master_df.USER_CATE_8_TYPE_2_CNT_06.isin([3,4,5]) & (master_df.USER_CATE_8_TYPE_6_CNT_06 > 15)) 72 | | (master_df.USER_CATE_8_TYPE_5_CNT_05 > 1) 73 | | 
(master_df.USER_CATE_8_TYPE_2_CNT_05 > 1) 74 | #| (master_df.USER_CATE_8_TYPE_3_CNT_05 > 1) 75 | | (master_df.USER_REG_10_FLAG == 1) 76 | ) 77 | & (master_df.USER_SKU_DIS_CNT <= 3) 78 | & (master_df.USER_LV_CD.isin([3,4,5])) 79 | ] 80 | 81 | print 'row_cnt: %d'%(master_df_filter.shape[0]) 82 | print 83 | 84 | user_id_candidate = set(master_df_filter.USER_ID) 85 | master_df['y_pre'] = master_df.USER_ID.apply(lambda x: 1 if x in user_id_candidate else 0) 86 | 87 | 88 | # step_01 读入数据集 89 | master_df[master_df['y_pre'] == 1][['USER_ID', 'USER_SKU_DIS_CNT']].to_csv(r'C:\shu.wen\opera\00_projects\99_competition\01_JData\data\rule\USER_ID_SCORE_CANDIDATE.csv', sep=',', index=False ) 90 | -------------------------------------------------------------------------------- /model_evaluation/rule_v01/rule_user_id_train_v01.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # @Author: shu.wen 4 | # @Date: 2016-05-30 15:31:15 5 | # @Last Modified by: shu.wen 6 | # @Last Modified time: 2017-05-12 00:37:06 7 | # 训练数据 /root/data/rule/RULE_USER_CATE_ACTION_MST_TBL_v4.csv 8 | __author__ = 'shu.wen' 9 | 10 | import os 11 | import sys 12 | import csv 13 | import time 14 | import glob 15 | import pandas 16 | import math 17 | import json 18 | import numpy as np 19 | # import pylab as pl 20 | from sklearn import svm 21 | from sklearn.metrics import precision_score 22 | from sklearn.metrics import recall_score 23 | from sklearn.metrics import f1_score 24 | from sklearn.metrics import classification_report 25 | from sklearn.externals import joblib 26 | from sklearn.cross_validation import train_test_split 27 | from sklearn.cross_validation import KFold 28 | from sklearn.metrics import confusion_matrix 29 | from sklearn.metrics import accuracy_score 30 | from sklearn.metrics import mean_squared_error 31 | from sklearn.metrics import roc_curve 32 | from sklearn.metrics import auc 33 | from sklearn.metrics import 
mean_squared_error 34 | from sklearn.feature_extraction import DictVectorizer 35 | from sklearn.feature_selection import f_regression 36 | import re 37 | import pandas 38 | from sklearn import tree 39 | from sklearn import ensemble 40 | import random 41 | from sklearn.externals.six import StringIO 42 | # import matplotlib.pyplot as plt 43 | from sklearn.linear_model import LogisticRegression 44 | from collections import Counter 45 | from sklearn.externals import joblib 46 | from sklearn.tree import DecisionTreeClassifier 47 | from math import isnan 48 | from sklearn.preprocessing import StandardScaler 49 | from sklearn.svm import SVC 50 | from sklearn.ensemble import GradientBoostingClassifier 51 | from sklearn.ensemble import RandomForestClassifier 52 | from sklearn.ensemble import RandomForestRegressor 53 | from sklearn.cross_validation import KFold 54 | from sklearn.ensemble import GradientBoostingClassifier 55 | import matplotlib.pyplot as plt 56 | #import feature_trt_v5 57 | 58 | 59 | if __name__ == '__main__': 60 | 61 | # step_01 读入数据集 62 | master_table_address = r'C:\shu.wen\opera\00_projects\99_competition\01_JData\data\rule\RULE_USER_CATE_ACTION_MST_TBL_v4.csv' 63 | master_df = pandas.read_csv(master_table_address, sep='|', header=0) 64 | 65 | 66 | print 'row_cnt: %d, trt_rt: %.4f'%(master_df.shape[0], master_df.Y.sum()*1.0/master_df.shape[0]) 67 | print 68 | 69 | master_df_filter = master_df[ 70 | ((master_df.USER_CATE_8_TYPE_2_CNT_06.isin([0,1,2,3,4]) & (master_df.USER_CATE_8_TYPE_5_CNT_06 == 2)) 71 | | (master_df.USER_CATE_8_TYPE_2_CNT_06.isin([3,4,5]) & (master_df.USER_CATE_8_TYPE_6_CNT_06 > 15)) 72 | | (master_df.USER_CATE_8_TYPE_5_CNT_05 > 1) 73 | | (master_df.USER_CATE_8_TYPE_2_CNT_05 > 1) 74 | #| (master_df.USER_CATE_8_TYPE_3_CNT_05 > 1) 75 | | (master_df.USER_REG_10_FLAG == 1) 76 | ) 77 | & (master_df.USER_SKU_DIS_CNT <= 3) 78 | & (master_df.USER_LV_CD.isin([3,4,5])) 79 | ] 80 | 81 | print 'row_cnt: %d, trt_rt: %.4f'%(master_df_filter.shape[0], 
master_df_filter.Y.sum()*1.0/master_df_filter.shape[0]) 82 | 83 | 84 | user_id_candidate = set(master_df_filter.USER_ID) 85 | master_df['y_pre'] = master_df.USER_ID.apply(lambda x: 1 if x in user_id_candidate else 0) 86 | 87 | P1_1_train = precision_score(master_df['Y'], master_df['y_pre']) 88 | R1_1_train = recall_score(master_df['Y'], master_df['y_pre']) 89 | F11_train = 6*R1_1_train*P1_1_train/(5*R1_1_train+P1_1_train) 90 | print 91 | print 'F11: %.4f, P: %.4f, R: %.4f'%(F11_train, P1_1_train, R1_1_train) 92 | 93 | 94 | # step_01 读入数据集 95 | master_table_address = r'C:\shu.wen\opera\00_projects\99_competition\01_JData\data\rule\RULE_USER_CATE_ACTION_MST_TBL_v4.csv' 96 | master_df[master_df['y_pre'] == 1][['USER_ID', 'USER_SKU_DIS_CNT']].to_csv(r'C:\shu.wen\opera\00_projects\99_competition\01_JData\data\rule\USER_ID_CANDIDATE.csv', sep=',', index=False ) 97 | -------------------------------------------------------------------------------- /model_evaluation/rule_v01/submit_sw_20170513_rule_D2.csv: -------------------------------------------------------------------------------- 1 | "user_id ",sku_id 2 | 201409,103652 3 | 203624,50688 4 | 205855,95850 5 | 206143,154636 6 | 206484,57018 7 | 206544,18412 8 | 206606,56792 9 | 206967,80462 10 | 207048,111999 11 | 207073,57161 12 | 207089,146704 13 | 207097,153645 14 | 207147,58475 15 | 207155,18412 16 | 207157,46911 17 | 207237,124997 18 | 207240,79520 19 | 207274,13785 20 | 207277,5505 21 | 207288,12564 22 | 207289,21147 23 | 207334,154636 24 | 207357,154636 25 | 207363,128988 26 | 207390,18412 27 | 207391,93295 28 | 207433,126535 29 | 207450,18412 30 | 207493,68767 31 | 207494,152478 32 | 207522,12564 33 | 207526,63006 34 | 207601,81708 35 | 207982,14163 36 | 208000,79520 37 | 210412,15259 38 | 210801,109083 39 | 211454,144267 40 | 211928,69355 41 | 212752,5825 42 | 212996,31662 43 | 213743,6486 44 | 213968,103234 45 | 214839,31662 46 | 215525,48895 47 | 217763,59175 48 | 218204,63006 49 | 219543,61531 50 | 
219973,24371 51 | 220282,154732 52 | 220724,69209 53 | 221414,164252 54 | 221992,52343 55 | 222683,114640 56 | 223769,84409 57 | 224144,119979 58 | 224533,37995 59 | 225515,79636 60 | 225886,145974 61 | 225890,92909 62 | 226388,43062 63 | 227236,5505 64 | 227849,31493 65 | 227959,47107 66 | 229250,154636 67 | 231687,6533 68 | 234139,133029 69 | 234379,164211 70 | 236261,154732 71 | 236950,62351 72 | 237453,5505 73 | 237523,21147 74 | 238419,133477 75 | 238606,14163 76 | 238689,123016 77 | 240871,60230 78 | 241653,31662 79 | 242259,31662 80 | 244001,162658 81 | 245751,32465 82 | 246639,63006 83 | 248451,74517 84 | 248572,36307 85 | 248604,154636 86 | 248713,38955 87 | 249048,13334 88 | 249668,5505 89 | 249736,61226 90 | 250674,128747 91 | 251443,131300 92 | 251511,14433 93 | 254179,73469 94 | 254797,149641 95 | 255053,161641 96 | 255317,83144 97 | 255365,135409 98 | 256534,116489 99 | 256571,166707 100 | 256833,47895 101 | 258422,126146 102 | 259260,63006 103 | 260185,171182 104 | 260329,117452 105 | 262167,69209 106 | 263525,111225 107 | 263572,31662 108 | 264402,75877 109 | 264647,18412 110 | 266966,73842 111 | 267010,90621 112 | 268485,65520 113 | 269595,20308 114 | 269984,138151 115 | 271419,59820 116 | 274051,154636 117 | 274276,15106 118 | 274370,5825 119 | 274616,63006 120 | 275530,62872 121 | 275634,154636 122 | 277327,32465 123 | 278405,52343 124 | 279866,32465 125 | 280250,65520 126 | 280664,160476 127 | 281411,126146 128 | 283182,60230 129 | 283335,31662 130 | 284471,107090 131 | 285478,109728 132 | 285597,149641 133 | 286766,164258 134 | 287186,21147 135 | 287249,65520 136 | 288172,126146 137 | 289428,18412 138 | 290923,63006 139 | 291403,6533 140 | 295940,75877 141 | 296717,76959 142 | 298241,88295 143 | 298461,65520 144 | 299721,39830 145 | 299781,31662 146 | 299869,128747 147 | 302029,63006 148 | 302115,68767 149 | 302912,36307 150 | 304114,142477 151 | 304731,154636 152 | 
-------------------------------------------------------------------------------- /prof/prof_action.txt: -------------------------------------------------------------------------------- 1 | ===== Check action data ===== 2 | 3 | > Check sample records... 4 | user_id sku_id time model_id type cate brand 5 | 0 280567 167208 2016-02-29 23:59:01 0.0 6 4 519 6 | 1 270248 35533 2016-02-29 23:59:02 111.0 6 4 306 7 | 2 203360 78694 2016-02-29 23:59:02 NaN 1 8 244 8 | 3 252369 90402 2016-02-29 23:59:03 0.0 6 7 38 9 | 4 279590 154208 2016-02-29 23:59:03 0.0 6 5 570 10 | 5 203360 78694 2016-02-29 23:59:03 0.0 6 8 244 11 | 6 279590 154208 2016-02-29 23:59:03 0.0 6 5 570 12 | 7 279590 154208 2016-02-29 23:59:03 NaN 1 5 570 13 | 8 252369 90402 2016-02-29 23:59:04 13.0 6 7 38 14 | 9 257109 95850 2016-02-29 23:59:04 0.0 6 8 800 15 | 16 | > Check column data type... 17 | user_id int64 18 | sku_id int64 19 | time datetime64[ns] 20 | model_id float64 21 | type int64 22 | cate int64 23 | brand int64 24 | dtype: object 25 | 26 | > Count records... 27 | 50601736 28 | 29 | > Count unique user_id... 30 | 105180 31 | 32 | > Count unique sku_id... 33 | 28710 34 | 35 | > Count records by model_id... 36 | NaN 20655896 37 | 0.0 12112574 38 | 216.0 4842945 39 | 217.0 4335430 40 | 27.0 1468903 41 | 26.0 1388148 42 | 218.0 1203986 43 | 211.0 651555 44 | 24.0 568860 45 | 29.0 486169 46 | 21.0 349610 47 | 111.0 228742 48 | 17.0 220852 49 | 210.0 201725 50 | 219.0 182587 51 | 222.0 125944 52 | 23.0 123880 53 | 19.0 116766 54 | 220.0 115865 55 | 31.0 112592 56 | 13.0 108323 57 | 14.0 95793 58 | 25.0 91814 59 | 119.0 82586 60 | 28.0 79186 61 | 11.0 74777 62 | 221.0 73715 63 | 16.0 67316 64 | 223.0 52626 65 | 115.0 39406 66 | ... 
67 | 116.0 1948 68 | 319.0 1928 69 | 323.0 1917 70 | 321.0 1872 71 | 324.0 1740 72 | 326.0 1446 73 | 325.0 1427 74 | 327.0 1397 75 | 328.0 1296 76 | 329.0 1282 77 | 331.0 1200 78 | 330.0 1030 79 | 212.0 1001 80 | 333.0 958 81 | 340.0 941 82 | 339.0 830 83 | 341.0 682 84 | 336.0 672 85 | 347.0 624 86 | 348.0 556 87 | 337.0 553 88 | 335.0 550 89 | 346.0 538 90 | 345.0 523 91 | 332.0 518 92 | 338.0 496 93 | 334.0 487 94 | 344.0 468 95 | 343.0 458 96 | 342.0 403 97 | Name: model_id, dtype: int64 98 | 99 | > Count records by type... 100 | 6 30630744 101 | 1 18981373 102 | 2 575418 103 | 3 256053 104 | 5 109896 105 | 4 48252 106 | Name: type, dtype: int64 107 | 108 | > Count records by category... 109 | 8 18128055 110 | 4 11088350 111 | 6 6554984 112 | 5 5731403 113 | 7 4365031 114 | 9 4005882 115 | 10 627359 116 | 11 100672 117 | Name: cate, dtype: int64 118 | 119 | > Count records by brand... 120 | 214 5507485 121 | 489 5421227 122 | 306 5225207 123 | 800 2680474 124 | 545 2674812 125 | 885 2069165 126 | 78 1696804 127 | 519 1581827 128 | 693 1212508 129 | 658 1151313 130 | 479 1025021 131 | 36 994366 132 | 200 980285 133 | 640 951659 134 | 174 948812 135 | 403 943311 136 | 159 815889 137 | 124 814166 138 | 30 797271 139 | 752 777523 140 | 375 737387 141 | 630 691047 142 | 587 656539 143 | 842 556712 144 | 453 539459 145 | 269 477171 146 | 741 434029 147 | 235 426413 148 | 427 405237 149 | 225 388313 150 | ... 151 | 64 15 152 | 436 14 153 | 718 13 154 | 818 12 155 | 461 12 156 | 779 11 157 | 526 10 158 | 84 10 159 | 294 8 160 | 685 7 161 | 323 7 162 | 449 6 163 | 787 6 164 | 120 6 165 | 584 5 166 | 774 5 167 | 34 5 168 | 272 4 169 | 740 4 170 | 574 3 171 | 339 2 172 | 329 2 173 | 882 2 174 | 775 2 175 | 610 2 176 | 690 2 177 | 854 2 178 | 287 2 179 | 889 1 180 | 759 1 181 | Name: brand, dtype: int64 182 | 183 | > Count records by time... 
184 | 2016-01-31 23:59:02 1 185 | 2016-01-31 23:59:03 1 186 | 2016-01-31 23:59:07 1 187 | 2016-01-31 23:59:08 2 188 | 2016-01-31 23:59:11 3 189 | 2016-01-31 23:59:14 2 190 | 2016-01-31 23:59:16 1 191 | 2016-01-31 23:59:22 1 192 | 2016-01-31 23:59:24 1 193 | 2016-01-31 23:59:25 2 194 | 2016-01-31 23:59:40 1 195 | 2016-01-31 23:59:48 1 196 | 2016-01-31 23:59:50 1 197 | 2016-01-31 23:59:52 1 198 | 2016-01-31 23:59:53 1 199 | 2016-01-31 23:59:54 2 200 | 2016-01-31 23:59:59 7 201 | 2016-02-01 00:00:00 7 202 | 2016-02-01 00:00:01 4 203 | 2016-02-01 00:00:03 4 204 | 2016-02-01 00:00:04 1 205 | 2016-02-01 00:00:05 5 206 | 2016-02-01 00:00:06 12 207 | 2016-02-01 00:00:08 3 208 | 2016-02-01 00:00:10 1 209 | 2016-02-01 00:00:11 3 210 | 2016-02-01 00:00:12 1 211 | 2016-02-01 00:00:13 1 212 | 2016-02-01 00:00:14 6 213 | 2016-02-01 00:00:15 16 214 | .. 215 | 2016-04-15 23:59:29 5 216 | 2016-04-15 23:59:30 4 217 | 2016-04-15 23:59:31 14 218 | 2016-04-15 23:59:32 32 219 | 2016-04-15 23:59:33 15 220 | 2016-04-15 23:59:34 24 221 | 2016-04-15 23:59:35 8 222 | 2016-04-15 23:59:36 3 223 | 2016-04-15 23:59:37 29 224 | 2016-04-15 23:59:38 17 225 | 2016-04-15 23:59:39 5 226 | 2016-04-15 23:59:40 22 227 | 2016-04-15 23:59:41 11 228 | 2016-04-15 23:59:43 4 229 | 2016-04-15 23:59:44 12 230 | 2016-04-15 23:59:45 2 231 | 2016-04-15 23:59:46 2 232 | 2016-04-15 23:59:47 21 233 | 2016-04-15 23:59:48 14 234 | 2016-04-15 23:59:49 2 235 | 2016-04-15 23:59:50 14 236 | 2016-04-15 23:59:51 7 237 | 2016-04-15 23:59:52 13 238 | 2016-04-15 23:59:53 20 239 | 2016-04-15 23:59:54 12 240 | 2016-04-15 23:59:55 39 241 | 2016-04-15 23:59:56 7 242 | 2016-04-15 23:59:57 11 243 | 2016-04-15 23:59:58 2 244 | 2016-04-15 23:59:59 1 245 | Name: time, dtype: int64 246 | 247 | > Count unique sku_id (1.used to be ordered; 2.in cate8)... 248 | 859 249 | 250 | > Count total orders (1.used to be ordered; 2.in cate8)... 251 | 13281 252 | 253 | > Count total orders by sku_id(1.used to be ordered; 2.in cate8)... 
254 | 154636 691 255 | 63006 420 256 | 31662 379 257 | 12564 359 258 | 57018 307 259 | 52343 245 260 | 32465 229 261 | 128988 192 262 | 18412 191 263 | 79520 172 264 | 131300 168 265 | 69209 166 266 | 21147 164 267 | 166707 160 268 | 126146 156 269 | 152478 154 270 | 44854 144 271 | 151327 134 272 | 149641 129 273 | 5825 127 274 | 75877 124 275 | 24371 124 276 | 40336 121 277 | 84389 120 278 | 36307 117 279 | 65520 111 280 | 5505 109 281 | 89802 107 282 | 79636 106 283 | 166354 105 284 | ... 285 | 50684 1 286 | 142809 1 287 | 166955 1 288 | 138289 1 289 | 21942 1 290 | 154983 1 291 | 87310 1 292 | 79106 1 293 | 107318 1 294 | 36685 1 295 | 139681 1 296 | 141200 1 297 | 168303 1 298 | 67924 1 299 | 285 1 300 | 67796 1 301 | 164354 1 302 | 117267 1 303 | 109087 1 304 | 162341 1 305 | 78412 1 306 | 47693 1 307 | 49269 1 308 | 43112 1 309 | 12371 1 310 | 94283 1 311 | 32715 1 312 | 63434 1 313 | 31493 1 314 | 75849 1 315 | Name: sku_id, dtype: int64 316 | -------------------------------------------------------------------------------- /prof/prof_comment.txt: -------------------------------------------------------------------------------- 1 | ===== Check comment data ===== 2 | 3 | > Check sample records... 4 | dt sku_id comment_num has_bad_comment bad_comment_rate 5 | 0 2016-02-01 1000 3 1 0.0417 6 | 1 2016-02-01 10000 2 0 0.0000 7 | 2 2016-02-01 100011 4 1 0.0376 8 | 3 2016-02-01 100018 3 0 0.0000 9 | 4 2016-02-01 100020 3 0 0.0000 10 | 5 2016-02-01 100021 1 0 0.0000 11 | 6 2016-02-01 100028 1 0 0.0000 12 | 7 2016-02-01 100031 3 0 0.0000 13 | 8 2016-02-01 100033 2 0 0.0000 14 | 9 2016-02-01 100035 1 1 1.0000 15 | 16 | > Check column data type... 17 | dt datetime64[ns] 18 | sku_id int64 19 | comment_num int64 20 | has_bad_comment int64 21 | bad_comment_rate float64 22 | dtype: object 23 | 24 | > Count records... 25 | 558552 26 | 27 | > Count comments by dt... 
28 | 2016-02-01 46546 29 | 2016-02-08 46546 30 | 2016-02-15 46546 31 | 2016-02-22 46546 32 | 2016-02-29 46546 33 | 2016-03-07 46546 34 | 2016-03-14 46546 35 | 2016-03-21 46546 36 | 2016-03-28 46546 37 | 2016-04-04 46546 38 | 2016-04-11 46546 39 | 2016-04-15 46546 40 | Name: dt, dtype: int64 41 | 42 | > Count unique sku_id... 43 | 46546 44 | 45 | > Count records by comment_num... 46 | 2 168698 47 | 4 164789 48 | 3 119642 49 | 1 85430 50 | 0 19993 51 | Name: comment_num, dtype: int64 52 | 53 | > Count records by has_bad_comment... 54 | 0 292978 55 | 1 265574 56 | Name: has_bad_comment, dtype: int64 57 | 58 | > Count records by bad_comment_rate... 59 | 0.0000 292978 60 | 0.0006 12 61 | 0.0008 12 62 | 0.0010 12 63 | 0.0012 27 64 | 0.0013 46 65 | 0.0014 13 66 | 0.0015 28 67 | 0.0016 33 68 | 0.0018 3 69 | 0.0019 14 70 | 0.0020 66 71 | 0.0021 11 72 | 0.0022 24 73 | 0.0023 37 74 | 0.0024 21 75 | 0.0025 27 76 | 0.0026 43 77 | 0.0027 51 78 | 0.0028 59 79 | 0.0029 13 80 | 0.0030 54 81 | 0.0031 36 82 | 0.0032 73 83 | 0.0033 28 84 | 0.0034 32 85 | 0.0035 39 86 | 0.0036 30 87 | 0.0037 53 88 | 0.0038 61 89 | ... 90 | 0.3143 1 91 | 0.3158 48 92 | 0.3200 1 93 | 0.3214 1 94 | 0.3333 5081 95 | 0.3500 12 96 | 0.3529 12 97 | 0.3571 24 98 | 0.3636 31 99 | 0.3750 123 100 | 0.3800 11 101 | 0.3878 1 102 | 0.3889 12 103 | 0.4000 530 104 | 0.4167 12 105 | 0.4286 83 106 | 0.4444 22 107 | 0.4545 2 108 | 0.5000 5861 109 | 0.5385 1 110 | 0.5714 29 111 | 0.6000 127 112 | 0.6250 24 113 | 0.6429 12 114 | 0.6667 457 115 | 0.7500 69 116 | 0.7778 4 117 | 0.8000 3 118 | 0.8750 1 119 | 1.0000 6647 120 | Name: bad_comment_rate, dtype: int64 121 | -------------------------------------------------------------------------------- /prof/prof_product.txt: -------------------------------------------------------------------------------- 1 | ===== Check product data ===== 2 | 3 | > Check sample records... 
4 | sku_id a1 a2 a3 cate brand 5 | 0 10 3 1 1 8 489 6 | 1 100002 3 2 2 8 489 7 | 2 100003 1 -1 -1 8 30 8 | 3 100006 1 2 1 8 545 9 | 4 10001 -1 1 2 8 244 10 | 5 100016 3 -1 2 8 214 11 | 6 100029 3 2 2 8 214 12 | 7 10003 3 1 2 8 214 13 | 8 100045 2 2 2 8 124 14 | 9 100057 3 1 2 8 306 15 | 16 | > Check column data type... 17 | sku_id int64 18 | a1 int64 19 | a2 int64 20 | a3 int64 21 | cate int64 22 | brand int64 23 | dtype: object 24 | 25 | > Count records... 26 | 24187 27 | 28 | > Count unique sku_id... 29 | 24187 30 | 31 | > Count products by a1... 32 | 3 14144 33 | 1 4760 34 | 2 3582 35 | -1 1701 36 | Name: a1, dtype: int64 37 | 38 | > Count products by a2... 39 | 1 13513 40 | 2 6624 41 | -1 4050 42 | Name: a2, dtype: int64 43 | 44 | > Count products by a3... 45 | 2 11978 46 | 1 8394 47 | -1 3815 48 | Name: a3, dtype: int64 49 | 50 | > Count products by category... 51 | 8 24187 52 | Name: cate, dtype: int64 53 | 54 | > Count products by brand... 55 | 489 6637 56 | 214 6444 57 | 623 1101 58 | 812 1061 59 | 800 1015 60 | 545 945 61 | 124 932 62 | 306 795 63 | 30 659 64 | 885 601 65 | 403 517 66 | 693 372 67 | 658 368 68 | 766 254 69 | 571 235 70 | 635 199 71 | 562 134 72 | 244 114 73 | 655 102 74 | 622 97 75 | 801 94 76 | 321 82 77 | 790 81 78 | 677 77 79 | 200 72 80 | 427 67 81 | 674 53 82 | 174 50 83 | 857 49 84 | 596 48 85 | ... 86 | 574 4 87 | 665 4 88 | 116 4 89 | 291 3 90 | 197 3 91 | 383 3 92 | 180 2 93 | 739 2 94 | 227 2 95 | 907 2 96 | 759 2 97 | 453 2 98 | 324 2 99 | 336 1 100 | 285 1 101 | 479 1 102 | 49 1 103 | 752 1 104 | 13 1 105 | 772 1 106 | 331 1 107 | 299 1 108 | 354 1 109 | 922 1 110 | 554 1 111 | 905 1 112 | 871 1 113 | 855 1 114 | 499 1 115 | 438 1 116 | Name: brand, dtype: int64 117 | -------------------------------------------------------------------------------- /prof/prof_user.txt: -------------------------------------------------------------------------------- 1 | ===== Check user data ===== 2 | 3 | > Check sample records... 
4 | user_id age sex user_lv_cd user_reg_tm 5 | 0 200001 56岁以上 2.0 5 2016-01-26 6 | 1 200002 -1 0.0 1 2016-01-26 7 | 2 200003 36-45岁 1.0 4 2016-01-26 8 | 3 200004 -1 2.0 1 2016-01-26 9 | 4 200005 16-25岁 0.0 4 2016-01-26 10 | 5 200006 36-45岁 2.0 2 2013-04-10 11 | 6 200007 36-45岁 2.0 3 2016-01-26 12 | 7 200008 -1 2.0 3 2016-01-26 13 | 8 200009 36-45岁 2.0 2 2016-01-26 14 | 9 200010 36-45岁 2.0 3 2016-01-26 15 | 16 | > Check column data type... 17 | user_id int64 18 | age object 19 | sex float64 20 | user_lv_cd int64 21 | user_reg_tm datetime64[ns] 22 | dtype: object 23 | 24 | > Count records... 25 | 105321 26 | 27 | > Count unique user_id... 28 | 105321 29 | 30 | > Count users by age... 31 | 26-35岁 46570 32 | 36-45岁 30336 33 | -1 14412 34 | 16-25岁 8797 35 | 46-55岁 3325 36 | 56岁以上 1871 37 | 15岁以下 7 38 | NaN 3 39 | Name: age, dtype: int64 40 | 41 | > Count users by sex... 42 | 2.0 54735 43 | 0.0 42846 44 | 1.0 7737 45 | NaN 3 46 | Name: sex, dtype: int64 47 | 48 | > Count users by level... 49 | 5 36088 50 | 4 32343 51 | 3 24563 52 | 2 9661 53 | 1 2666 54 | Name: user_lv_cd, dtype: int64 55 | 56 | > Count users by reg date... 57 | NaT 3 58 | 2003-06-16 1 59 | 2003-09-05 1 60 | 2003-10-15 1 61 | 2003-11-11 1 62 | 2003-12-02 1 63 | 2003-12-08 1 64 | 2003-12-21 1 65 | 2004-01-09 1 66 | 2004-01-10 1 67 | 2004-01-13 1 68 | 2004-03-15 1 69 | 2004-04-16 1 70 | 2004-04-27 1 71 | 2004-04-29 1 72 | 2004-05-07 1 73 | 2004-06-16 1 74 | 2004-06-18 1 75 | 2004-07-13 1 76 | 2004-07-15 1 77 | 2004-07-24 1 78 | 2004-08-28 1 79 | 2004-09-07 1 80 | 2004-09-08 1 81 | 2004-10-26 1 82 | 2004-10-27 1 83 | 2004-10-28 1 84 | 2004-10-29 1 85 | 2004-10-30 2 86 | 2004-10-31 1 87 | .. 
88 | 2016-03-25 36 89 | 2016-03-26 36 90 | 2016-03-27 46 91 | 2016-03-28 46 92 | 2016-03-29 35 93 | 2016-03-30 43 94 | 2016-03-31 42 95 | 2016-04-01 28 96 | 2016-04-02 38 97 | 2016-04-03 39 98 | 2016-04-04 38 99 | 2016-04-05 33 100 | 2016-04-06 30 101 | 2016-04-07 35 102 | 2016-04-08 36 103 | 2016-04-09 20 104 | 2016-04-10 18 105 | 2016-04-11 15 106 | 2016-04-12 17 107 | 2016-04-13 16 108 | 2016-04-14 9 109 | 2016-04-15 18 110 | 2016-04-29 1 111 | 2016-05-11 1 112 | 2016-05-24 2 113 | 2016-06-06 1 114 | 2016-07-05 1 115 | 2016-09-11 1 116 | 2016-10-05 1 117 | 2016-11-25 1 118 | Name: user_reg_tm, dtype: int64 119 | -------------------------------------------------------------------------------- /rnn.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | #from __future__ import print_function 4 | 5 | import os 6 | import tensorflow as tf 7 | from tensorflow.contrib import rnn 8 | import numpy as np 9 | import pandas as pd 10 | import datetime 11 | import pickle 12 | import random 13 | import math 14 | from sklearn.metrics import roc_curve, auc 15 | import matplotlib.pyplot as plt 16 | 17 | # ---------- path definition ---------- # 18 | MAIN_DIR = os.path.dirname(os.path.abspath(__file__)) 19 | TEMP_DIR = os.path.join(MAIN_DIR, 'temp') 20 | 21 | # ---------- file definition ---------- # 22 | MASTER_DATA = os.path.join(TEMP_DIR, 'master.csv') 23 | 24 | MASTER_DATA_X = os.path.join(TEMP_DIR, 'master_x.csv') 25 | MASTER_DATA_Y = os.path.join(TEMP_DIR, 'master_y.csv') 26 | SKUS = os.path.join(TEMP_DIR, 'sku_list.csv') 27 | USERS = os.path.join(TEMP_DIR, 'user_list.csv') 28 | BRANDS = os.path.join(TEMP_DIR, 'brand_list.csv') 29 | MODEL_IDS = os.path.join(TEMP_DIR, 'model_id_list.csv') 30 | 31 | TRAIN_SEQUENCE = os.path.join(TEMP_DIR, 'train_sequence.pkl') 32 | SCORE_SEQUENCE = os.path.join(TEMP_DIR, 'score_sequence.pkl') 33 | 34 | TRAIN_LABELS = os.path.join(TEMP_DIR, 'train_labels.pkl') 35 | SCORE_LABELS = 
os.path.join(TEMP_DIR, 'score_labels.pkl') 36 | 37 | TRAINSET = os.path.join(TEMP_DIR, 'trainset.pkl') 38 | TESTSET = os.path.join(TEMP_DIR, 'testset.pkl') 39 | SCORESET = os.path.join(TEMP_DIR, 'scoreset.pkl') 40 | 41 | TRAINSET_USER_RESULT = os.path.join(TEMP_DIR, 'trainset_user_result.pkl') 42 | TESTSET_USER_RESULT = os.path.join(TEMP_DIR, 'testset_user_result.pkl') 43 | SCORESET_USER_RESULT = os.path.join(TEMP_DIR, 'scoreset_user_result.pkl') 44 | 45 | TRAINSET_SKU_RESULT = os.path.join(TEMP_DIR, 'trainset_sku_result.pkl') 46 | TESTSET_SKU_RESULT = os.path.join(TEMP_DIR, 'testset_sku_result.pkl') 47 | SCORESET_SKU_RESULT = os.path.join(TEMP_DIR, 'scoreset_sku_result.pkl') 48 | 49 | TRAINSET_RESULT = os.path.join(TEMP_DIR, 'trainset_result.pkl') 50 | TESTSET_RESULT = os.path.join(TEMP_DIR, 'testset_result.pkl') 51 | SCORESET_RESULT = os.path.join(TEMP_DIR, 'scoreset_result.pkl') 52 | 53 | SCORE_FILE = os.path.join(TEMP_DIR, 'score.csv') 54 | OUTPUT_FILE = os.path.join(TEMP_DIR, 'upload.csv') 55 | 56 | USER_STEP_RESULT = os.path.join(TEMP_DIR, 'user_step_result.pkl') 57 | SKU_STEP_RESULT = os.path.join(TEMP_DIR, 'sku_step_result.pkl') 58 | 59 | PROF_ACTION_NUM = os.path.join(TEMP_DIR, 'prof_action_num.csv') 60 | 61 | # ---------- constants ---------- # 62 | EVENT_LENGTH = 500 63 | TOP_N_SKU = 100 64 | TOP_N_BRAND = 20 65 | 66 | # ---------- prepare training data ---------- # 67 | def dump_pickle(dataset, save_file): 68 | with open(save_file, 'wb') as handle: 69 | pickle.dump(dataset, handle, protocol=pickle.HIGHEST_PROTOCOL) 70 | 71 | def load_pickle(save_file): 72 | with open(save_file, 'rb') as handle: 73 | data = pickle.load(handle) 74 | return data 75 | 76 | def load_csv(save_file): 77 | return pd.read_csv(save_file, sep=',', header=0, encoding='utf-8') 78 | 79 | def separate_time_window(infile, outfile_x, outfile_y): 80 | # set time window 81 | start_dt = datetime.date(2016,2,1) 82 | cut_dt = datetime.date(2016,4,8) 83 | end_dt = datetime.date(2016,4,13) 84 
| # read master 85 | df = pd.read_csv(infile, sep=',', header=0, encoding='utf-8') 86 | df['time'] = pd.to_datetime(df['time'], errors='coerce') 87 | df['date'] = pd.to_datetime(df['date'], errors='coerce') 88 | df['user_reg_tm'] = pd.to_datetime(df['user_reg_tm'], errors='coerce') 89 | # separate by time window 90 | x = df[(df['date'] >= start_dt) & (df['date'] <= cut_dt)] 91 | y = df[(df['date'] > cut_dt) & (df['date'] <= end_dt)] 92 | x.to_csv(outfile_x, sep=',', index=False, encoding='utf-8') 93 | y.to_csv(outfile_y, sep=',', index=False, encoding='utf-8') 94 | 95 | def get_skus(infile, outfile): 96 | # keep top TOP_N_SKU popular sku 97 | df = pd.read_csv(infile, sep=',', header=0, encoding='utf-8') 98 | df = df[(df['category']==8) & (df['type']==4)] 99 | df = df[['sku_id']] \ 100 | .groupby('sku_id') \ 101 | .size() \ 102 | .to_frame(name = 'count') \ 103 | .reset_index() \ 104 | .sort_values(['count'], ascending=[False]) \ 105 | .head(TOP_N_SKU) \ 106 | .sort_values(['sku_id'], ascending=[True]) 107 | df.to_csv(outfile, sep=',', index=False, encoding='utf-8') 108 | 109 | def get_brands(infile, outfile): 110 | # keep top TOP_N_BRAND popular brands 111 | df = pd.read_csv(infile, sep=',', header=0, encoding='utf-8') 112 | df = df[['brand']] \ 113 | .groupby('brand') \ 114 | .size() \ 115 | .to_frame(name = 'count') \ 116 | .reset_index() \ 117 | .sort_values(['count'], ascending=[False]) \ 118 | .head(TOP_N_BRAND) \ 119 | .sort_values(['brand'], ascending=[True]) 120 | df.to_csv(outfile, sep=',', index=False, encoding='utf-8') 121 | 122 | def get_model_ids(infile, outfile): 123 | df = pd.read_csv(infile, sep=',', header=0, encoding='utf-8') 124 | df = df[['model_id']] \ 125 | .groupby('model_id') \ 126 | .size() \ 127 | .to_frame(name = 'count') \ 128 | .reset_index() \ 129 | .sort_values(['model_id'], ascending=[True]) 130 | df.to_csv(outfile, sep=',', index=False, encoding='utf-8') 131 | 132 | def get_users(infile, outfile): 133 | df = pd.read_csv(infile, 
sep=',', header=0, encoding='utf-8') 134 | df = df[['user_id']].drop_duplicates() 135 | df.to_csv(outfile, sep=',', index=False, encoding='utf-8') 136 | 137 | def get_train_labels(user_file, sku_file, master_file, outfile): 138 | ''' 139 | Result: 140 | [ 141 | (202501, [1, 0], 168651, [0,0,0,...1,...0,0]), 142 | (202991, [0, 1], -1, [0,0,0,...0,...0,0]), 143 | ... 144 | ] 145 | ''' 146 | # 1.get all users who have order 147 | df = pd.read_csv(master_file, sep=',', header=0, encoding='utf-8') 148 | # if a user has multiple orders, keep the latest one 149 | df = df[(df['category']==8) & (df['type']==4)] \ 150 | .drop_duplicates(subset='user_id', keep='first') 151 | df = df[['user_id', 'sku_id']] 152 | df['has_order'] = 1 153 | # 2.append to user_list 154 | labels = pd.read_csv(user_file, sep=',', header=0, encoding='utf-8') \ 155 | .merge(df, how='left', on='user_id') 156 | # derive column1 157 | labels['is_positive'] = 0 158 | labels.loc[labels['has_order']>0, 'is_positive'] = 1 159 | # derive column2 160 | labels['is_negative'] = 0 161 | labels.loc[pd.isnull(labels['has_order']), 'is_negative'] = 1 162 | # 3.add one hot encoding for sku list 163 | sku_df = pd.read_csv(sku_file, sep=',', header=0, encoding='utf-8') 164 | sku_list = sku_df['sku_id'].values.tolist() 165 | def get_sku_one_hot_encoding(sku_list, sku_id): 166 | encoding = [0] * len(sku_list) 167 | if sku_id in sku_list: 168 | encoding[sku_list.index(sku_id)] = 1 169 | return encoding 170 | # 4.convert to list 171 | user = labels['user_id'].values.tolist() 172 | label = labels[['is_positive', 'is_negative']].values.tolist() 173 | sku = [-1 if math.isnan(i) else int(i) for i in labels['sku_id'].values.tolist()] 174 | ordered_sku = [get_sku_one_hot_encoding(sku_list, sku_id) for sku_id in sku] 175 | labels = zip(user, label, sku, ordered_sku) 176 | # 5.dump data to pickle 177 | with open(outfile, 'wb') as handle: 178 | pickle.dump(labels, handle, protocol=pickle.HIGHEST_PROTOCOL) 179 | 180 | def 
count_order_num_per_user(x_file, y_file, out_file): 181 | # count number of previous actions per user before target window 182 | df1 = pd.read_csv(x_file, sep=',', header=0, encoding='utf-8') 183 | df1 = df1[['user_id']] \ 184 | .groupby('user_id') \ 185 | .size() \ 186 | .to_frame(name = 'count_action') \ 187 | .reset_index() \ 188 | .sort_values(['count_action'], ascending=[False]) 189 | 190 | # count number of orders per user in target window 191 | df2 = pd.read_csv(y_file, sep=',', header=0, encoding='utf-8') 192 | df2 = df2[(df2['category']==8) & (df2['type']==4)] 193 | df2 = df2[['user_id']] \ 194 | .groupby('user_id') \ 195 | .size() \ 196 | .to_frame(name = 'count_order') \ 197 | .reset_index() \ 198 | .sort_values(['count_order'], ascending=[False]) 199 | 200 | # count number of previous actions (within 4 weeks) per user before target window 201 | start_dt = datetime.date(2016,3,12) 202 | df_temp = pd.read_csv(x_file, sep=',', header=0, encoding='utf-8') 203 | df_temp['date'] = pd.to_datetime(df_temp['date'], errors='coerce') 204 | df3 = df_temp[(df_temp['date'] >= start_dt)] 205 | df3 = df3[['user_id']] \ 206 | .groupby('user_id') \ 207 | .size() \ 208 | .to_frame(name = 'count_action_28') \ 209 | .reset_index() \ 210 | .sort_values(['count_action_28'], ascending=[False]) 211 | 212 | # merge and save 213 | df = df1.merge(df2, how='left', on='user_id') \ 214 | .merge(df3, how='left', on='user_id') \ 215 | .sort_values(['count_order', 'count_action_28'], ascending=[False, True]) 216 | df.to_csv(out_file, sep=',', index=False, encoding='utf-8') 217 | 218 | def get_event_sequence(infile, outfile, keep_latest_events=200): 219 | ''' 220 | Result: 221 | [ 222 | (200002, array([[seq_500], [seq_499], ..., [seq_1]]), 500) 223 | (200003, array([[seq_36], ..., [seq_1], [fake_seq], ..., [fake_seq]]), 36) 224 | ... 
225 | ] 226 | ''' 227 | # 1.reverse the event history and keep latest events for each user 228 | df = pd.read_csv(infile, sep=',', header=0, encoding='utf-8') 229 | df = df.sort_values(['user_id', 'time', 'sku_id', 'type', 'model_id'], ascending=[True, False, False, False, False]) \ 230 | .groupby('user_id') \ 231 | .head(keep_latest_events) 232 | #df.to_csv(MASTER_DATA + '_x_reverse', sep=',', index=False, encoding='utf-8') 233 | 234 | # 2.prepare sequence data 235 | def refactor_seq(seq, max_length): 236 | def padding(list): 237 | length = len(list) 238 | list += [0 for i in range(max_length - length)] 239 | return list 240 | s = [] 241 | feature_num = len(seq[0]) 242 | for i in range(feature_num): 243 | list = [action[i] for action in seq] 244 | list = padding(list) 245 | s += list 246 | return s 247 | 248 | # find the max datetime as observation timestamp 249 | max_timestamp = max(df['time']) 250 | max_timestamp = datetime.datetime.strptime(max_timestamp, '%Y-%m-%d %H:%M:%S') 251 | max_timestamp = int(max_timestamp.strftime('%s')) 252 | 253 | # init lists 254 | data = [] 255 | user = [] 256 | seq = [] 257 | seq_len = [] 258 | last_user_id = '' 259 | 260 | for index, row in df.iterrows(): 261 | this_user_id = row['user_id'] 262 | 263 | # preprocessing feature 264 | sku_id = row['sku_id'] 265 | model_id = int(-1 if np.isnan(row['model_id']) else row['model_id']) 266 | type = row['type'] 267 | category = row['category'] 268 | brand = row['brand'] 269 | a1 = int(0 if np.isnan(row['a1']) else row['a1']) 270 | a2 = int(0 if np.isnan(row['a2']) else row['a2']) 271 | a3 = int(0 if np.isnan(row['a3']) else row['a3']) 272 | timestamp = datetime.datetime.strptime(row['time'], '%Y-%m-%d %H:%M:%S') 273 | timestamp = int(timestamp.strftime('%s')) 274 | till_obs = max_timestamp - timestamp 275 | if last_user_id != this_user_id: # for the very first record 276 | till_next = 9999999 # set it to a very large number, since there's no next action 277 | else: 278 | till_next = 
next_timestamp - timestamp 279 | next_timestamp = timestamp 280 | 281 | # create feature list 282 | action = [ 283 | sku_id, 284 | model_id, 285 | type, 286 | category, 287 | brand, 288 | a1, 289 | a2, 290 | a3, 291 | till_next, 292 | till_obs, 293 | ] 294 | 295 | if last_user_id == '': 296 | user.append(this_user_id) 297 | seq.append(action) 298 | elif this_user_id == last_user_id: 299 | seq.append(action) 300 | else: 301 | # when meet new user 302 | user.append(this_user_id) 303 | seq_len.append(len(seq[:])) # append last user's seq_len 304 | data.append(refactor_seq(seq[:], keep_latest_events)) # append last user's seq 305 | seq = [] # init seq for the new user 306 | seq.append(action) 307 | last_user_id = this_user_id 308 | # append the last user 309 | seq_len.append(len(seq[:])) 310 | data.append(refactor_seq(seq[:], keep_latest_events)) 311 | 312 | # 3.reshape and transpose 313 | size = len(data) 314 | n_steps = EVENT_LENGTH 315 | n_input = len(data[0]) / n_steps 316 | data = np.array(data).reshape(size, n_input, n_steps) 317 | data = np.transpose(data, (0,2,1)) # transpose of n_input and n_steps 318 | 319 | # 4.zip (user_id, data, seq_len) as data 320 | data = zip(user, data, seq_len) 321 | 322 | # 5.dump data to pickle 323 | with open(outfile, 'wb') as handle: 324 | pickle.dump(data, handle, protocol=pickle.HIGHEST_PROTOCOL) 325 | 326 | # 6.return sequence data 327 | return data 328 | 329 | def split_train_test(data_pkl, labels_pkl, trainset, testset, train_rate=0.7): 330 | # load pickle 331 | data = load_pickle(data_pkl) 332 | labels = load_pickle(labels_pkl) 333 | # shuffle 334 | rows = zip(data, labels) 335 | random.shuffle(rows) 336 | # dump to pickle 337 | cut_point = int(train_rate * len(rows)) 338 | dump_pickle(rows[:cut_point], trainset) 339 | dump_pickle(rows[cut_point:], testset) 340 | # print info 341 | print '> %s users in trainset' % len(rows[:cut_point]) 342 | print '> %s users in testset' % len(rows[cut_point:]) 343 | print '> sample record:' 
344 | print rows[:cut_point][0] 345 | 346 | class SequenceData(object): 347 | """ Generate sequence of data with dynamic length. 348 | NOTICE: 349 | We have to pad each sequence to reach 'max_seq_len' for TensorFlow 350 | consistency (we cannot feed a numpy array with inconsistent 351 | dimensions). The dynamic calculation will then be perform thanks to 352 | 'seqlen' attribute that records every actual sequence length. 353 | """ 354 | def __init__(self, dataset, sku_df, brand_df, label_type='order'): 355 | self.user = [data[0] for (data, label) in dataset] 356 | self.data = [data[1] for (data, label) in dataset] 357 | self.seqlen = [data[2] for (data, label) in dataset] 358 | 359 | self.order_label = [label[1] for (data, label) in dataset] 360 | self.sku_label = [label[3] for (data, label) in dataset] 361 | 362 | self.sku_list = sku_df['sku_id'].values.tolist() 363 | self.brand_list = brand_df['brand'].values.tolist() 364 | 365 | self.label_type = label_type 366 | self.batch_id = 0 367 | 368 | # get dataset stats 369 | u, d, s, l = self.next(1) 370 | self.length = len(self.user) 371 | self.n_steps = len(d[0]) 372 | self.n_input = len(d[0][0]) 373 | self.n_classes = len(l[0]) 374 | self.batch_id -= 1 375 | 376 | def transform_seq(self, seq): 377 | ''' 378 | input shape: n_steps * n_input 379 | ''' 380 | def norm_by_value(value, max_value): 381 | return [1.0 * value / max_value] 382 | 383 | def one_hot_encoding(value, value_list): 384 | default_list = [0] * (len(value_list) + 1) 385 | if value in value_list: 386 | default_list[value_list.index(value)] = 1 387 | else: 388 | # the last cell stands for other values 389 | default_list[-1] = 1 390 | return default_list 391 | 392 | def process_model_id(model_id): 393 | if model_id == -1: 394 | return -2 395 | elif model_id == 0: 396 | return -1 397 | else: 398 | return int(math.floor(1.0 * model_id / 100)) 399 | 400 | seq_list = [] 401 | for rec in seq: 402 | sku_id = one_hot_encoding(int(rec[0]), self.sku_list) 403 | 
model_id = one_hot_encoding(process_model_id(rec[1]), [-1, 0, 1, 2, 3]) 404 | type = one_hot_encoding(int(rec[2]), [1, 2, 3, 4, 5, 6]) 405 | category = one_hot_encoding(int(rec[3]), [4, 5, 6, 7, 8, 9]) 406 | brand = one_hot_encoding(int(rec[4]), self.brand_list) 407 | a1 = one_hot_encoding(int(rec[5]), [-1, 1, 2, 3]) 408 | a2 = one_hot_encoding(int(rec[6]), [-1, 1, 2]) 409 | a3 = one_hot_encoding(int(rec[7]), [-1, 1, 2]) 410 | till_next = norm_by_value(rec[8], 9999999) 411 | till_obs = norm_by_value(rec[9], 9999999) 412 | # norm_rec 413 | norm_rec = sku_id + model_id + type + category + brand + a1 + a2 + a3 + till_next + till_obs 414 | seq_list.append(norm_rec) 415 | return np.array(seq_list) 416 | 417 | def next(self, batch_size): 418 | """ Return a batch of data. When dataset end is reached, start over. 419 | """ 420 | if self.batch_id + batch_size <= len(self.user): 421 | end_cursor = self.batch_id + batch_size 422 | batch_user = self.user[self.batch_id:end_cursor] 423 | batch_data = self.data[self.batch_id:end_cursor] 424 | batch_seqlen = self.seqlen[self.batch_id:end_cursor] 425 | if self.label_type == 'order': 426 | batch_label = self.order_label[self.batch_id:end_cursor] 427 | else: 428 | batch_label = self.sku_label[self.batch_id:end_cursor] 429 | self.batch_id += batch_size 430 | else: 431 | end_cursor = self.batch_id + batch_size - len(self.user) 432 | batch_user = self.user[self.batch_id:] + self.user[:end_cursor] 433 | batch_data = self.data[self.batch_id:] + self.data[:end_cursor] 434 | batch_seqlen = self.seqlen[self.batch_id:] + self.seqlen[:end_cursor] 435 | if self.label_type == 'order': 436 | batch_label = self.order_label[self.batch_id:] + self.order_label[:end_cursor] 437 | else: 438 | batch_label = self.sku_label[self.batch_id:] + self.sku_label[:end_cursor] 439 | self.batch_id = self.batch_id + batch_size - len(self.user) 440 | # do normalization & one-hot-encoding 441 | batch_data = [self.transform_seq(seq) for seq in batch_data] 442 | return 
batch_user, batch_data, batch_seqlen, batch_label 443 | 444 | def run_rnn(trainset, testset, scoreset, trainset_result, testset_result, scoreset_result, step_file, training_iters=5000000, label_type='order'): 445 | # rnn parameters 446 | learning_rate = 0.01 447 | batch_size = 128 448 | display_step = 100 449 | n_hidden = 64 # hidden layer num of features 450 | 451 | # count input 452 | print '> %s records in trainset' % trainset.length 453 | print '> %s records in testset' % testset.length 454 | print '> %s records in scoreset' % scoreset.length 455 | 456 | # model parameters 457 | n_steps = trainset.n_steps 458 | n_input = trainset.n_input 459 | n_classes = trainset.n_classes 460 | print 'n_steps: %s' % n_steps 461 | print 'n_input: %s' % n_input 462 | print 'n_classes: %s' % n_classes 463 | 464 | # tf Graph input 465 | x = tf.placeholder("float", [None, n_steps, n_input]) 466 | y = tf.placeholder("float", [None, n_classes]) 467 | # A placeholder for indicating each sequence length 468 | seqlen = tf.placeholder(tf.int32, [None]) 469 | 470 | # define weights 471 | weights = { 472 | 'out': tf.Variable(tf.random_normal([n_hidden, n_classes])) 473 | } 474 | biases = { 475 | 'out': tf.Variable(tf.random_normal([n_classes])) 476 | } 477 | 478 | # define RNN model 479 | def dynamicRNN(x, seqlen, weights, biases): 480 | # prepare data shape to match `rnn` function requirements 481 | # current data input shape: (batch_size, n_steps, n_input) 482 | # required shape: `n_steps` tensors list of shape (batch_size, n_input) 483 | 484 | # unstack to get a list of 'n_steps' tensors of shape (batch_size, n_input) 485 | x = tf.unstack(x, n_steps, 1) 486 | 487 | # define a lstm cell with tensorflow 488 | lstm_cell = rnn.BasicLSTMCell(n_hidden) 489 | 490 | # get lstm cell output 491 | outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32, sequence_length=seqlen) 492 | 493 | # when performing dynamic calculation, we must retrieve the last 494 | # dynamically computed output, 
i.e., if a sequence length is 10, we need 495 | # to retrieve the 10th output. 496 | 497 | # `outputs` is a list with the output of every timestep; we pack them into a tensor 498 | # and change the dimensions back to [batch_size, n_step, n_hidden] 499 | outputs = tf.stack(outputs) 500 | outputs = tf.transpose(outputs, [1, 0, 2]) 501 | batch_size = tf.shape(outputs)[0] 502 | index = tf.range(0, batch_size) * n_steps + (seqlen-1) 503 | outputs = tf.gather(tf.reshape(outputs, [-1, n_hidden]), index) 504 | 505 | # linear activation, using outputs computed above 506 | return tf.matmul(outputs, weights['out']) + biases['out'] 507 | 508 | def RNN(x, seqlen, weights, biases): 509 | # prepare data shape to match `rnn` function requirements 510 | # current data input shape: (batch_size, n_steps, n_input) 511 | # required shape: `n_steps` tensors list of shape (batch_size, n_input) 512 | 513 | # unstack to get a list of 'n_steps' tensors of shape (batch_size, n_input) 514 | x = tf.unstack(x, n_steps, 1) 515 | 516 | # define a lstm cell with tensorflow 517 | lstm_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0) 518 | 519 | # get lstm cell output 520 | outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32) 521 | 522 | # linear activation, using rnn inner loop last output 523 | return tf.matmul(outputs[-1], weights['out']) + biases['out'] 524 | 525 | pred = dynamicRNN(x, seqlen, weights, biases) 526 | #pred = RNN(x, seqlen, weights, biases) 527 | 528 | # define results 529 | # softmax rather than per-class sigmoid: the classes are mutually exclusive, so exactly one output unit should fire with a large value 530 | results = tf.nn.softmax(pred, name='results') 531 | 532 | # define loss 533 | cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y)) 534 | 535 | # define optimizer (train step) 536 | optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost) 537 | #optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost) 538 | 539 | # evaluate model 540 | 
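An aside on the `dynamicRNN` above: it picks out each sequence's last valid output by flattening the stacked outputs to `[batch_size*n_steps, n_hidden]` and gathering row `i*n_steps + (seqlen[i]-1)` for every example `i`. A minimal NumPy sketch of that index arithmetic (toy shapes only, not the real model):

```python
import numpy as np

# toy stand-in for the stacked/transposed RNN outputs: [batch_size, n_steps, n_hidden]
batch_size, n_steps, n_hidden = 3, 5, 4
outputs = np.arange(batch_size * n_steps * n_hidden,
                    dtype=np.float32).reshape(batch_size, n_steps, n_hidden)
seqlen = np.array([5, 2, 3])  # true (unpadded) length of each sequence

# same arithmetic as the tf.gather call: flat row offset of each last valid step
index = np.arange(batch_size) * n_steps + (seqlen - 1)
last_outputs = outputs.reshape(-1, n_hidden)[index]

# each gathered row equals outputs[i, seqlen[i]-1, :]
for i in range(batch_size):
    assert np.array_equal(last_outputs[i], outputs[i, seqlen[i] - 1])
```

These are the same offsets `tf.gather` receives in `dynamicRNN`, so padded steps past `seqlen` never reach the output layer.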
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1)) 541 | accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32)) 542 | 543 | # initializing the variables 544 | init = tf.global_variables_initializer() 545 | 546 | # set configuration 547 | config = tf.ConfigProto() 548 | config.gpu_options.allow_growth = True 549 | 550 | # launch the graph 551 | with tf.Session(config=config) as sess: 552 | sess.run(init) 553 | step = 1 554 | 555 | def cal_scores(dataset, batch_size): 556 | # create an empty list to collect the output 557 | res = [] 558 | # split the dataset into partitions to avoid out-of-memory issues 559 | rec_num = len(dataset.user) 560 | partition_num = int(math.ceil(1.0*rec_num/batch_size)) 561 | # calculate results for each partition 562 | for i in range(partition_num): 563 | user, data, seqlength, label = dataset.next(batch_size) 564 | score = sess.run(results, feed_dict={x: data, y: label, seqlen: seqlength}) 565 | part_res = zip(user, label, score) 566 | res += part_res 567 | # remove duplicated users 568 | uniq_res = [] 569 | user_set = set([]) 570 | for i in res: 571 | user_id = i[0] 572 | if user_id not in user_set: 573 | uniq_res.append(i) 574 | user_set.add(user_id) 575 | return uniq_res 576 | 577 | print('> Start training...') 578 | # keep training until reaching the max number of iterations 579 | step_result = [] 580 | while step * batch_size < training_iters: 581 | batch_user, batch_x, batch_seqlen, batch_y = trainset.next(batch_size) 582 | # run optimization op (backprop) 583 | sess.run(optimizer, feed_dict={x: batch_x, y: batch_y, seqlen: batch_seqlen}) 584 | if step % display_step == 0: 585 | # calculate auc 586 | def cal_auc(score_list, label_type): 587 | def get_sku_ind(rec): 588 | sku_ind_list = rec[1] 589 | sku_prob_list = rec[2].tolist() 590 | max_ind_index = sku_ind_list.index(max(sku_ind_list)) 591 | max_prob_index = sku_prob_list.index(max(sku_prob_list)) 592 | if max_ind_index == max_prob_index: 593 | return 1 594 | else: 595 | return 0 596 | 
597 | if label_type == 'order': 598 | ind = [i[1][0] for i in score_list] 599 | prob = [i[2][0] for i in score_list] 600 | else: 601 | ind = [get_sku_ind(i) for i in score_list] 602 | prob = [max(i[2].tolist()) for i in score_list] 603 | fpr, tpr, thres = roc_curve(ind, prob, pos_label=1) 604 | return auc(fpr, tpr) 605 | train_auc = cal_auc(cal_scores(trainset, batch_size), label_type=label_type) 606 | test_auc = cal_auc(cal_scores(testset, batch_size), label_type=label_type) 607 | 608 | # calculate batch accuracy 609 | acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y, seqlen: batch_seqlen}) 610 | # calculate batch loss 611 | loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y, seqlen: batch_seqlen}) 612 | print( \ 613 | "Iter %s, " % str(step*batch_size) + \ 614 | "Minibatch Loss %.5f, " % loss + \ 615 | "Training Accuracy %.5f, " % acc + \ 616 | "Training AUC %.5f, " % train_auc + \ 617 | "Test AUC %.5f" % test_auc \ 618 | ) 619 | # save step result 620 | step_result.append((step*batch_size, train_auc, test_auc)) 621 | step += 1 622 | print("Optimization Finished!") 623 | # TODO calculate optimal iteration numbers using step_result 624 | 625 | # save result 626 | dump_pickle(step_result, step_file) 627 | dump_pickle(cal_scores(trainset, batch_size), trainset_result) 628 | dump_pickle(cal_scores(testset, batch_size), testset_result) 629 | dump_pickle(cal_scores(scoreset, batch_size), scoreset_result) 630 | 631 | def get_fake_labels(score_sequence, train_labels, save_file): 632 | fake_label = train_labels[0][1:] 633 | score_labels = [(i[0],) + fake_label for i in score_sequence] 634 | dump_pickle(score_labels, save_file) 635 | 636 | def get_scoreset(score_sequence, score_labels, scoreset): 637 | rows = zip(load_pickle(score_sequence), load_pickle(score_labels)) 638 | dump_pickle(rows, scoreset) 639 | print '> %s users in scoreset' % len(rows) 640 | print '> sample record:' 641 | print rows[0] 642 | 643 | def get_result(user_res, sku_res, sku_file, 
save_file): 644 | # format user level result 645 | user = [i[0] for i in user_res] 646 | order_ind = [i[1][0] for i in user_res] 647 | order_prob = [i[2][0] for i in user_res] 648 | df1 = pd.DataFrame({ 649 | 'user_id' : user, 650 | 'order_ind' : order_ind, 651 | 'order_prob': order_prob, 652 | }) 653 | 654 | # format sku level result 655 | sku_df = pd.read_csv(sku_file, sep=',', header=0, encoding='utf-8') 656 | sku_list = sku_df['sku_id'].values.tolist() 657 | def get_sku_id(rec): 658 | if 1 in rec: 659 | return sku_list[rec.index(1)] 660 | else: 661 | return -1 662 | def guess_sku_id(rec): 663 | rec = rec.tolist() 664 | max_score = max(rec) 665 | sku_id = sku_list[rec.index(max_score)] 666 | return sku_id, max_score 667 | user = [i[0] for i in sku_res] 668 | sku_order_id = [get_sku_id(i[1]) for i in sku_res] 669 | sku_guess = [guess_sku_id(i[2]) for i in sku_res] 670 | sku_guess_id = [i[0] for i in sku_guess] 671 | sku_guess_score = [i[1] for i in sku_guess] 672 | df2 = pd.DataFrame({ 673 | 'user_id' : user, 674 | 'sku_order_id' : sku_order_id, 675 | 'sku_guess_id' : sku_guess_id, 676 | 'sku_guess_score': sku_guess_score, 677 | }) 678 | def guess_right(row): 679 | if row['sku_order_id'] == row['sku_guess_id']: 680 | return 1 681 | else: 682 | return 0 683 | df2['guess_right'] = df2.apply(lambda row:guess_right(row), axis=1) 684 | 685 | # merge dfs 686 | result = df1.merge(df2, how='left', on='user_id') \ 687 | .sort_values(['order_prob'], ascending=[False]) 688 | dump_pickle(result, save_file) 689 | 690 | def eval_roc(df): 691 | def plot_roc(prob, ind): 692 | # params 693 | lw = 2 694 | # calculate fpr, tpr, auc 695 | fpr, tpr, thres = roc_curve(ind, prob, pos_label=1) 696 | roc_auc = auc(fpr, tpr) 697 | # plot roc curve 698 | plt.figure() 699 | plt.plot(fpr, tpr, color='darkorange', lw=lw, label='ROC Curve (auc=%0.2f)' % roc_auc) 700 | plt.plot([0,1], [0,1], color='navy', lw=lw, linestyle='--') 701 | plt.xlim([0.0, 1.0]) 702 | plt.ylim([0.0, 1.0]) 703 | 
plt.xlabel('False Positive Rate') 704 | plt.ylabel('True Positive Rate') 705 | plt.legend(loc='lower right') 706 | plt.show() 707 | 708 | # check order prob 709 | prob = df['order_prob'].values.tolist() 710 | ind = df['order_ind'].values.tolist() 711 | print '> Plot order prob ROC (%s records)...' % len(df) 712 | plot_roc(prob, ind) 713 | 714 | # check sku prob 715 | df2 = df[df['sku_order_id'] > 0] \ 716 | .sort_values(['sku_guess_score'], ascending=[False]) 717 | 718 | prob2 = df2['sku_guess_score'].values.tolist() 719 | ind2 = df2['guess_right'].values.tolist() 720 | print '> Plot sku prob ROC (%s records)...' % len(df2) 721 | plot_roc(prob2, ind2) 722 | 723 | def gen_upload_result(trainset, testset, scoreset, save_file, score_file): 724 | def cal_precision_recall(dataset, cutoff): 725 | # select dataset according to cutoff 726 | dataset = dataset.sort_values(['order_prob'], ascending=[False]) 727 | guessset = dataset[dataset['order_prob'] >= cutoff] 728 | # count values 729 | total_order = sum(dataset['order_ind'].values.tolist()) 730 | total_guess = len(guessset) 731 | guess_order_right = max(1, sum(guessset['order_ind'].values.tolist())) 732 | guess_sku_right = max(1, sum([1 for i in guessset['guess_right'].values.tolist() if i > 0.0])) 733 | # calculate precision and recall 734 | if total_guess > 0: 735 | f1_pre = 1.0 * guess_order_right / total_guess 736 | f1_rec = 1.0 * guess_order_right / total_order 737 | f2_pre = 1.0 * guess_sku_right / total_guess 738 | f2_rec = 1.0 * guess_sku_right / total_order 739 | # F1 value 740 | f1 = 6.0 * f1_rec * f1_pre / (5.0 * f1_rec + 1.0 * f1_pre) 741 | f2 = 5.0 * f2_rec * f2_pre / (2.0 * f2_rec + 3.0 * f2_pre) 742 | f = 0.4 * f1 + 0.6 * f2 743 | return f1, f2, f 744 | else: 745 | return 0.0, 0.0, 0.0 746 | 747 | def select_cutoff(results): 748 | cutoff_list = [i[0] for i in results] 749 | f1_list = [i[1] for i in results] 750 | f2_list = [i[2] for i in results] 751 | f_list = [i[3] for i in results] 752 | max_idx = 
f_list.index(max(f_list)) 753 | optimal_cutoff = cutoff_list[max_idx] 754 | print '> The cutoff is %s, where:' % optimal_cutoff 755 | print '> f1 score: %s' % f1_list[max_idx] 756 | print '> f2 score: %s' % f2_list[max_idx] 757 | print '> f score: %s' % f_list[max_idx] 758 | return optimal_cutoff 759 | 760 | # calculate f score and select optimal cutoff 761 | interval = 0.0001 762 | loop_num = int(1.0 / interval) 763 | results = [] 764 | for i in reversed(range(1, loop_num+1)): 765 | cutoff = 1.0 * i / loop_num 766 | f1, f2, f = cal_precision_recall(testset, cutoff) 767 | results.append((cutoff, f1, f2, f)) 768 | optimal_cutoff = select_cutoff(results) 769 | 770 | # generate score file (to_csv returns None, so no assignment is needed) 771 | scoreset[['user_id', 'order_prob', 'sku_guess_id', 'sku_guess_score']] \ 772 | .sort_values(['order_prob'], ascending=[False]) \ 773 | .to_csv(score_file, sep=',', index=False, encoding='GBK') 774 | 775 | # generate upload file 776 | scoreset = scoreset.sort_values(['order_prob'], ascending=[False]) 777 | scoreset = scoreset[scoreset['order_prob'] >= optimal_cutoff] 778 | scoreset = scoreset[['user_id', 'sku_guess_id']] \ 779 | .rename(columns={'sku_guess_id': 'sku_id'}) 780 | scoreset.to_csv(save_file, sep=',', index=False, encoding='GBK') 781 | 782 | def eval_auc(res_list): 783 | df = pd.DataFrame({ 784 | 'train_auc': [i[1] for i in res_list], 785 | 'test_auc': [i[2] for i in res_list] 786 | }, index=[i[0] for i in res_list]) 787 | df.plot(ylim=(0,1)) 788 | df.head(30).plot(ylim=(0,1)) 789 | plt.show() 790 | 791 | 792 | if __name__ == '__main__': 793 | 794 | # ---------- Data Preparation ---------- # 795 | 796 | # 1.split time window 797 | #separate_time_window(MASTER_DATA, MASTER_DATA_X, MASTER_DATA_Y) # 20min 798 | #get_users(MASTER_DATA_X, USERS) # 2min 799 | #get_skus(MASTER_DATA_Y, SKUS) # 3min 800 | #get_brands(MASTER_DATA_Y, BRANDS) # 3min 801 | 802 | # 2.prepare input sequence 803 | #get_event_sequence(MASTER_DATA_X, TRAIN_SEQUENCE, 
keep_latest_events=EVENT_LENGTH) # 83min 804 | #get_event_sequence(MASTER_DATA, SCORE_SEQUENCE, keep_latest_events=EVENT_LENGTH) # 89min 805 | 806 | # 3.prepare labels 807 | #get_train_labels(USERS, SKUS, MASTER_DATA_Y, TRAIN_LABELS) # 2min 808 | #get_fake_labels(load_pickle(SCORE_SEQUENCE), load_pickle(TRAIN_LABELS), SCORE_LABELS) # 1min 809 | 810 | # 4.merge input sequence & labels 811 | #split_train_test(TRAIN_SEQUENCE, TRAIN_LABELS, TRAINSET, TESTSET, 0.5) # 2.5min 812 | #get_scoreset(SCORE_SEQUENCE, SCORE_LABELS, SCORESET) # 0.1min 813 | 814 | # ---------- Model Training ---------- # 815 | 816 | # 1.train, test & score at user level 817 | trainset = load_pickle(TRAINSET) 818 | testset = load_pickle(TESTSET) 819 | scoreset = load_pickle(SCORESET) 820 | # create objects 821 | trainset = SequenceData(trainset, load_csv(SKUS), load_csv(BRANDS), label_type='order') 822 | testset = SequenceData(testset, load_csv(SKUS), load_csv(BRANDS), label_type='order') 823 | scoreset = SequenceData(scoreset, load_csv(SKUS), load_csv(BRANDS), label_type='order') 824 | run_rnn(trainset, testset, scoreset, TRAINSET_USER_RESULT, TESTSET_USER_RESULT, SCORESET_USER_RESULT, USER_STEP_RESULT, training_iters=3000000, label_type='order') # 590min for 5000000 iters 825 | 826 | # 2.train, test & score at sku level 827 | ## select users who have orders and the ordered sku_id is in sku list 828 | #trainset = [i for i in load_pickle(TRAINSET) if sum(i[1][3]) > 0] 829 | #testset = [i for i in load_pickle(TESTSET) if sum(i[1][3]) > 0] 830 | #scoreset = [i for i in load_pickle(SCORESET)] 831 | ## create objects 832 | #trainset = SequenceData(trainset, load_csv(SKUS), load_csv(BRANDS), label_type='sku') 833 | #testset = SequenceData(testset, load_csv(SKUS), load_csv(BRANDS), label_type='sku') 834 | #scoreset = SequenceData(scoreset, load_csv(SKUS), load_csv(BRANDS), label_type='sku') 835 | #run_rnn(trainset, testset, scoreset, TRAINSET_SKU_RESULT, TESTSET_SKU_RESULT, SCORESET_SKU_RESULT, 
SKU_STEP_RESULT, training_iters=210000, label_type='sku') # 217min for 5000000 iters 836 | 837 | # ---------- Model Evaluation ---------- # 838 | 839 | # 1.Merge user level & sku level result 840 | #get_result(load_pickle(TRAINSET_USER_RESULT), load_pickle(TRAINSET_SKU_RESULT), SKUS, TRAINSET_RESULT) 841 | #get_result(load_pickle(TESTSET_USER_RESULT), load_pickle(TESTSET_SKU_RESULT), SKUS, TESTSET_RESULT) 842 | #get_result(load_pickle(SCORESET_USER_RESULT), load_pickle(SCORESET_SKU_RESULT), SKUS, SCORESET_RESULT) 843 | 844 | # 2.Check train & test auc for each step 845 | #eval_auc(load_pickle(USER_STEP_RESULT)) 846 | #eval_auc(load_pickle(SKU_STEP_RESULT)) 847 | 848 | # 3.Check final roc curve 849 | #eval_roc(load_pickle(TRAINSET_RESULT)) 850 | #eval_roc(load_pickle(TESTSET_RESULT)) 851 | 852 | # 4.Select cutoff and generate upload file 853 | #gen_upload_result(load_pickle(TRAINSET_RESULT), load_pickle(TESTSET_RESULT), load_pickle(SCORESET_RESULT), OUTPUT_FILE, SCORE_FILE) 854 | 855 | # ---------- No Longer Needed ---------- # 856 | #count_order_num_per_user(MASTER_DATA_X, MASTER_DATA_Y, PROF_ACTION_NUM) # 5min 857 | #get_model_ids(MASTER_DATA_Y, MODEL_IDS) # 3min 858 | 859 | -------------------------------------------------------------------------------- /run.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import sys 5 | import os 6 | import glob 7 | import numpy as np 8 | import pandas as pd 9 | 10 | # ---------- directories definition ---------- # 11 | MAIN_DIR = os.path.dirname(os.path.abspath(__file__)) 12 | 13 | def get_dir(dir_name): 14 | dir_path = os.path.join(MAIN_DIR, dir_name) 15 | if not os.path.exists(dir_path): 16 | os.makedirs(dir_path) 17 | return dir_path 18 | 19 | DATA_DIR = get_dir('data') 20 | TEMP_DIR = get_dir('temp') 21 | PROF_DIR = get_dir('prof') 22 | 23 | # ---------- datasets definition ---------- # 24 | USER_DATA = os.path.join(DATA_DIR, 
'JData_User.csv') 25 | PROD_DATA = os.path.join(DATA_DIR, 'JData_Product.csv') 26 | COMMENT_DATA = os.path.join(DATA_DIR, 'JData_Comment.csv') 27 | ACTION_DATA = os.path.join(DATA_DIR, 'JData_Action_*.csv') 28 | 29 | MASTER_DATA = os.path.join(TEMP_DIR, 'master.csv') 30 | 31 | # ---------- Preprocessing ---------- # 32 | def get_user(): 33 | df = pd.read_csv(USER_DATA, sep=',', header=0, encoding='GBK') 34 | df['user_reg_tm'] = pd.to_datetime(df['user_reg_tm'], errors='coerce') 35 | return df 36 | 37 | def get_prod(): 38 | df = pd.read_csv(PROD_DATA, sep=',', header=0, encoding='GBK') 39 | return df 40 | 41 | def get_comment(): 42 | df = pd.read_csv(COMMENT_DATA, sep=',', header=0, encoding='GBK') 43 | df['dt'] = pd.to_datetime(df['dt'], errors='coerce') 44 | return df 45 | 46 | def get_action(): 47 | files = glob.glob(ACTION_DATA) 48 | dfs = (pd.read_csv(file, sep=',', header=0, encoding='GBK') for file in files) 49 | df = pd.concat(dfs, ignore_index=True) 50 | df['time'] = pd.to_datetime(df['time'], errors='coerce') 51 | df[['user_id']] = df[['user_id']].astype(int) 52 | return df 53 | 54 | # ---------- Profiling ---------- # 55 | def prof_user(): 56 | df = get_user() 57 | output_file = os.path.join(PROF_DIR, 'prof_user.txt') 58 | with open(output_file, 'wb') as f: 59 | orig_stdout = sys.stdout 60 | sys.stdout = f 61 | 62 | print '===== Check user data =====' 63 | 64 | print '\n> Check sample records...' 65 | print df.head(10) 66 | 67 | print '\n> Check column data type...' 68 | print df.dtypes 69 | 70 | print '\n> Count records...' 71 | print len(df) 72 | 73 | print '\n> Count unique user_id...' 74 | print len(df['user_id'].unique()) 75 | 76 | print '\n> Count users by age...' 77 | print df['age'].value_counts(dropna=False) 78 | 79 | print '\n> Count users by sex...' 80 | print df['sex'].value_counts(dropna=False) 81 | 82 | print '\n> Count users by level...' 83 | print df['user_lv_cd'].value_counts(dropna=False) 84 | 85 | print '\n> Count users by reg date...' 
86 | print df['user_reg_tm'].value_counts(dropna=False).sort_index() 87 | 88 | sys.stdout = orig_stdout 89 | 90 | def prof_prod(): 91 | df = get_prod() 92 | output_file = os.path.join(PROF_DIR, 'prof_product.txt') 93 | with open(output_file, 'wb') as f: 94 | orig_stdout = sys.stdout 95 | sys.stdout = f 96 | 97 | print '===== Check product data =====' 98 | 99 | print '\n> Check sample records...' 100 | print df.head(10) 101 | 102 | print '\n> Check column data type...' 103 | print df.dtypes 104 | 105 | print '\n> Count records...' 106 | print len(df) 107 | 108 | print '\n> Count unique sku_id...' 109 | print len(df['sku_id'].unique()) 110 | 111 | print '\n> Count products by a1...' 112 | print df['a1'].value_counts(dropna=False) 113 | 114 | print '\n> Count products by a2...' 115 | print df['a2'].value_counts(dropna=False) 116 | 117 | print '\n> Count products by a3...' 118 | print df['a3'].value_counts(dropna=False) 119 | 120 | print '\n> Count products by category...' 121 | print df['cate'].value_counts(dropna=False) 122 | 123 | print '\n> Count products by brand...' 124 | print df['brand'].value_counts(dropna=False) 125 | 126 | sys.stdout = orig_stdout 127 | 128 | def prof_comment(): 129 | df = get_comment() 130 | output_file = os.path.join(PROF_DIR, 'prof_comment.txt') 131 | with open(output_file, 'wb') as f: 132 | orig_stdout = sys.stdout 133 | sys.stdout = f 134 | 135 | print '===== Check comment data =====' 136 | 137 | print '\n> Check sample records...' 138 | print df.head(10) 139 | 140 | print '\n> Check column data type...' 141 | print df.dtypes 142 | 143 | print '\n> Count records...' 144 | print len(df) 145 | 146 | print '\n> Count comments by dt...' 147 | print df['dt'].value_counts(dropna=False).sort_index() 148 | 149 | print '\n> Count unique sku_id...' 150 | print len(df['sku_id'].unique()) 151 | 152 | print '\n> Count records by comment_num...' 
153 | print df['comment_num'].value_counts(dropna=False) 154 | 155 | print '\n> Count records by has_bad_comment...' 156 | print df['has_bad_comment'].value_counts(dropna=False) 157 | 158 | print '\n> Count records by bad_comment_rate...' 159 | print df['bad_comment_rate'].value_counts(dropna=False).sort_index() 160 | 161 | sys.stdout = orig_stdout 162 | 163 | def prof_action(): 164 | df = get_action() 165 | output_file = os.path.join(PROF_DIR, 'prof_action.txt') 166 | with open(output_file, 'wb') as f: 167 | orig_stdout = sys.stdout 168 | sys.stdout = f 169 | 170 | print '===== Check action data =====' 171 | 172 | print '\n> Check sample records...' 173 | print df.head(10) 174 | 175 | print '\n> Check column data type...' 176 | print df.dtypes 177 | 178 | print '\n> Count records...' 179 | print len(df) 180 | 181 | print '\n> Count unique user_id...' 182 | print len(df['user_id'].unique()) 183 | 184 | print '\n> Count unique sku_id...' 185 | print len(df['sku_id'].unique()) 186 | 187 | print '\n> Count records by model_id...' 188 | print df['model_id'].value_counts(dropna=False) 189 | 190 | print '\n> Count records by type...' 191 | print df['type'].value_counts(dropna=False) 192 | 193 | print '\n> Count records by category...' 194 | print df['cate'].value_counts(dropna=False) 195 | 196 | print '\n> Count records by brand...' 197 | print df['brand'].value_counts(dropna=False) 198 | 199 | print '\n> Count records by time...' 200 | print df['time'].value_counts(dropna=False).sort_index() 201 | 202 | print '\n> Count unique sku_id (1.used to be ordered; 2.in cate8)...' 203 | print len(df[(df['type']==4) & (df['cate']==8)]['sku_id'].unique()) 204 | 205 | print '\n> Count total orders (1.used to be ordered; 2.in cate8)...' 206 | print len(df[(df['type']==4) & (df['cate']==8)]) 207 | 208 | print '\n> Count total orders by sku_id(1.used to be ordered; 2.in cate8)...' 
209 | print df[(df['type']==4) & (df['cate']==8)]['sku_id'].value_counts(dropna=False) 210 | 211 | sys.stdout = orig_stdout 212 | 213 | def get_session(outfile): 214 | print 'add session' 215 | # read action 216 | df = get_action() 217 | # get unique, sorted (user_id, time) pairs 218 | df = df[['user_id', 'time']] \ 219 | .drop_duplicates() \ 220 | .sort_values(['user_id', 'time'], ascending=[True, True]) 221 | # derive session_id (a new session starts on a new user or after a 30min gap) 222 | session = {'num': 1} 223 | def get_session_id(r): 224 | # keep the counter in the mutable dict above: Python 2 has no `nonlocal`, and `global` would not reach the enclosing scope 225 | session_interval = 1800.0 # 30min 226 | time_diff = (r['time'] - r['last_time']) / np.timedelta64(1, 's') 227 | if r['user_id'] != r['last_user']: 228 | session['num'] = 1 229 | elif time_diff > session_interval: 230 | session['num'] += 1 231 | return session['num'] 232 | df['last_time'] = df['time'].shift(1) 233 | df['last_user'] = df['user_id'].shift(1) 234 | df['session_id'] = df.apply(lambda r : get_session_id(r), axis=1) 235 | df = df.drop(['last_time', 'last_user'], axis=1) 236 | # save to file 237 | df.to_csv(outfile, sep=',', index=False, encoding='utf-8') 238 | 239 | def get_master(outfile): 240 | # read inputs 241 | user = get_user() 242 | prod = get_prod() 243 | comment = get_comment() 244 | action = get_action() 245 | 246 | # read session_id 247 | sess = pd.read_csv(MASTER_DATA + '_sess', sep=',', header=0, encoding='GBK') 248 | sess['time'] = pd.to_datetime(sess['time'], errors='coerce') 249 | 250 | # expand weekly comment snapshots to one row per day 251 | start_dt = '2016-02-01' 252 | end_dt = '2016-04-20' 253 | date_range = pd.DataFrame({'date': pd.date_range(start_dt, end_dt).format()}) 254 | date_range['date'] = pd.to_datetime(date_range['date'], errors='coerce') 255 | date_range['week_start'] = date_range['date'].dt.to_period('W').apply(lambda r : r.start_time) 256 | comment = comment.merge(date_range, how='inner', left_on='dt', right_on='week_start') 257 | comment = comment.drop(['week_start', 'dt'], axis=1) 258 | 259 | # merge action, user, product and comment 260 | action['date'] = 
action['time'].dt.date 261 | action['date'] = pd.to_datetime(action['date'], errors='coerce') 262 | df = action.merge(user, how='left', on='user_id') \ 263 | .merge(prod, how='left', on='sku_id') \ 264 | .merge(comment, how='left', on=['date', 'sku_id']) \ 265 | .merge(sess, how='left', on=['user_id', 'time']) \ 266 | .rename(columns={ 267 | 'cate_x': 'category', 268 | 'brand_x': 'brand', 269 | }) \ 270 | .drop(['cate_y', 'brand_y'], axis=1) \ 271 | .sort_values(['user_id', 'time', 'sku_id', 'type', 'model_id'], ascending=[True, True, True, True, True]) 272 | 273 | # save to file 274 | df.to_csv(outfile, sep=',', index=False, encoding='utf-8') 275 | 276 | def get_train_input(): 277 | # read master table 278 | #df = pd.read_csv(MASTER_DATA, sep=',', header=0, encoding='utf-8', nrows=30000) #TODO 279 | df = pd.read_csv(MASTER_DATA, sep=',', header=0, encoding='utf-8') 280 | 281 | # change column type 282 | df['time'] = pd.to_datetime(df['time'], errors='coerce') 283 | df['date'] = pd.to_datetime(df['date'], errors='coerce') 284 | df['user_reg_tm'] = pd.to_datetime(df['user_reg_tm'], errors='coerce') 285 | 286 | return df 287 | 288 | if __name__ == '__main__': 289 | #prof_user() 290 | #prof_prod() 291 | #prof_comment() 292 | #prof_action() 293 | #get_session(MASTER_DATA + '_sess') 294 | get_master(MASTER_DATA) 295 | #get_train_input() 296 | 297 | -------------------------------------------------------------------------------- /signal_generation/MASTER_creation_v1.sql: -------------------------------------------------------------------------------- 1 | /* 2 | * @Author: han.jiang 3 | * @Date: 2017-04-28 21:38:44 4 | * @Last Modified by: han.jiang 5 | * @Last Modified time: 2017-05-09 11:48:15 6 | */ 7 | 8 | SELECT COUNT() FROM JDATA_USER_TBL; 9 | -- 105321 10 | SELECT COUNT() FROM JDATA_PRODUCT_TBL; 11 | -- 24187 12 | SELECT COUNT() FROM JDATA_COMMENT_TBL; 13 | -- 558552 14 | SELECT COUNT() FROM JDATA_ACTION_TBL; 15 | -- 50601736 16 | 17 | 18 | 19 | -- master table for 
model building 20 | drop table if exists MASTER_0201_0408; 21 | create table MASTER_0201_0408 as 22 | select 23 | A.* 24 | ,B.* 25 | ,C.* 26 | from LABEL_0409_0413 as A 27 | left join 28 | SKU_MASTER_0201_0408 AS B 29 | ON A.SKU_ID = B.SKU_ID 30 | left join USER_ACTION_0201_0408 as C 31 | ON A.SKU_ID = C.SKU_ID 32 | 33 | ; 34 | 35 | select count() from SKU_MASTER_0201_0410; 36 | -- 27907 37 | 38 | 39 | -- master table for model scoring 40 | drop table if exists MASTER_0201_0415; 41 | create table MASTER_0201_0415 as 42 | 43 | -------------------------------------------------------------------------------- /signal_generation/create_tbl_server.sql: -------------------------------------------------------------------------------- 1 | /* 2 | * @Author: shu.wen 3 | * @Date: 2017-04-28 21:38:44 4 | * @Last Modified by: shu.wen 5 | * @Last Modified time: 2017-05-05 01:15:11 6 | */ 7 | 8 | sqlite3.exe JData.db 9 | 10 | .show 11 | .headers on 12 | .separator "," 13 | 14 | 15 | -- user data 16 | DROP TABLE IF EXISTS JDATA_USER_TBL; 17 | CREATE TABLE JDATA_USER_TBL ( 18 | USER_ID INTEGER, 19 | AGE TEXT, 20 | SEX TEXT, 21 | USER_LV_CD INTEGER, 22 | USER_REG_TM DATE 23 | ); 24 | 25 | 26 | -- product attribute data 27 | DROP TABLE IF EXISTS JDATA_PRODUCT_TBL; 28 | CREATE TABLE JDATA_PRODUCT_TBL ( 29 | SKU_ID INTEGER, 30 | A1 TEXT, 31 | A2 TEXT, 32 | A3 TEXT, 33 | CATE INTEGER, 34 | BRAND INTEGER 35 | ); 36 | 37 | 38 | 39 | 40 | -- product comment data 41 | DROP TABLE IF EXISTS JDATA_COMMENT_TBL; 42 | CREATE TABLE JDATA_COMMENT_TBL ( 43 | DT DATE, 44 | SKU_ID INTEGER, 45 | COMMENT_NUM INTEGER, 46 | HAS_BAD_COMMENT INTEGER, 47 | BAD_COMMENT_RATE REAL 48 | 49 | ); 50 | 51 | 52 | 53 | 54 | -- action data 55 | DROP TABLE IF EXISTS JDATA_ACTION_TBL ; 56 | CREATE TABLE JDATA_ACTION_TBL ( 57 | USER_ID INTEGER, 58 | SKU_ID INTEGER, 59 | TIME DATETIME, 60 | MODEL_ID INTEGER, 61 | TYPE INTEGER, 62 | CATE INTEGER, 63 | BRAND INTEGER 64 | ); 65 | 66 | 67 | 68 | 69 | .import /root/data/JData_User_no_header.csv JDATA_USER_TBL
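For reference, the `.import` bulk-loads above can also be scripted; a minimal sketch using Python's built-in sqlite3 module, with two made-up user rows standing in for the real headerless extracts under /root/data/:

```python
import csv
import sqlite3
from io import StringIO

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE JDATA_USER_TBL '
             '(USER_ID INTEGER, AGE TEXT, SEX TEXT, USER_LV_CD INTEGER, USER_REG_TM DATE)')

# stand-in for JData_User_no_header.csv: comma-separated, no header row
fake_csv = StringIO('200001,26-35,1,4,2016-01-30\n200002,16-25,0,3,2016-02-11\n')
conn.executemany('INSERT INTO JDATA_USER_TBL VALUES (?, ?, ?, ?, ?)', csv.reader(fake_csv))
conn.commit()

print(conn.execute('SELECT COUNT(*) FROM JDATA_USER_TBL').fetchone()[0])  # 2
```

The shell's `.import` is the practical choice for the ~50M-row action extract; the scripted form is mainly useful for small tests.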
70 | .import /root/data/JData_Product_no_header.csv JDATA_PRODUCT_TBL 71 | .import /root/data/JData_Comment_no_header.csv JDATA_COMMENT_TBL 72 | .import /root/data/JData_Action_All_no_header.csv JDATA_ACTION_TBL 73 | 74 | 75 | SELECT COUNT() FROM JDATA_USER_TBL; 76 | -- 210642 77 | SELECT COUNT() FROM JDATA_PRODUCT_TBL; 78 | -- 24187 79 | SELECT COUNT() FROM JDATA_COMMENT_TBL; 80 | -- 558552 81 | SELECT COUNT() FROM JDATA_ACTION_TBL; 82 | -- 50601736 83 | 84 | 85 | -- prepare the X and Y of the train/test sets 86 | 87 | ---- Y: SKU CATE 88 | DROP TABLE IF EXISTS JDATA_LABEL_SKU_0304_TBL; 89 | CREATE TABLE JDATA_LABEL_SKU_0304_TBL AS 90 | SELECT 91 | USER_ID 92 | , SKU_ID 93 | , MAX(CASE WHEN TYPE = 4 THEN 1 ELSE 0 END) AS Y 94 | FROM ( 95 | SELECT 96 | * 97 | FROM JDATA_ACTION_TBL 98 | WHERE 99 | 1=1 100 | AND TIME >= DATE('2016-04-01') 101 | AND TIME < DATE('2016-04-06') 102 | AND CATE = '8' 103 | AND TYPE = '4' 104 | ) 105 | GROUP BY 106 | USER_ID 107 | , SKU_ID 108 | ; 109 | SELECT COUNT() FROM JDATA_LABEL_SKU_0304_TBL; 110 | -- 934 111 | 112 | 113 | DROP TABLE IF EXISTS JDATA_LABEL_CATE_0304_TBL; 114 | CREATE TABLE JDATA_LABEL_CATE_0304_TBL AS 115 | SELECT 116 | USER_ID 117 | , MAX(Y) AS Y 118 | FROM JDATA_LABEL_SKU_0304_TBL 119 | GROUP BY 120 | USER_ID 121 | ; 122 | SELECT COUNT() FROM JDATA_LABEL_CATE_0304_TBL; 123 | -- 920 124 | 125 | 126 | 127 | ---- X 128 | DROP TABLE IF EXISTS JDATA_ACTION_0304_TBL; 129 | CREATE TABLE JDATA_ACTION_0304_TBL AS 130 | SELECT 131 | USER_ID 132 | , SKU_ID 133 | , TIME 134 | , MODEL_ID 135 | , TYPE 136 | , CATE 137 | , BRAND 138 | FROM JDATA_ACTION_TBL 139 | WHERE 140 | 1=1 141 | AND TIME >= DATE('2016-03-01') 142 | AND TIME < DATE('2016-04-01') 143 | ; 144 | SELECT COUNT() FROM JDATA_ACTION_0304_TBL; 145 | -- 25916191 146 | 147 | 148 | -- all of a user's actions on category 8 during the observation window 149 | DROP TABLE IF EXISTS JDATA_X_CATE_ACTION_0304_TBL; 150 | CREATE TABLE JDATA_X_CATE_ACTION_0304_TBL AS 151 | SELECT 152 | USER_ID 153 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) 
<= 3) and TYPE = "1" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_1_CNT_3 154 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and TYPE = "2" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_2_CNT_3 155 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and TYPE = "3" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_3_CNT_3 156 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and TYPE = "4" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_4_CNT_3 157 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and TYPE = "5" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_5_CNT_3 158 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and TYPE = "6" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_6_CNT_3 159 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and TYPE = "1" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_1_RATE_3 160 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and TYPE = "2" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_2_RATE_3 161 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and TYPE = "3" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_3_RATE_3 162 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and TYPE = "4" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_4_RATE_3 163 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and TYPE = "5" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_5_RATE_3 164 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and TYPE = "6" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_6_RATE_3 165 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%w', TIME) = '0' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_00_CNT_3 166 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%w', TIME) = '1' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_01_CNT_3 167 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and 
STRFTIME('%w', TIME) = '2' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_02_CNT_3 168 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%w', TIME) = '3' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_03_CNT_3 169 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%w', TIME) = '4' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_04_CNT_3 170 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%w', TIME) = '5' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_05_CNT_3 171 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%w', TIME) = '6' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_06_CNT_3 172 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%w', TIME) = '0' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_00_RATE_3 173 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%w', TIME) = '1' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_01_RATE_3 174 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%w', TIME) = '2' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_02_RATE_3 175 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%w', TIME) = '3' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_03_RATE_3 176 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%w', TIME) = '4' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_04_RATE_3 177 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%w', TIME) = '5' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_05_RATE_3 178 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%w', TIME) = '6' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_06_RATE_3 179 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%H', TIME) in ('02', '03', '04', '05', '06', '07', '08' ) THEN 1 ELSE 0 END) AS 
USER_CATE_8_HOUR_02_08_CNT_3 180 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%H', TIME) in ('09', '10', '11', '12' ) THEN 1 ELSE 0 END) AS USER_CATE_8_HOUR_09_12_CNT_3 181 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%H', TIME) in ('13', '14', '15', '16', '17', '18' ) THEN 1 ELSE 0 END) AS USER_CATE_8_HOUR_13_18_CNT_3 182 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%H', TIME) in ('19', '20', '21', '22', '23', '00', '01' ) THEN 1 ELSE 0 END) AS USER_CATE_8_HOUR_19_01_CNT_3 183 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%H', TIME) in ('02', '03', '04', '05', '06', '07', '08' ) THEN 1 ELSE 0 END) *1.0 / COUNT() AS USER_CATE_8_HOUR_02_08_RATE_3 184 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%H', TIME) in ('09', '10', '11', '12' ) THEN 1 ELSE 0 END) *1.0 / COUNT() AS USER_CATE_8_HOUR_09_12_RATE_3 185 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%H', TIME) in ('13', '14', '15', '16', '17', '18' ) THEN 1 ELSE 0 END) *1.0 / COUNT() AS USER_CATE_8_HOUR_13_18_RATE_3 186 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 3) and STRFTIME('%H', TIME) in ('19', '20', '21', '22', '23', '00', '01' ) THEN 1 ELSE 0 END) *1.0 / COUNT() AS USER_CATE_8_HOUR_19_01_RATE_3 187 | 188 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and TYPE = "1" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_1_CNT_7 189 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and TYPE = "2" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_2_CNT_7 190 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and TYPE = "3" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_3_CNT_7 191 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and TYPE = "4" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_4_CNT_7 192 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - 
JULIANDAY(TIME) <= 7) and TYPE = "5" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_5_CNT_7 193 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and TYPE = "6" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_6_CNT_7 194 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and TYPE = "1" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_1_RATE_7 195 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and TYPE = "2" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_2_RATE_7 196 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and TYPE = "3" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_3_RATE_7 197 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and TYPE = "4" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_4_RATE_7 198 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and TYPE = "5" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_5_RATE_7 199 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and TYPE = "6" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_6_RATE_7 200 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%w', TIME) = '0' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_00_CNT_7 201 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%w', TIME) = '1' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_01_CNT_7 202 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%w', TIME) = '2' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_02_CNT_7 203 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%w', TIME) = '3' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_03_CNT_7 204 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%w', TIME) = '4' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_04_CNT_7 205 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%w', TIME) = '5' THEN 1 ELSE 0 END) AS 
USER_CATE_8_WEEK_05_CNT_7 206 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%w', TIME) = '6' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_06_CNT_7 207 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%w', TIME) = '0' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_00_RATE_7 208 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%w', TIME) = '1' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_01_RATE_7 209 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%w', TIME) = '2' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_02_RATE_7 210 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%w', TIME) = '3' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_03_RATE_7 211 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%w', TIME) = '4' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_04_RATE_7 212 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%w', TIME) = '5' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_05_RATE_7 213 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%w', TIME) = '6' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_06_RATE_7 214 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%H', TIME) in ('02', '03', '04', '05', '06', '07', '08' ) THEN 1 ELSE 0 END) AS USER_CATE_8_HOUR_02_08_CNT_7 215 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%H', TIME) in ('09', '10', '11', '12' ) THEN 1 ELSE 0 END) AS USER_CATE_8_HOUR_09_12_CNT_7 216 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%H', TIME) in ('13', '14', '15', '16', '17', '18' ) THEN 1 ELSE 0 END) AS USER_CATE_8_HOUR_13_18_CNT_7 217 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%H', TIME) in ('19', 
'20', '21', '22', '23', '00', '01' ) THEN 1 ELSE 0 END) AS USER_CATE_8_HOUR_19_01_CNT_7 218 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%H', TIME) in ('02', '03', '04', '05', '06', '07', '08' ) THEN 1 ELSE 0 END) *1.0 / COUNT() AS USER_CATE_8_HOUR_02_08_RATE_7 219 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%H', TIME) in ('09', '10', '11', '12' ) THEN 1 ELSE 0 END) *1.0 / COUNT() AS USER_CATE_8_HOUR_09_12_RATE_7 220 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%H', TIME) in ('13', '14', '15', '16', '17', '18' ) THEN 1 ELSE 0 END) *1.0 / COUNT() AS USER_CATE_8_HOUR_13_18_RATE_7 221 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 7) and STRFTIME('%H', TIME) in ('19', '20', '21', '22', '23', '00', '01' ) THEN 1 ELSE 0 END) *1.0 / COUNT() AS USER_CATE_8_HOUR_19_01_RATE_7 222 | 223 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and TYPE = "1" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_1_CNT_15 224 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and TYPE = "2" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_2_CNT_15 225 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and TYPE = "3" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_3_CNT_15 226 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and TYPE = "4" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_4_CNT_15 227 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and TYPE = "5" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_5_CNT_15 228 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and TYPE = "6" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_6_CNT_15 229 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and TYPE = "1" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_1_RATE_15 230 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and TYPE = "2" THEN 1 ELSE 0 END)*1.0 / COUNT() AS 
USER_CATE_8_TYPE_2_RATE_15 231 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and TYPE = "3" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_3_RATE_15 232 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and TYPE = "4" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_4_RATE_15 233 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and TYPE = "5" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_5_RATE_15 234 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and TYPE = "6" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_6_RATE_15 235 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%w', TIME) = '0' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_00_CNT_15 236 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%w', TIME) = '1' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_01_CNT_15 237 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%w', TIME) = '2' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_02_CNT_15 238 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%w', TIME) = '3' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_03_CNT_15 239 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%w', TIME) = '4' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_04_CNT_15 240 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%w', TIME) = '5' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_05_CNT_15 241 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%w', TIME) = '6' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_06_CNT_15 242 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%w', TIME) = '0' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_00_RATE_15 243 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%w', TIME) = '1' THEN 1 ELSE 0 END)*1.0 / COUNT() AS 
USER_CATE_8_WEEK_01_RATE_15 244 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%w', TIME) = '2' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_02_RATE_15 245 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%w', TIME) = '3' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_03_RATE_15 246 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%w', TIME) = '4' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_04_RATE_15 247 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%w', TIME) = '5' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_05_RATE_15 248 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%w', TIME) = '6' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_06_RATE_15 249 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%H', TIME) in ('02', '03', '04', '05', '06', '07', '08' ) THEN 1 ELSE 0 END) AS USER_CATE_8_HOUR_02_08_CNT_15 250 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%H', TIME) in ('09', '10', '11', '12' ) THEN 1 ELSE 0 END) AS USER_CATE_8_HOUR_09_12_CNT_15 251 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%H', TIME) in ('13', '14', '15', '16', '17', '18' ) THEN 1 ELSE 0 END) AS USER_CATE_8_HOUR_13_18_CNT_15 252 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%H', TIME) in ('19', '20', '21', '22', '23', '00', '01' ) THEN 1 ELSE 0 END) AS USER_CATE_8_HOUR_19_01_CNT_15 253 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%H', TIME) in ('02', '03', '04', '05', '06', '07', '08' ) THEN 1 ELSE 0 END) *1.0 / COUNT() AS USER_CATE_8_HOUR_02_08_RATE_15 254 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%H', TIME) in ('09', '10', '11', '12' ) THEN 1 ELSE 0 END) *1.0 / COUNT() AS 
USER_CATE_8_HOUR_09_12_RATE_15 255 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%H', TIME) in ('13', '14', '15', '16', '17', '18' ) THEN 1 ELSE 0 END) *1.0 / COUNT() AS USER_CATE_8_HOUR_13_18_RATE_15 256 | , SUM(CASE WHEN (JULIANDAY('2016-04-01') - JULIANDAY(TIME) <= 15) and STRFTIME('%H', TIME) in ('19', '20', '21', '22', '23', '00', '01' ) THEN 1 ELSE 0 END) *1.0 / COUNT() AS USER_CATE_8_HOUR_19_01_RATE_15 257 | 258 | 259 | , SUM(CASE WHEN TYPE = "1" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_1_CNT 260 | , SUM(CASE WHEN TYPE = "2" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_2_CNT 261 | , SUM(CASE WHEN TYPE = "3" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_3_CNT 262 | , SUM(CASE WHEN TYPE = "4" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_4_CNT 263 | , SUM(CASE WHEN TYPE = "5" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_5_CNT 264 | , SUM(CASE WHEN TYPE = "6" THEN 1 ELSE 0 END) AS USER_CATE_8_TYPE_6_CNT 265 | , SUM(CASE WHEN TYPE = "1" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_1_RATE 266 | , SUM(CASE WHEN TYPE = "2" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_2_RATE 267 | , SUM(CASE WHEN TYPE = "3" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_3_RATE 268 | , SUM(CASE WHEN TYPE = "4" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_4_RATE 269 | , SUM(CASE WHEN TYPE = "5" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_5_RATE 270 | , SUM(CASE WHEN TYPE = "6" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_TYPE_6_RATE 271 | , SUM(CASE WHEN STRFTIME('%w', TIME) = '0' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_00_CNT 272 | , SUM(CASE WHEN STRFTIME('%w', TIME) = '1' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_01_CNT 273 | , SUM(CASE WHEN STRFTIME('%w', TIME) = '2' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_02_CNT 274 | , SUM(CASE WHEN STRFTIME('%w', TIME) = '3' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_03_CNT 275 | , SUM(CASE WHEN STRFTIME('%w', TIME) = '4' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_04_CNT 276 | , SUM(CASE WHEN STRFTIME('%w', 
TIME) = '5' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_05_CNT 277 | , SUM(CASE WHEN STRFTIME('%w', TIME) = '6' THEN 1 ELSE 0 END) AS USER_CATE_8_WEEK_06_CNT 278 | , SUM(CASE WHEN STRFTIME('%w', TIME) = '0' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_00_RATE 279 | , SUM(CASE WHEN STRFTIME('%w', TIME) = '1' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_01_RATE 280 | , SUM(CASE WHEN STRFTIME('%w', TIME) = '2' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_02_RATE 281 | , SUM(CASE WHEN STRFTIME('%w', TIME) = '3' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_03_RATE 282 | , SUM(CASE WHEN STRFTIME('%w', TIME) = '4' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_04_RATE 283 | , SUM(CASE WHEN STRFTIME('%w', TIME) = '5' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_05_RATE 284 | , SUM(CASE WHEN STRFTIME('%w', TIME) = '6' THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_CATE_8_WEEK_06_RATE 285 | , SUM(CASE WHEN STRFTIME('%H', TIME) in ('02', '03', '04', '05', '06', '07', '08' ) THEN 1 ELSE 0 END) AS USER_CATE_8_HOUR_02_08_CNT 286 | , SUM(CASE WHEN STRFTIME('%H', TIME) in ('09', '10', '11', '12' ) THEN 1 ELSE 0 END) AS USER_CATE_8_HOUR_09_12_CNT 287 | , SUM(CASE WHEN STRFTIME('%H', TIME) in ('13', '14', '15', '16', '17', '18' ) THEN 1 ELSE 0 END) AS USER_CATE_8_HOUR_13_18_CNT 288 | , SUM(CASE WHEN STRFTIME('%H', TIME) in ('19', '20', '21', '22', '23', '00', '01' ) THEN 1 ELSE 0 END) AS USER_CATE_8_HOUR_19_01_CNT 289 | , SUM(CASE WHEN STRFTIME('%H', TIME) in ('02', '03', '04', '05', '06', '07', '08' ) THEN 1 ELSE 0 END) *1.0 / COUNT() AS USER_CATE_8_HOUR_02_08_RATE 290 | , SUM(CASE WHEN STRFTIME('%H', TIME) in ('09', '10', '11', '12' ) THEN 1 ELSE 0 END) *1.0 / COUNT() AS USER_CATE_8_HOUR_09_12_RATE 291 | , SUM(CASE WHEN STRFTIME('%H', TIME) in ('13', '14', '15', '16', '17', '18' ) THEN 1 ELSE 0 END) *1.0 / COUNT() AS USER_CATE_8_HOUR_13_18_RATE 292 | , SUM(CASE WHEN STRFTIME('%H', TIME) in ('19', '20', '21', '22', '23', '00', '01' ) THEN 1 ELSE 0 
END) *1.0 / COUNT() AS USER_CATE_8_HOUR_19_01_RATE 293 | 294 | 295 | FROM JDATA_ACTION_0304_TBL 296 | WHERE 297 | 1=1 298 | AND CATE = '8' 299 | GROUP BY 300 | USER_ID 301 | ; 302 | SELECT COUNT() FROM JDATA_X_CATE_ACTION_0304_TBL; 303 | -- 88808 304 | 305 | 306 | 307 | -- Per-user summary of actions across all categories during the observation window 308 | DROP TABLE IF EXISTS JDATA_X_CATE_USER_TYPE_HIST_0304_TBL ; 309 | CREATE TABLE JDATA_X_CATE_USER_TYPE_HIST_0304_TBL AS 310 | SELECT 311 | USER_ID 312 | , SUM(CASE WHEN TYPE = "1" THEN 1 ELSE 0 END) AS USER_TYPE_1_CNT 313 | , SUM(CASE WHEN TYPE = "2" THEN 1 ELSE 0 END) AS USER_TYPE_2_CNT 314 | , SUM(CASE WHEN TYPE = "3" THEN 1 ELSE 0 END) AS USER_TYPE_3_CNT 315 | , SUM(CASE WHEN TYPE = "4" THEN 1 ELSE 0 END) AS USER_TYPE_4_CNT 316 | , SUM(CASE WHEN TYPE = "5" THEN 1 ELSE 0 END) AS USER_TYPE_5_CNT 317 | , SUM(CASE WHEN TYPE = "6" THEN 1 ELSE 0 END) AS USER_TYPE_6_CNT 318 | 319 | , SUM(CASE WHEN TYPE = "1" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_TYPE_1_RATE 320 | , SUM(CASE WHEN TYPE = "2" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_TYPE_2_RATE 321 | , SUM(CASE WHEN TYPE = "3" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_TYPE_3_RATE 322 | , SUM(CASE WHEN TYPE = "4" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_TYPE_4_RATE 323 | , SUM(CASE WHEN TYPE = "5" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_TYPE_5_RATE 324 | , SUM(CASE WHEN TYPE = "6" THEN 1 ELSE 0 END)*1.0 / COUNT() AS USER_TYPE_6_RATE 325 | 326 | , SUM(CASE WHEN TYPE = "4" THEN 1 ELSE 0 END)*1.0 / SUM(CASE WHEN TYPE = "1" THEN 1 ELSE 0 END) AS USER_TYPE_1_4_RATE 327 | , SUM(CASE WHEN TYPE = "4" THEN 1 ELSE 0 END)*1.0 / SUM(CASE WHEN TYPE = "2" THEN 1 ELSE 0 END) AS USER_TYPE_2_4_RATE 328 | , SUM(CASE WHEN TYPE = "4" THEN 1 ELSE 0 END)*1.0 / SUM(CASE WHEN TYPE = "3" THEN 1 ELSE 0 END) AS USER_TYPE_3_4_RATE 329 | , SUM(CASE WHEN TYPE = "4" THEN 1 ELSE 0 END)*1.0 / SUM(CASE WHEN TYPE = "5" THEN 1 ELSE 0 END) AS USER_TYPE_5_4_RATE 330 | , SUM(CASE WHEN TYPE = "4" THEN 1 ELSE 0 END)*1.0 / SUM(CASE WHEN TYPE = "6" THEN 1 ELSE 0 END) AS
USER_TYPE_6_4_RATE 331 | 332 | FROM JDATA_ACTION_0304_TBL 333 | GROUP BY 334 | USER_ID 335 | ; 336 | 337 | SELECT COUNT() FROM JDATA_X_CATE_USER_TYPE_HIST_0304_TBL; 338 | -- 96087 339 | 340 | 341 | 342 | -- MST TBL 343 | DROP TABLE IF EXISTS JDATA_X_CATE_MST_0304_TBL ; 344 | CREATE TABLE JDATA_X_CATE_MST_0304_TBL AS 345 | SELECT 346 | A.* 347 | , B.USER_TYPE_1_CNT 348 | , B.USER_TYPE_2_CNT 349 | , B.USER_TYPE_3_CNT 350 | , B.USER_TYPE_4_CNT 351 | , B.USER_TYPE_5_CNT 352 | , B.USER_TYPE_6_CNT 353 | , B.USER_TYPE_1_RATE 354 | , B.USER_TYPE_2_RATE 355 | , B.USER_TYPE_3_RATE 356 | , B.USER_TYPE_4_RATE 357 | , B.USER_TYPE_5_RATE 358 | , B.USER_TYPE_6_RATE 359 | , B.USER_TYPE_1_4_RATE 360 | , B.USER_TYPE_2_4_RATE 361 | , B.USER_TYPE_3_4_RATE 362 | , B.USER_TYPE_5_4_RATE 363 | , B.USER_TYPE_6_4_RATE 364 | , C.AGE 365 | , C.SEX 366 | , C.USER_LV_CD 367 | , CAST(ROUND((JULIANDAY('2016-04-01') - JULIANDAY(C.USER_REG_TM))/30) AS INT) AS USER_REG_M_CNT 368 | , CASE WHEN E.Y = 1 THEN 1 ELSE 0 END AS Y 369 | 370 | FROM JDATA_X_CATE_ACTION_0304_TBL A 371 | LEFT JOIN JDATA_X_CATE_USER_TYPE_HIST_0304_TBL B 372 | ON A.USER_ID = B.USER_ID 373 | LEFT JOIN JDATA_USER_TBL C 374 | ON A.USER_ID = C.USER_ID 375 | LEFT JOIN JDATA_LABEL_CATE_0304_TBL E 376 | ON A.USER_ID = E.USER_ID 377 | ; 378 | 379 | SELECT COUNT() FROM JDATA_X_CATE_MST_0304_TBL; 380 | -- 88808 381 | 382 | .output JDATA_X_CATE_MST_0304.csv 383 | SELECT * FROM JDATA_X_CATE_MST_0304_TBL ; 384 | .output stdout 385 | 386 | 387 | 388 | 389 | ------------------------------ sku-level wide table ------------------------------------------------------------------- 390 | 391 | -- Per-SKU summary of actions across all categories during the observation window 392 | DROP TABLE IF EXISTS JDATA_X_CATE_SKU_TYPE_HIST_0304_TBL ; 393 | CREATE TABLE JDATA_X_CATE_SKU_TYPE_HIST_0304_TBL AS 394 | SELECT 395 | SKU_ID 396 | , SUM(CASE WHEN TYPE = "1" THEN 1 ELSE 0 END) AS SKU_TYPE_1_CNT 397 | , SUM(CASE WHEN TYPE = "2" THEN 1 ELSE 0 END) AS SKU_TYPE_2_CNT 398 | , SUM(CASE WHEN TYPE = "3" THEN 1 ELSE 0 END) AS
SKU_TYPE_3_CNT 399 | , SUM(CASE WHEN TYPE = "4" THEN 1 ELSE 0 END) AS SKU_TYPE_4_CNT 400 | , SUM(CASE WHEN TYPE = "5" THEN 1 ELSE 0 END) AS SKU_TYPE_5_CNT 401 | , SUM(CASE WHEN TYPE = "6" THEN 1 ELSE 0 END) AS SKU_TYPE_6_CNT 402 | 403 | , SUM(CASE WHEN TYPE = "1" THEN 1 ELSE 0 END)*1.0 / COUNT() AS SKU_TYPE_1_RATE 404 | , SUM(CASE WHEN TYPE = "2" THEN 1 ELSE 0 END)*1.0 / COUNT() AS SKU_TYPE_2_RATE 405 | , SUM(CASE WHEN TYPE = "3" THEN 1 ELSE 0 END)*1.0 / COUNT() AS SKU_TYPE_3_RATE 406 | , SUM(CASE WHEN TYPE = "4" THEN 1 ELSE 0 END)*1.0 / COUNT() AS SKU_TYPE_4_RATE 407 | , SUM(CASE WHEN TYPE = "5" THEN 1 ELSE 0 END)*1.0 / COUNT() AS SKU_TYPE_5_RATE 408 | , SUM(CASE WHEN TYPE = "6" THEN 1 ELSE 0 END)*1.0 / COUNT() AS SKU_TYPE_6_RATE 409 | 410 | , SUM(CASE WHEN TYPE = "4" THEN 1 ELSE 0 END)*1.0 / SUM(CASE WHEN TYPE = "1" THEN 1 ELSE 0 END) AS SKU_TYPE_1_4_RATE 411 | , SUM(CASE WHEN TYPE = "4" THEN 1 ELSE 0 END)*1.0 / SUM(CASE WHEN TYPE = "2" THEN 1 ELSE 0 END) AS SKU_TYPE_2_4_RATE 412 | , SUM(CASE WHEN TYPE = "4" THEN 1 ELSE 0 END)*1.0 / SUM(CASE WHEN TYPE = "3" THEN 1 ELSE 0 END) AS SKU_TYPE_3_4_RATE 413 | , SUM(CASE WHEN TYPE = "4" THEN 1 ELSE 0 END)*1.0 / SUM(CASE WHEN TYPE = "5" THEN 1 ELSE 0 END) AS SKU_TYPE_5_4_RATE 414 | , SUM(CASE WHEN TYPE = "4" THEN 1 ELSE 0 END)*1.0 / SUM(CASE WHEN TYPE = "6" THEN 1 ELSE 0 END) AS SKU_TYPE_6_4_RATE 415 | 416 | FROM JDATA_ACTION_0304_TBL 417 | GROUP BY 418 | SKU_ID 419 | ; 420 | SELECT COUNT() FROM JDATA_X_CATE_SKU_TYPE_HIST_0304_TBL; 421 | -- 23753 422 | 423 | 424 | 425 | 426 | -- Per-SKU comment features as of the observation window 427 | DROP TABLE IF EXISTS JDATA_X_SKU_COMMENT_0304_TBL ; 428 | CREATE TABLE JDATA_X_SKU_COMMENT_0304_TBL AS 429 | SELECT 430 | SKU_ID 431 | , COMMENT_NUM 432 | , HAS_BAD_COMMENT 433 | , BAD_COMMENT_RATE 434 | FROM JDATA_COMMENT_TBL 435 | WHERE 436 | 1=1 437 | AND DT = DATE('2016-03-28') 438 | ; 439 | SELECT COUNT() FROM JDATA_X_SKU_COMMENT_0304_TBL; 440 | -- 46546 441 |
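The SQL above builds, for each user, counts and rates of each action TYPE within trailing windows (3/7/15 days before the cutoff date). As a rough cross-check, the same windowed features can be sketched in pandas. This is a toy illustration, not code from the repo: the column names (`USER_ID`, `TYPE`, `TIME`) mirror the action table, the cutoff `2016-04-01` matches the SQL, and the data rows are made up.

```python
# Pandas sketch of the windowed TYPE count/rate features
# (cf. USER_CATE_8_TYPE_<t>_CNT_<w> / _RATE_<w> in the SQL above).
import pandas as pd

actions = pd.DataFrame({
    'USER_ID': [1, 1, 1, 2, 2],
    'TYPE':    ['1', '4', '1', '2', '1'],
    'TIME':    pd.to_datetime(['2016-03-31 10:00', '2016-03-30 21:00',
                               '2016-03-20 09:00', '2016-03-29 15:00',
                               '2016-02-10 08:00']),
})
ref = pd.Timestamp('2016-04-01')
# Fractional age in days, like JULIANDAY(ref) - JULIANDAY(TIME).
age_days = (ref - actions['TIME']).dt.total_seconds() / 86400.0

feats = {}
for w in (3, 7, 15):
    in_win = age_days <= w
    for t in ('1', '2', '3', '4', '5', '6'):
        hit = (in_win & (actions['TYPE'] == t)).astype(int)
        feats['TYPE_%s_CNT_%d' % (t, w)] = hit.groupby(actions['USER_ID']).sum()

wide = pd.DataFrame(feats)
# The SQL's _RATE features divide by the user's TOTAL action count
# (COUNT() over the whole group), not by the in-window count.
total = actions.groupby('USER_ID').size()
wide['TYPE_1_RATE_3'] = wide['TYPE_1_CNT_3'] / total
print(wide[['TYPE_1_CNT_3', 'TYPE_4_CNT_3', 'TYPE_1_RATE_3']])
```

Note the denominator choice: a user with many old actions and one recent one gets a small `_RATE_3` even if all recent activity is of that type, exactly as in the SQL.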
-------------------------------------------------------------------------------- /signal_generation/file_merging_master_for_application.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import pandas as pd 5 | 6 | mst = pd.read_csv('/root/data/USER_LIST_0416_0420.csv', sep=",") 7 | sku_master = pd.read_csv('/root/data/SKU_MASTER_0201_0415.csv', sep=",") 8 | 9 | mst = mst.merge(sku_master, how='left', on=['SKU_ID']) 10 | out_file = '/root/data/MODEL_MASTER_0201_0415_step1.csv' 11 | mst.to_csv(out_file, sep=",", index=False) 12 | 13 | del sku_master 14 | 15 | user_action = pd.read_csv('/root/data/USER_ACTION_0201_0415.csv', sep=",") 16 | 17 | mst = pd.merge(mst, user_action, how='left', on=['USER_ID']) 18 | out_file = '/root/data/MODEL_MASTER_0201_0415_step2.csv' 19 | mst.to_csv(out_file, sep=",", index=False) 20 | 21 | del user_action 22 | 23 | user_sku_action = pd.read_csv('/root/data/SKU_ACTION_0201_0410.csv', sep=",") 24 | 25 | mst = pd.merge(mst, user_sku_action, how='left', on=['USER_ID', 'SKU_ID']) 26 | out_file = '/root/data/MODEL_MASTER_0201_0415.csv' 27 | mst.to_csv(out_file, sep=",", index=False) 28 | 29 | del user_sku_action 30 | -------------------------------------------------------------------------------- /signal_generation/gbdtmodel_for_application.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import sys 5 | import os 6 | import time 7 | from sklearn import metrics 8 | import numpy as np 9 | import cPickle as pickle 10 | import pandas as pd 11 | from sklearn.externals import joblib 12 | from sklearn import preprocessing 13 | # from sklearn.model_selection import GridSearchCV 14 | # from sklearn.ensemble import GradientBoostingClassifier 15 | 16 | reload(sys) 17 | sys.setdefaultencoding('utf8') 18 | 19 | 20 | 21 | # GBDT(Gradient Boosting Decision Tree)
Classifier 22 | def gradient_boosting_classifier(train_x, train_y): 23 | from sklearn.ensemble import GradientBoostingClassifier 24 | # model = GradientBoostingClassifier(n_estimators=200) 25 | model = GradientBoostingClassifier(n_estimators=200, max_depth=7, random_state=2017) 26 | model.fit(train_x, train_y) 27 | return model 28 | 29 | 30 | 31 | if __name__ == '__main__': 32 | print 'reading data...' 33 | 34 | x = pd.read_csv("/root/users/WSY/master_table_for_application.csv") 35 | 36 | num_row, num_feat = x.shape 37 | 38 | print '******************** Data Info *********************' 39 | print '# data: %d, dimension: %d' % (num_row, num_feat) 40 | 41 | start_time = time.time() 42 | model = joblib.load('/root/users/WSY/trained_gbdt_model_0513_200trees_5') 43 | y = model.predict(x) 44 | print 'scoring took %fs!' % (time.time() - start_time) 45 | predictdf = pd.DataFrame(y, columns=['result']) 46 | outpath = '/root/users/WSY/predicted_for_application.csv' 47 | predictdf.to_csv(outpath, sep=",", index=False) 48 | 49 | outpath = '/root/users/WSY/expected_user_sku_pair_for_application.csv' 50 | result = x.drop(predictdf[predictdf['result'] == 0].index)  # keep only rows predicted positive 51 | result.to_csv(outpath, sep=",", index=False) 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | -------------------------------------------------------------------------------- /signal_generation/master_x_for_application.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import pandas as pd 5 | from sklearn import preprocessing 6 | import numpy as np 7 | 8 | # raw_file = pd.read_csv('/root/users/WSY/master_table_v1.csv') 9 | raw_file = pd.read_csv('/root/data/MODEL_MASTER_0201_0415.csv') 10 | 11 | num_rows, num_cols = raw_file.shape 12 | print 'rows: %d' % num_rows 13 | print 'cols: %d' % num_cols 14 | 15 | 16 | def age_divide(string_num): 17 | if string_num == '-1': 18 | return -1 19 | if string_num == '15岁以下': 20 | return 1 21 | if
string_num == '16-25岁': 22 | return 2 23 | if string_num == '26-35岁': 24 | return 3 25 | if string_num == '36-45岁': 26 | return 4 27 | if string_num == '46-55岁': 28 | return 5 29 | if string_num == '56岁以上': 30 | return 6 31 | 32 | raw_file['AGE_USABLE'] = raw_file['AGE'].apply(age_divide) 33 | 34 | raw_file = raw_file.drop('AGE', 1) 35 | raw_file = raw_file.drop('USER_ID', 1) 36 | raw_file = raw_file.drop('SKU_ID', 1) 37 | 38 | raw_file = raw_file.astype(np.float64, copy=False) 39 | raw_file = raw_file.replace([np.inf, -np.inf], np.nan).fillna(0) 40 | raw_file = (raw_file - raw_file.mean())/raw_file.std() 41 | raw_file = raw_file.replace([np.inf, -np.inf], np.nan).fillna(0) 42 | 43 | out_file = '/root/users/WSY/master_table_for_application.csv' 44 | raw_file.to_csv(out_file, sep=",", index=False)  # keep the header so the scoring script's read_csv sees column names, not data 45 | 46 | -------------------------------------------------------------------------------- /signal_generation/model_master.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | from datetime import * 4 | 5 | # MODEL DATA 6 | readin_file1 = '/root/data/LABEL_0409_0413.csv' 7 | rdin_file1 = pd.read_csv(readin_file1, sep=",") 8 | # 141125 records, 1008 Y=1, 140117 Y=0 9 | 10 | user_list = rdin_file1['USER_ID'].drop_duplicates() 11 | 12 | readin_file2 = '/root/data/SKU_MASTER_0201_0408.csv' 13 | rdin_file2 = pd.read_csv(readin_file2, sep=",") 14 | # 27653 records 15 | 16 | readin_file3 = '/root/data/USER_ACTION_0201_0408.csv' 17 | rdin_file3 = pd.read_csv(readin_file3, sep=",") 18 | 19 | #readin_file4 = '/root/data/USER_SKU_ACTION_0201_0408.csv' 20 | #rdin_file4 = pd.read_csv(readin_file4, sep=",") 21 | 22 | print(len(rdin_file1)) 23 | # 141125 24 | mst = rdin_file1.merge(rdin_file2, how='left', on='SKU_ID') 25 | print(len(mst)) 26 | # 141125 27 | 28 | mst2 = mst.merge(rdin_file3, how='left', on='USER_ID') 29 | print(len(mst2)) 30 | # 141125 31 | 32 | out_file =
'/root/data/MODEL_INPUT_MASTER_0201_0408.csv' 33 | mst2.to_csv(out_file, sep=",", index=False) 34 | 35 | in_1 = '/root/data/MODEL_INPUT_MASTER_0201_0408.csv' 36 | in_2 = '/root/data/USER_SKU_ACTION_0201_0408.csv' 37 | df = pd.read_csv(in_2, sep=",") 38 | 39 | df_2 = df.merge(user_list, how='inner', on='USER_ID') 40 | 41 | df_1 = pd.read_csv(in_1, sep=",") 42 | df_2 = pd.read_csv(in_2, sep=",") 43 | 44 | print(len(df_1)) 45 | # 141125 46 | print(len(df_2)) 47 | # 1147219 48 | mst = df_1.merge(df_2, how='left', on=['USER_ID', 'SKU_ID']) 49 | print(len(mst)) 50 | # 141125 51 | 52 | 53 | out_file = '/root/data/MODEL_MASTER_0201_0408.csv' 54 | mst.to_csv(out_file, sep=",", index=False) 55 | 56 | 57 | 58 | # SCORING DATA 59 | readin_file1 = '/root/data/LABEL_0409_0413.csv' 60 | rdin_file1 = pd.read_csv(readin_file1, sep=",") 61 | # 141125 records, 1008 Y=1, 140117 Y=0 62 | 63 | user_list = rdin_file1['USER_ID'].drop_duplicates() 64 | 65 | readin_file2 = '/root/data/SKU_MASTER_0201_0408.csv' 66 | rdin_file2 = pd.read_csv(readin_file2, sep=",") 67 | # 27653 records 68 | 69 | readin_file3 = '/root/data/USER_ACTION_0201_0408.csv' 70 | rdin_file3 = pd.read_csv(readin_file3, sep=",") 71 | 72 | #readin_file4 = '/root/data/USER_SKU_ACTION_0201_0408.csv' 73 | #rdin_file4 = pd.read_csv(readin_file4, sep=",") 74 | 75 | print(len(rdin_file1)) 76 | # 141125 77 | mst = rdin_file1.merge(rdin_file2, how='left', on='SKU_ID') 78 | print(len(mst)) 79 | # 141125 80 | 81 | mst2 = mst.merge(rdin_file3, how='left', on='USER_ID') 82 | print(len(mst2)) 83 | # 141125 84 | 85 | out_file = '/root/data/MODEL_INPUT_MASTER_0201_0408.csv' 86 | mst2.to_csv(out_file, sep=",", index=False) 87 | 88 | in_1 = '/root/data/MODEL_INPUT_MASTER_0201_0408.csv' 89 | in_2 = '/root/data/USER_SKU_ACTION_0201_0408.csv' 90 | df = pd.read_csv(in_2, sep=",") 91 | 92 | df_2 = df.merge(user_list, how='inner', on='USER_ID') 93 | 94 | df_1 = pd.read_csv(in_1, sep=",") 95 | df_2 = pd.read_csv(in_2, sep=",") 96 | 97 |
print(len(df_1)) 98 | # 141125 99 | print(len(df_2)) 100 | # 1147219 101 | mst = df_1.merge(df_2, how='left', on=['USER_ID', 'SKU_ID']) 102 | print(len(mst)) 103 | # 141125 104 | 105 | out_file = '/root/data/MODEL_MASTER_0201_0408.csv' 106 | mst.to_csv(out_file, sep=",", index=False) 107 | -------------------------------------------------------------------------------- /submission/submit_20170504__sw_0.03168.csv: -------------------------------------------------------------------------------- 1 | user_id,sku_id 2 | 200033,44854 3 | 200339,84389 4 | 200419,68615 5 | 200446,128988 6 | 200599,92909 7 | 200678,131300 8 | 201393,103652 9 | 201441,61581 10 | 201449,7498 11 | 201471,63006 12 | 201632,24371 13 | 201883,12564 14 | 202096,20308 15 | 202277,5505 16 | 202386,53638 17 | 202608,166354 18 | 202627,84389 19 | 202754,36307 20 | 203004,63006 21 | 203125,154636 22 | 203138,81462 23 | 203359,6533 24 | 203614,12564 25 | 203636,31662 26 | 203799,52343 27 | 203939,154636 28 | 203978,32465 29 | 203984,36307 30 | 204066,94795 31 | 204079,126146 32 | 204296,62351 33 | 204306,61531 34 | 204333,31662 35 | 204424,79520 36 | 204539,84409 37 | 204592,56792 38 | 204676,154636 39 | 204703,90294 40 | 204747,160208 41 | 204989,36748 42 | 205117,84409 43 | 205193,10325 44 | 205219,64467 45 | 205359,19960 46 | 205366,63006 47 | 205399,103652 48 | 205497,32465 49 | 205624,57018 50 | 205683,166707 51 | 205715,12564 52 | 205783,85557 53 | 205829,153395 54 | 205868,61226 55 | 205875,126394 56 | 205966,7881 57 | 206005,69209 58 | 206076,116746 59 | 206115,95850 60 | 206147,111225 61 | 206148,63006 62 | 206319,129010 63 | 206351,62863 64 | 206357,81462 65 | 206365,18412 66 | 206383,68966 67 | 206513,32465 68 | 206565,63006 69 | 206574,12564 70 | 206602,134619 71 | 206683,9702 72 | 206685,154636 73 | 206698,113452 74 | 206722,164258 75 | 206868,63006 76 | 206877,131300 77 | 206904,40336 78 | 206978,145946 79 | 206985,99625 80 | 207149,154636 81 | 207312,115851 82 | 207329,20308 83 | 
207380,44854 84 | 207393,5825 85 | 207444,97025 86 | 207768,63006 87 | 208098,18412 88 | 208215,20308 89 | 208216,18412 90 | 208768,144589 91 | 209331,126535 92 | 209347,63006 93 | 209780,112332 94 | 209859,52476 95 | 210213,116489 96 | 210603,108824 97 | 210621,103652 98 | 210744,154636 99 | 210901,10325 100 | 210931,45320 101 | 210942,63006 102 | 211052,10316 103 | 211116,63006 104 | 211253,61531 105 | 211331,97025 106 | 211381,59175 107 | 211394,63006 108 | 211422,112274 109 | 211465,57018 110 | 211490,32465 111 | 211567,63006 112 | 211578,169819 113 | 211605,88295 114 | 211643,79520 115 | 211649,134619 116 | 211672,154636 117 | 211794,83384 118 | 211795,40336 119 | 211961,70352 120 | 212063,152872 121 | 212208,149641 122 | 212215,14433 123 | 212222,68767 124 | 212291,13334 125 | 212360,12540 126 | 212378,21457 127 | 212437,102110 128 | 212474,84389 129 | 212485,23696 130 | 212493,154636 131 | 212524,154636 132 | 212589,10325 133 | 212596,166707 134 | 212599,18412 135 | 212662,32465 136 | 212785,75877 137 | 212863,21147 138 | 212996,31662 139 | 213469,166707 140 | 213513,107012 141 | 213527,57018 142 | 213717,11640 143 | 213753,89802 144 | 213980,5825 145 | 214020,57018 146 | 214031,85557 147 | 214249,12564 148 | 214315,55715 149 | 214533,11640 150 | 214546,103132 151 | 214579,166354 152 | 214616,14433 153 | 214697,67924 154 | 214766,102204 155 | 214772,52343 156 | 214785,12564 157 | 214839,68767 158 | 214892,108824 159 | 215113,63006 160 | 215286,151327 161 | 215421,57018 162 | 215441,40336 163 | 215517,63006 164 | 215524,12564 165 | 215837,108824 166 | 215838,62863 167 | 216054,103652 168 | 216181,133436 169 | 216187,12564 170 | 216249,17470 171 | 216359,12564 172 | 216381,57018 173 | 216638,154636 174 | 217007,103234 175 | 217039,169819 176 | 217068,57018 177 | 217138,126906 178 | 217174,153551 179 | 217190,75877 180 | 217207,18412 181 | 217545,103234 182 | 217712,69209 183 | 217870,32465 184 | 217934,101181 185 | 217943,145946 186 | 218075,63006 187 | 
218174,152478 188 | 218211,47107 189 | 219007,61531 190 | 219101,151327 191 | 219294,149641 192 | 219855,152478 193 | 219873,18103 194 | 219960,103652 195 | 220614,91393 196 | 220656,40336 197 | 220855,18412 198 | 220862,103652 199 | 221103,144267 200 | 221149,153551 201 | 221437,126146 202 | 221503,101181 203 | 221922,69209 204 | 222138,157827 205 | 222209,75527 206 | 222410,15259 207 | 222534,128988 208 | 222630,79520 209 | 222753,151940 210 | 222881,63006 211 | 223130,12564 212 | 223249,52343 213 | 223390,85557 214 | 223551,36371 215 | 223870,154636 216 | 223961,12564 217 | 224039,125756 218 | 224204,164258 219 | 224341,12564 220 | 224365,169819 221 | 224441,5825 222 | 224456,79520 223 | 224631,5825 224 | 224640,21457 225 | 224690,88295 226 | 224717,138778 227 | 224836,134619 228 | 224955,63006 229 | 225030,69355 230 | 225089,44854 231 | 225312,157827 232 | 225374,18412 233 | 225584,81462 234 | 225661,88524 235 | 226391,63006 236 | 226672,5825 237 | 226807,24371 238 | 226827,31662 239 | 226833,18412 240 | 226933,128988 241 | 226966,57018 242 | 226994,79520 243 | 227020,154636 244 | 227261,97025 245 | 227310,32465 246 | 227319,81708 247 | 227427,36307 248 | 227615,57018 249 | 227796,81462 250 | 228234,63006 251 | 228383,71657 252 | 228670,107711 253 | 228735,37957 254 | 228742,6533 255 | 228769,52343 256 | 228834,103132 257 | 228933,31662 258 | 229023,79520 259 | 229074,149641 260 | 229311,18412 261 | 229475,32465 262 | 229518,166354 263 | 229548,31662 264 | 229570,168651 265 | 229588,128988 266 | 229636,26439 267 | 229826,77571 268 | 229906,12564 269 | 229975,140807 270 | 230048,12564 271 | 230114,18412 272 | 230214,96887 273 | 230250,18412 274 | 230299,154636 275 | 230403,57018 276 | 230430,142477 277 | 230879,75877 278 | 230973,21457 279 | 230979,103132 280 | 231028,96887 281 | 231276,32465 282 | 231319,153551 283 | 231394,14433 284 | 231490,164544 285 | 231577,36307 286 | 231614,92909 287 | 231685,32465 288 | 231737,70839 289 | 231768,166354 290 | 
232225,97097 291 | 232288,103652 292 | 232293,59175 293 | 232309,81708 294 | 232519,59175 295 | 232647,81462 296 | 232842,63006 297 | 232954,37957 298 | 233129,154636 299 | 233167,32465 300 | 233266,55715 301 | 233525,109715 302 | 233552,9702 303 | 233777,65520 304 | 233859,63006 305 | 234039,11640 306 | 234095,103652 307 | 234298,44854 308 | 234463,36171 309 | 234506,64794 310 | 234529,63006 311 | 234549,77571 312 | 234626,59175 313 | 234641,152478 314 | 234735,69209 315 | 234747,126906 316 | 234886,109728 317 | 234980,138853 318 | 235063,18412 319 | 235104,57018 320 | 235184,31662 321 | 235314,149641 322 | 235371,44854 323 | 235400,2677 324 | 235471,73398 325 | 235781,107012 326 | 235813,13334 327 | 235862,32465 328 | 236014,21457 329 | 236106,152478 330 | 236343,11640 331 | 236415,32465 332 | 236578,40336 333 | 236604,47895 334 | 236690,154636 335 | 237014,15259 336 | 237082,160476 337 | 237148,68615 338 | 237301,13785 339 | 237436,103652 340 | 237741,31662 341 | 237808,134330 342 | 237924,13785 343 | 237979,2677 344 | 238000,63006 345 | 238137,21147 346 | 238190,130291 347 | 238191,14163 348 | 238491,7498 349 | 238563,69355 350 | 238794,20308 351 | 238842,12564 352 | 239036,64467 353 | 239045,14163 354 | 239123,123578 355 | 239306,12564 356 | 239315,101181 357 | 239488,57018 358 | 239605,62476 359 | 239726,84389 360 | 239842,57018 361 | 239922,115215 362 | 240011,38639 363 | 240118,32465 364 | 240239,125756 365 | 240316,103652 366 | 240615,132782 367 | 240649,93169 368 | 240654,92394 369 | 240803,164211 370 | 240858,62351 371 | 240871,60230 372 | 241062,5825 373 | 241110,57018 374 | 241115,151327 375 | 241121,52343 376 | 241268,145946 377 | 241327,160476 378 | 241491,161265 379 | 241498,21147 380 | 241521,7746 381 | 241588,5825 382 | 241730,66291 383 | 241742,70352 384 | 241801,81462 385 | 242025,126144 386 | 242395,62863 387 | 242476,168136 388 | 242527,103132 389 | 242612,66088 390 | 242630,97025 391 | 242632,20308 392 | 242669,60861 393 | 242882,79520 394 | 
242918,166354 395 | 242968,25987 396 | 243175,12564 397 | 243250,84389 398 | 243493,84389 399 | 243593,146523 400 | 243727,12564 401 | 243767,18412 402 | 243816,59038 403 | 243968,24371 404 | 243982,125756 405 | 243983,112332 406 | 244052,74141 407 | 244189,89802 408 | 244191,135409 409 | 244196,24371 410 | 244237,164506 411 | 244323,65520 412 | 244347,149641 413 | 244372,32465 414 | 244429,126146 415 | 244470,103132 416 | 244543,18412 417 | 244859,81462 418 | 245164,63006 419 | 245284,6533 420 | 245296,55715 421 | 245384,99646 422 | 245412,149641 423 | 245544,154636 424 | 245694,84389 425 | 246024,7746 426 | 246160,63006 427 | 246549,164258 428 | 246884,68767 429 | 247539,97025 430 | 247693,38222 431 | 247920,125756 432 | 247925,78694 433 | 247944,103652 434 | 248029,38639 435 | 248115,63006 436 | 248549,166624 437 | 248679,123016 438 | 248809,134619 439 | 248837,63006 440 | 248883,115867 441 | 248937,79520 442 | 249059,169819 443 | 249064,145946 444 | 249446,79520 445 | 249485,6533 446 | 249566,64974 447 | 250163,18412 448 | 250350,61531 449 | 250548,126146 450 | 250567,44854 451 | 250693,101181 452 | 250695,81708 453 | 250816,79520 454 | 251469,63006 455 | 251577,24371 456 | 251601,79636 457 | 251634,80462 458 | 251965,166876 459 | 252096,70839 460 | 252122,109728 461 | 252337,31662 462 | 252799,36307 463 | 252956,60861 464 | 253012,44854 465 | 253075,151327 466 | 253159,149641 467 | 253167,62863 468 | 253312,150314 469 | 253696,66291 470 | 253836,149641 471 | 253977,12564 472 | 254055,108858 473 | 254073,153395 474 | 254143,63006 475 | 254531,154636 476 | 254534,75877 477 | 254584,138778 478 | 254630,142667 479 | 254715,75877 480 | 254718,34314 481 | 254760,12564 482 | 254797,149641 483 | 255113,51670 484 | 255171,57161 485 | 255296,24371 486 | 255423,57018 487 | 255479,146523 488 | 255582,18412 489 | 255619,153551 490 | 255887,12564 491 | 256078,32465 492 | 256474,166354 493 | 256534,116489 494 | 256547,63006 495 | 256609,20308 496 | 256662,160476 497 | 
256705,154636 498 | 256778,52343 499 | 256938,18412 500 | 256966,63006 501 | 257089,154636 502 | 257240,154636 503 | 257408,63006 504 | 257852,166707 505 | 258174,73469 506 | 258245,6533 507 | 258379,84389 508 | 258466,32465 509 | 258590,76270 510 | 258825,109728 511 | 258873,12564 512 | 259100,32465 513 | 259155,57018 514 | 259534,152478 515 | 259717,145990 516 | 259896,63229 517 | 260211,85557 518 | 260438,109728 519 | 260722,102090 520 | 260773,11640 521 | 260789,80030 522 | 260906,79520 523 | 260930,153551 524 | 260934,44854 525 | 260954,12564 526 | 261135,52343 527 | 261322,154636 528 | 261431,123773 529 | 261677,89802 530 | 261798,47193 531 | 261944,70839 532 | 261970,154636 533 | 261991,52343 534 | 262019,84389 535 | 262171,149854 536 | 262414,12564 537 | 262474,134289 538 | 262526,149641 539 | 262527,42199 540 | 262611,152478 541 | 262674,52343 542 | 262738,32465 543 | 263074,115851 544 | 263366,59175 545 | 263497,32465 546 | 263581,63006 547 | 263845,126146 548 | 263875,63006 549 | 264323,47895 550 | 264352,166707 551 | 264487,31662 552 | 264631,57018 553 | 264647,18412 554 | 264688,129001 555 | 264694,5825 556 | 264735,38639 557 | 264755,108824 558 | 264823,75877 559 | 264909,138151 560 | 265009,24371 561 | 265020,52343 562 | 265092,97025 563 | 266288,14433 564 | 266674,67755 565 | 267007,14577 566 | 267079,31662 567 | 267125,40336 568 | 267276,60230 569 | 267392,73427 570 | 267773,163754 571 | 267848,103132 572 | 267875,57018 573 | 267880,142477 574 | 268113,83384 575 | 268167,15259 576 | 268367,166354 577 | 268478,12564 578 | 268540,52343 579 | 268550,77571 580 | 268706,152478 581 | 268724,12564 582 | 268733,44854 583 | 268842,128988 584 | 269072,169819 585 | 269136,73398 586 | 269569,63006 587 | 269736,153395 588 | 269893,59175 589 | 269983,160209 590 | 270148,933 591 | 270238,74517 592 | 270340,61531 593 | 270348,151327 594 | 270354,116489 595 | 270423,103652 596 | 270524,64467 597 | 270706,24810 598 | 270841,38222 599 | 270977,21147 600 | 
271164,103652 601 | 271256,152478 602 | 271410,40336 603 | 271430,37957 604 | 271506,47895 605 | 271749,90621 606 | 271750,6533 607 | 271757,108399 608 | 271851,164258 609 | 272063,154636 610 | 272179,77723 611 | 272311,63006 612 | 272512,122341 613 | 272524,154636 614 | 272535,63006 615 | 272626,10325 616 | 272662,128988 617 | 272748,74782 618 | 273229,57018 619 | 273538,124997 620 | 273560,69209 621 | 273616,149641 622 | 273663,40336 623 | 273818,37638 624 | 274072,166707 625 | 274240,57018 626 | 274276,65520 627 | 274347,60861 628 | 274417,154636 629 | 274494,52343 630 | 274668,79636 631 | 274800,6809 632 | 274869,153698 633 | 275442,109728 634 | 275592,152478 635 | 275723,63006 636 | 275840,18412 637 | 275994,39830 638 | 276054,63006 639 | 276564,124997 640 | 276975,169819 641 | 277050,68615 642 | 277150,103234 643 | 277180,62863 644 | 277235,24371 645 | 277272,131777 646 | 277403,161870 647 | 277440,74517 648 | 277459,156064 649 | 277710,52343 650 | 277729,79520 651 | 277759,164252 652 | 277784,81462 653 | 277827,94561 654 | 277831,70873 655 | 277939,109728 656 | 278044,109093 657 | 278064,149641 658 | 278205,64974 659 | 278252,153698 660 | 278569,65520 661 | 278637,63006 662 | 278846,14433 663 | 279201,154636 664 | 279467,89802 665 | 279472,69209 666 | 279615,74141 667 | 279765,63006 668 | 279776,21147 669 | 279938,154636 670 | 279969,57018 671 | 280101,4366 672 | 280114,14163 673 | 280179,154636 674 | 280236,153395 675 | 280287,79636 676 | 280298,160476 677 | 280323,69209 678 | 280333,103234 679 | 280378,63006 680 | 280515,18412 681 | 280567,127735 682 | 280668,75877 683 | 280859,152478 684 | 281059,93490 685 | 281263,153551 686 | 281326,164211 687 | 281397,15259 688 | 281721,21147 689 | 281792,73469 690 | 282071,153395 691 | 282448,112332 692 | 282514,166354 693 | 282705,128988 694 | 283535,60861 695 | 283726,166354 696 | 283790,164258 697 | 284138,61226 698 | 284352,151327 699 | 284564,103652 700 | 284616,79520 701 | 284656,37638 702 | 284711,153551 703 | 
284871,57018 704 | 284913,52343 705 | 285014,149641 706 | 285057,53638 707 | 285083,12564 708 | 285086,79636 709 | 285241,79636 710 | 285248,89802 711 | 285354,123773 712 | 285383,81462 713 | 285905,111225 714 | 286259,15661 715 | 286291,111225 716 | 286486,134619 717 | 286584,103132 718 | 286688,152478 719 | 286855,65520 720 | 286871,152042 721 | 286930,160476 722 | 287018,123773 723 | 287020,81462 724 | 287135,139586 725 | 287177,15258 726 | 287340,57018 727 | 287585,111225 728 | 287698,128988 729 | 288087,12564 730 | 288114,36307 731 | 288211,154636 732 | 288296,57018 733 | 288387,94482 734 | 288746,128988 735 | 288754,157120 736 | 288893,59175 737 | 288932,52343 738 | 289190,166707 739 | 289362,71282 740 | 289551,154636 741 | 289884,31662 742 | 289891,81708 743 | 290022,128988 744 | 290029,128567 745 | 290466,79636 746 | 290536,145895 747 | 290589,73469 748 | 290680,57018 749 | 290852,63006 750 | 290864,107012 751 | 290924,37995 752 | 291317,124997 753 | 291633,48895 754 | 291726,94944 755 | 291838,24371 756 | 291842,133436 757 | 291893,154636 758 | 292177,128988 759 | 292373,84389 760 | 292404,36307 761 | 292458,95850 762 | 292514,37638 763 | 293172,55143 764 | 293192,32465 765 | 293215,57018 766 | 293220,98253 767 | 293244,44854 768 | 293301,134619 769 | 293382,7746 770 | 293428,88295 771 | 293497,57018 772 | 293834,73469 773 | 293864,60861 774 | 294110,84389 775 | 294190,63006 776 | 294191,31662 777 | 294301,103652 778 | 294699,115215 779 | 294736,37638 780 | 294751,12564 781 | 294830,36307 782 | 294873,18412 783 | 294913,69355 784 | 294986,15259 785 | 295046,7746 786 | 295120,152478 787 | 295173,80030 788 | 295304,5505 789 | 295657,60861 790 | 295746,53638 791 | 295835,97025 792 | 295853,128988 793 | 296157,6533 794 | 296604,32465 795 | 296609,169819 796 | 296739,81708 797 | 296751,169819 798 | 296827,127735 799 | 296990,89329 800 | 297008,44854 801 | 297022,149641 802 | 297094,31662 803 | 297394,12564 804 | 297402,61226 805 | 297554,31662 806 | 
297595,69209 807 | 297613,59175 808 | 297628,63006 809 | 297850,12564 810 | 297856,42199 811 | 298079,32465 812 | 298475,40336 813 | 298708,130291 814 | 298949,57018 815 | 298998,97025 816 | 299072,145990 817 | 299084,134619 818 | 299112,79520 819 | 299164,160476 820 | 299329,70839 821 | 299373,153551 822 | 299494,44854 823 | 299552,145990 824 | 299585,138778 825 | 299773,10325 826 | 299926,12564 827 | 300128,31662 828 | 300187,154636 829 | 300210,126146 830 | 300711,31662 831 | 300800,134084 832 | 301002,144364 833 | 301044,32465 834 | 301121,142140 835 | 301132,14163 836 | 301150,128988 837 | 301401,164258 838 | 301683,15259 839 | 301733,18412 840 | 301812,152478 841 | 301861,154636 842 | 302017,15106 843 | 302376,5505 844 | 302510,24371 845 | 302705,7836 846 | 302940,34314 847 | 303058,103132 848 | 303116,80462 849 | 303150,66331 850 | 303274,31662 851 | 303425,24371 852 | 303457,39425 853 | 303472,65520 854 | 303509,92909 855 | 304061,109083 856 | 304240,36307 857 | 304522,37254 858 | 304546,50290 859 | 304780,18412 860 | 304928,166707 861 | 304938,85557 862 | 305034,57161 863 | 305054,42199 864 | -------------------------------------------------------------------------------- /submission/submit_20170512_gny_0.04036.csv: -------------------------------------------------------------------------------- 1 | user_id,sku_id 2 | 294887,9702 3 | 231770,154636 4 | 241651,154636 5 | 248451,154636 6 | 255953,44854 7 | 265610,79520 8 | 246664,79520 9 | 251262,79520 10 | 207174,31662 11 | 225855,31662 12 | 289558,9702 13 | 296877,9702 14 | 231803,154636 15 | 270311,154636 16 | 301809,14433 17 | 220224,154636 18 | 223578,154636 19 | 258197,14433 20 | 303874,14433 21 | 235372,140807 22 | 279040,14433 23 | 213485,68767 24 | 246036,103652 25 | 284432,154636 26 | 244431,14433 27 | 207336,126146 28 | 231980,103652 29 | 238004,59175 30 | 218616,149641 31 | 277618,103652 32 | 250305,103652 33 | 290259,149641 34 | 206879,149641 35 | 201646,145946 36 | 230571,103652 37 | 
243466,103652 38 | 283601,103652 39 | 263598,154636 40 | 200288,154636 41 | 230471,154636 42 | 276682,154636 43 | 260329,154636 44 | 245993,166707 45 | 297250,154636 46 | 262513,59175 47 | 268564,154636 48 | 274553,57018 49 | 304141,75877 50 | 227840,152478 51 | 304478,149641 52 | 208048,149641 53 | 243771,154636 54 | 257134,149641 55 | 207552,56792 56 | 218664,152478 57 | 210325,149641 58 | 298513,154636 59 | 207534,149641 60 | 302166,154636 61 | 302853,149641 62 | 257305,103652 63 | 290074,103652 64 | 236598,154636 65 | 230285,166354 66 | 294696,57018 67 | 243238,57018 68 | 202340,57018 69 | 297264,57018 70 | 236190,166354 71 | 215604,57018 72 | 246349,57018 73 | 282091,123773 74 | 273530,12564 75 | 299869,154636 76 | 279436,154636 77 | 207519,79520 78 | 267625,116489 79 | 207511,149641 80 | 297622,149641 81 | 250690,126146 82 | 303799,79520 83 | 251755,116489 84 | 284242,57018 85 | 270664,5505 86 | 205066,56792 87 | 281649,149641 88 | 299117,152478 89 | 245036,79520 90 | 279943,56792 91 | 207554,152478 92 | 271234,152478 93 | 207545,149641 94 | 275530,56792 95 | 200808,152478 96 | 270787,152478 97 | 244552,56792 98 | 287249,56792 99 | 248490,152478 100 | 215003,152478 101 | 295210,152478 102 | 229671,56792 103 | 284663,56792 104 | 252238,152478 105 | 223125,152478 106 | 286741,152478 107 | 277966,152478 108 | 217924,152478 109 | 110 | -------------------------------------------------------------------------------- /submission/submit_20170512_sw_0.00549.csv: -------------------------------------------------------------------------------- 1 | user_id,sku_id 2 | 200287,38604 3 | 200642,17660 4 | 201046,81838 5 | 201089,162108 6 | 202246,126791 7 | 202277,275 8 | 202459,24688 9 | 202949,135272 10 | 203041,22140 11 | 203052,156484 12 | 203354,22140 13 | 203458,156484 14 | 203725,118238 15 | 203875,166384 16 | 204442,20739 17 | 204747,17529 18 | 204913,135376 19 | 205189,128416 20 | 205453,87901 21 | 205497,30971 22 | 205517,79122 23 | 205570,142606 24 | 205575,96959 
25 | 205730,165190 26 | 206018,86759 27 | 206104,139174 28 | 206157,99425 29 | 206606,43960 30 | 206645,140741 31 | 206677,36503 32 | 206681,28276 33 | 206731,102034 34 | 206835,81163 35 | 207079,37546 36 | 207155,99993 37 | 207296,72967 38 | 207661,20719 39 | 207984,20476 40 | 208048,171183 41 | 208085,155223 42 | 208721,125460 43 | 208849,21151 44 | 209830,30971 45 | 209861,72967 46 | 209939,162344 47 | 210395,73467 48 | 210427,77008 49 | 210541,96403 50 | 211598,45980 51 | 211822,21078 52 | 211847,77745 53 | 212106,68654 54 | 212647,135272 55 | 212692,64197 56 | 212773,124500 57 | 213887,22469 58 | 214254,67295 59 | 214360,142295 60 | 214371,111778 61 | 214727,86220 62 | 214843,168119 63 | 214869,165614 64 | 215038,148209 65 | 215624,22962 66 | 215813,118053 67 | 216003,13931 68 | 216225,135865 69 | 216249,2148 70 | 216380,64197 71 | 216647,2480 72 | 216781,114823 73 | 217053,97007 74 | 217187,170400 75 | 217456,83325 76 | 217490,96168 77 | 217545,137805 78 | 218158,90646 79 | 218183,111778 80 | 218395,9087 81 | 218616,25942 82 | 218717,70593 83 | 219190,33258 84 | 219205,73515 85 | 219715,135376 86 | 220223,135272 87 | 220224,111532 88 | 220333,105971 89 | 220872,151611 90 | 221544,99425 91 | 221675,108410 92 | 221739,71207 93 | 222300,54209 94 | 222818,91182 95 | 222832,28276 96 | 222875,170626 97 | 222902,10598 98 | 223133,119831 99 | 223365,87114 100 | 223578,70491 101 | 223810,71940 102 | 223926,155223 103 | 223960,8210 104 | 223970,59051 105 | 224032,135272 106 | 224390,64419 107 | 224495,114568 108 | 224675,74900 109 | 224706,128416 110 | 225061,131955 111 | 225487,78508 112 | 226304,96959 113 | 227461,66968 114 | 227865,45980 115 | 228141,74798 116 | 228718,79829 117 | 228763,7196 118 | 228796,95974 119 | 229028,20819 120 | 229208,27582 121 | 229240,65210 122 | 229289,117882 123 | 229625,41009 124 | 229718,135865 125 | 230048,114568 126 | 230060,74280 127 | 230079,62080 128 | 230118,170400 129 | 230226,154504 130 | 230571,141243 131 | 230645,72348 132 | 
230973,165290 133 | 231740,142295 134 | 231822,139119 135 | 232095,37546 136 | 232096,110904 137 | 232313,737 138 | 232688,36638 139 | 233084,134683 140 | 233106,127838 141 | 233208,156484 142 | 233366,106814 143 | 233439,34648 144 | 233545,163550 145 | 233874,143981 146 | 234273,136405 147 | 234368,28276 148 | 234742,107178 149 | 235940,56737 150 | 236145,125699 151 | 236245,7099 152 | 236494,57239 153 | 236598,86551 154 | 236618,124869 155 | 236728,106667 156 | 237017,38983 157 | 237059,112068 158 | 237458,5690 159 | 237631,156484 160 | 238004,122369 161 | 238258,39139 162 | 238430,54209 163 | 238538,80159 164 | 238696,112141 165 | 238845,10598 166 | 238903,22962 167 | 239162,20719 168 | 239223,102313 169 | 239525,149442 170 | 240117,75096 171 | 240148,75588 172 | 240256,20719 173 | 240486,163550 174 | 240571,99415 175 | 240701,135272 176 | 241426,49474 177 | 241535,97094 178 | 241795,99432 179 | 241809,11876 180 | 241869,59409 181 | 242546,32744 182 | 242569,72967 183 | 242586,58599 184 | 242799,64433 185 | 242880,146593 186 | 242922,149841 187 | 243193,145303 188 | 243250,129598 189 | 243356,141198 190 | 243400,72967 191 | 243414,111778 192 | 243466,106971 193 | 243939,8753 194 | 245139,49671 195 | 245199,93852 196 | 245325,135272 197 | 245408,135272 198 | 245821,58599 199 | 246263,125372 200 | 246474,47753 201 | 246480,126791 202 | 246570,129598 203 | 247006,68096 204 | 248076,72967 205 | 248203,28276 206 | 248246,135272 207 | 248352,141198 208 | 248489,53371 209 | 248788,108328 210 | 248807,107178 211 | 248831,111527 212 | 248867,65210 213 | 249094,34585 214 | 249263,45656 215 | 249799,165614 216 | 250216,54357 217 | 250422,88764 218 | 250673,96959 219 | 250763,117882 220 | 250840,30183 221 | 250868,646 222 | 251121,142606 223 | 251713,75851 224 | 251790,16579 225 | 252873,96959 226 | 253095,64160 227 | 253125,92795 228 | 253189,139174 229 | 253301,111887 230 | 253308,43670 231 | 253384,28276 232 | 253612,79616 233 | 253643,95584 234 | 254067,141243 235 | 
254287,103085 236 | 254812,107178 237 | 255843,2148 238 | 256179,124765 239 | 256461,20719 240 | 256479,77008 241 | 256480,72348 242 | 256517,36638 243 | 256792,26229 244 | 257305,111778 245 | 257345,108618 246 | 257404,42497 247 | 257806,147898 248 | 257950,22962 249 | 258695,13931 250 | 258701,72967 251 | 258982,70757 252 | 259004,33051 253 | 259877,72967 254 | 260220,34648 255 | 260481,16579 256 | 260645,75737 257 | 260660,150863 258 | 261132,47213 259 | 261378,111391 260 | 261981,111778 261 | 262316,35464 262 | 262459,76301 263 | 262501,98445 264 | 262513,111527 265 | 262651,24900 266 | 262723,111778 267 | 262905,21151 268 | 262977,143981 269 | 263074,73118 270 | 263082,22140 271 | 263110,140412 272 | 263135,163680 273 | 263318,116820 274 | 263466,43405 275 | 263598,58756 276 | 263630,77008 277 | 263685,144242 278 | 263745,32876 279 | 264276,159610 280 | 264562,65914 281 | 264966,111532 282 | 265214,95976 283 | 265263,109349 284 | 265343,59409 285 | 265355,24369 286 | 265368,7196 287 | 265690,85780 288 | 265717,2072 289 | 265804,9906 290 | 266013,103594 291 | 266098,135865 292 | 266541,116470 293 | 266689,72348 294 | 266911,135272 295 | 266997,170400 296 | 267076,45980 297 | 267152,120140 298 | 267702,35464 299 | 267872,2072 300 | 267984,73074 301 | 268460,40933 302 | 268520,20719 303 | 268686,41887 304 | 268800,135272 305 | 268825,109349 306 | 269178,54209 307 | 269639,107178 308 | 269685,99415 309 | 269736,38741 310 | 269893,10984 311 | 270432,78697 312 | 270937,135147 313 | 270980,10984 314 | 271072,27333 315 | 271152,143037 316 | 271332,129598 317 | 271447,86220 318 | 271999,27333 319 | 272083,84639 320 | 272497,121009 321 | 273383,135272 322 | 273669,127189 323 | 274171,169 324 | 274613,6468 325 | 275071,36638 326 | 275208,170400 327 | 276372,47328 328 | 276380,72967 329 | 276535,9968 330 | 276603,98634 331 | 276906,111277 332 | 277221,54570 333 | 277222,167118 334 | 277682,95976 335 | 277796,72967 336 | 277812,38469 337 | 278421,5888 338 | 278699,25003 
339 | 278874,95376 340 | 279405,135272 341 | 279768,77008 342 | 279965,66968 343 | 280338,98685 344 | 280545,33726 345 | 280860,54209 346 | 280961,126759 347 | 281236,161251 348 | 281374,135272 349 | 281483,49275 350 | 282380,25003 351 | 282524,22962 352 | 282579,47284 353 | 282825,72967 354 | 282839,126791 355 | 283316,72348 356 | 283601,28443 357 | 283610,16579 358 | 283996,135272 359 | 284367,148209 360 | 285040,147173 361 | 285101,48721 362 | 286263,162108 363 | 286484,55792 364 | 286541,28276 365 | 286614,96941 366 | 286957,90402 367 | 287240,135376 368 | 287411,123096 369 | 288215,135272 370 | 289052,32529 371 | 289094,19213 372 | 289415,72967 373 | 289603,21040 374 | 289645,86246 375 | 290441,33258 376 | 290559,125188 377 | 290688,20719 378 | 290712,7418 379 | 290715,117882 380 | 290923,69342 381 | 291084,15168 382 | 292348,170400 383 | 292362,28276 384 | 292533,126831 385 | 292625,49474 386 | 293192,97714 387 | 293526,22962 388 | 293823,155223 389 | 294135,70411 390 | 295316,102034 391 | 295518,75376 392 | 296327,20719 393 | 296803,90902 394 | 296877,64433 395 | 297491,71184 396 | 297548,90646 397 | 297999,156484 398 | 298107,97007 399 | 298699,87057 400 | 298909,95399 401 | 298997,22140 402 | 299573,129598 403 | 300048,82481 404 | 300266,126791 405 | 300851,84205 406 | 301017,12994 407 | 301121,112141 408 | 301197,79454 409 | 301238,126791 410 | 301460,76996 411 | 301524,28276 412 | 301656,7418 413 | 301669,22140 414 | 302066,128416 415 | 302666,85246 416 | 302793,127490 417 | 302819,165190 418 | 302882,48000 419 | 303507,158230 420 | 303894,21040 421 | 304026,83325 422 | 304264,135272 423 | 305011,9397 424 | 305296,129598 425 | 200407,111527 426 | 201001,170626 427 | 201378,30971 428 | 201732,45656 429 | 201841,33258 430 | 202096,111778 431 | 202203,102488 432 | 202295,117771 433 | 202484,107178 434 | 202597,82259 435 | 202627,149841 436 | 202816,160600 437 | 202984,83325 438 | 202987,95976 439 | 203014,49474 440 | 203044,1189 441 | 203123,163360 442 | 
203443,135272 443 | 203545,72967 444 | 203554,7199 445 | 203988,25003 446 | 204217,114568 447 | 204364,49611 448 | 204496,72967 449 | 204588,106647 450 | 204750,40623 451 | 204770,28276 452 | 205419,78508 453 | 205637,96558 454 | 206513,60755 455 | 206621,139648 456 | 206696,149841 457 | -------------------------------------------------------------------------------- /submission/submit_20170513_sw_0.05148.csv: -------------------------------------------------------------------------------- 1 | "user_id ",sku_id 2 | 201409,103652 3 | 203624,50688 4 | 205855,95850 5 | 206143,154636 6 | 206484,57018 7 | 206544,18412 8 | 206606,56792 9 | 206967,80462 10 | 207048,111999 11 | 207073,57161 12 | 207089,146704 13 | 207097,153645 14 | 207147,58475 15 | 207155,18412 16 | 207157,46911 17 | 207237,124997 18 | 207240,79520 19 | 207274,13785 20 | 207277,5505 21 | 207288,12564 22 | 207289,21147 23 | 207334,154636 24 | 207357,154636 25 | 207363,128988 26 | 207390,18412 27 | 207391,93295 28 | 207433,126535 29 | 207450,18412 30 | 207493,68767 31 | 207494,152478 32 | 207522,12564 33 | 207526,63006 34 | 207601,81708 35 | 207982,14163 36 | 208000,79520 37 | 210412,15259 38 | 210801,109083 39 | 211454,144267 40 | 211928,69355 41 | 212752,5825 42 | 212996,31662 43 | 213743,6486 44 | 213968,103234 45 | 214839,31662 46 | 215525,48895 47 | 217763,59175 48 | 218204,63006 49 | 219543,61531 50 | 219973,24371 51 | 220282,154732 52 | 220724,69209 53 | 221414,164252 54 | 221992,52343 55 | 222683,114640 56 | 223769,84409 57 | 224144,119979 58 | 224533,37995 59 | 225515,79636 60 | 225886,145974 61 | 225890,92909 62 | 226388,43062 63 | 227236,5505 64 | 227849,31493 65 | 227959,47107 66 | 229250,154636 67 | 231687,6533 68 | 234139,133029 69 | 234379,164211 70 | 236261,154732 71 | 236950,62351 72 | 237453,5505 73 | 237523,21147 74 | 238419,133477 75 | 238606,14163 76 | 238689,123016 77 | 240871,60230 78 | 241653,31662 79 | 242259,31662 80 | 244001,162658 81 | 245751,32465 82 | 246639,63006 83 | 
248451,74517 84 | 248572,36307 85 | 248604,154636 86 | 248713,38955 87 | 249048,13334 88 | 249668,5505 89 | 249736,61226 90 | 250674,128747 91 | 251443,131300 92 | 251511,14433 93 | 254179,73469 94 | 254797,149641 95 | 255053,161641 96 | 255317,83144 97 | 255365,135409 98 | 256534,116489 99 | 256571,166707 100 | 256833,47895 101 | 258422,126146 102 | 259260,63006 103 | 260185,171182 104 | 260329,117452 105 | 262167,69209 106 | 263525,111225 107 | 263572,31662 108 | 264402,75877 109 | 264647,18412 110 | 266966,73842 111 | 267010,90621 112 | 268485,65520 113 | 269595,20308 114 | 269984,138151 115 | 271419,59820 116 | 274051,154636 117 | 274276,15106 118 | 274370,5825 119 | 274616,63006 120 | 275530,62872 121 | 275634,154636 122 | 277327,32465 123 | 278405,52343 124 | 279866,32465 125 | 280250,65520 126 | 280664,160476 127 | 281411,126146 128 | 283182,60230 129 | 283335,31662 130 | 284471,107090 131 | 285478,109728 132 | 285597,149641 133 | 286766,164258 134 | 287186,21147 135 | 287249,65520 136 | 288172,126146 137 | 289428,18412 138 | 290923,63006 139 | 291403,6533 140 | 295940,75877 141 | 296717,76959 142 | 298241,88295 143 | 298461,65520 144 | 299721,39830 145 | 299781,31662 146 | 299869,128747 147 | 302029,63006 148 | 302115,68767 149 | 302912,36307 150 | 304114,142477 151 | 304731,154636 152 | -------------------------------------------------------------------------------- /submission/submit_20170518_sw_0.079.csv: -------------------------------------------------------------------------------- 1 | user_id," sku_id" 2 | 200288,18412 3 | 200585,65520 4 | 200808,154636 5 | 201409,103652 6 | 201646,110697 7 | 201995,126146 8 | 203624,62351 9 | 205066,97025 10 | 205855,95850 11 | 206143,154636 12 | 206544,18412 13 | 206593,100589 14 | 206606,56792 15 | 206879,169819 16 | 206978,145946 17 | 206995,115851 18 | 207097,153645 19 | 207155,18412 20 | 207236,153551 21 | 207341,84409 22 | 207351,12564 23 | 207357,154636 24 | 207363,128988 25 | 207378,25783 26 | 207385,164318 
27 | 207433,126535 28 | 207450,18412 29 | 207468,5505 30 | 207483,32465 31 | 207494,152478 32 | 207511,75527 33 | 207513,15106 34 | 207522,12564 35 | 207526,63006 36 | 207534,154636 37 | 207545,57018 38 | 207552,44854 39 | 207554,88295 40 | 207982,14163 41 | 208000,79520 42 | 208048,147796 43 | 208945,65520 44 | 209032,149641 45 | 210325,63006 46 | 210339,125110 47 | 210412,24371 48 | 210801,109083 49 | 211454,144267 50 | 212752,5825 51 | 212996,31662 52 | 213743,65520 53 | 213968,103234 54 | 214233,31662 55 | 214839,31662 56 | 215512,78694 57 | 216389,21457 58 | 217763,126146 59 | 218616,21147 60 | 218664,65520 61 | 218801,133477 62 | 219060,124997 63 | 219973,24371 64 | 220224,39625 65 | 220282,154732 66 | 220724,69209 67 | 222683,114640 68 | 223486,36371 69 | 223578,28295 70 | 224533,88295 71 | 225390,37957 72 | 225886,145974 73 | 225890,92909 74 | 226388,152872 75 | 227840,88295 76 | 227841,22876 77 | 227849,31493 78 | 227959,47107 79 | 229060,128747 80 | 229671,113305 81 | 230471,57161 82 | 231770,75527 83 | 231803,164544 84 | 231980,36307 85 | 233957,31662 86 | 234139,133029 87 | 234185,38222 88 | 234379,164211 89 | 236261,164252 90 | 236598,12564 91 | 237453,18412 92 | 237523,21147 93 | 238004,56792 94 | 238419,133477 95 | 238606,14163 96 | 238689,123016 97 | 239113,79520 98 | 239360,63006 99 | 240760,281 100 | 240871,60230 101 | 240924,77571 102 | 241651,18412 103 | 241653,31662 104 | 242598,69355 105 | 243466,154636 106 | 243686,166707 107 | 243771,131300 108 | 244552,36307 109 | 245036,31662 110 | 245993,44854 111 | 246036,166876 112 | 246211,12564 113 | 246639,63006 114 | 248451,74517 115 | 248490,167310 116 | 248572,36307 117 | 248604,154636 118 | 249736,84409 119 | 250305,24371 120 | 250674,128747 121 | 251262,121950 122 | 251511,14433 123 | 254179,73469 124 | 254454,31662 125 | 254797,149641 126 | 255953,153698 127 | 256833,47895 128 | 257134,52343 129 | 257305,62351 130 | 258197,36685 131 | 258422,126146 132 | 258735,57018 133 | 259260,63006 134 | 
260329,117452 135 | 260954,12564 136 | 262167,69209 137 | 262513,63006 138 | 263056,18412 139 | 263572,154636 140 | 263598,68615 141 | 264189,36307 142 | 264197,126146 143 | 264647,18412 144 | 264872,32465 145 | 265610,57161 146 | 265690,285 147 | 268485,65520 148 | 268564,123350 149 | 269595,20308 150 | 269682,31662 151 | 269984,138151 152 | 270311,63006 153 | 270348,151327 154 | 270787,154636 155 | 271234,18412 156 | 271419,59820 157 | 272123,62863 158 | 272618,44854 159 | 273417,154636 160 | 273530,140807 161 | 274051,154636 162 | 274276,65520 163 | 274370,5825 164 | 274616,63006 165 | 275530,145946 166 | 276310,63006 167 | 276682,109083 168 | 277327,32465 169 | 277732,31662 170 | 278405,52343 171 | 279040,131930 172 | 279198,126146 173 | 279436,32465 174 | 279943,60861 175 | 280250,65520 176 | 280664,160476 177 | 280800,38222 178 | 281649,117008 179 | 283182,149854 180 | 283335,154636 181 | 283477,12564 182 | 283601,75877 183 | 284432,18412 184 | 284663,31662 185 | 285747,47193 186 | 286338,75877 187 | 286766,164258 188 | 287180,155707 189 | 287249,65520 190 | 288172,126146 191 | 289558,69355 192 | 290074,12564 193 | 290259,152478 194 | 290923,63006 195 | 291838,24371 196 | 292592,79520 197 | 292766,154636 198 | 295210,154636 199 | 296877,63006 200 | 297250,65233 201 | 297622,154636 202 | 298461,65520 203 | 298513,63006 204 | 298795,63006 205 | 299117,62863 206 | 299721,164258 207 | 299869,128747 208 | 301809,154636 209 | 302029,63006 210 | 302115,68767 211 | 302166,3318 212 | 302912,36307 213 | 303874,52195 214 | 304114,142477 215 | 304478,44854 216 | 304731,154636 217 | 219543,61531 218 | 229250,154636 219 | 281411,126146 220 | 206980,154636 221 | 207276,32465 222 | 207473,134084 223 | 246664,154636 224 | 248574,12564 225 | 269328,138778 226 | 207391,93295 227 | 294887,128747 228 | 303799,157120 229 | 207601,81708 230 | 256571,166707 231 | 244001,162658 232 | 213709,142140 233 | 290525,147796 234 | 265571,149641 235 | 207237,124997 236 | 213485,149641 237 | 
217924,154636 238 | 277966,47895 239 | 286741,84389 240 | 232814,64467 241 | 218204,63006 242 | 244431,44854 243 | 302853,149854 244 | 249048,13334 245 | 298241,88295 246 | 278446,162658 247 | 252238,65657 248 | 207289,21147 249 | 280777,69355 250 | 207519,128988 251 | 207334,154636 252 | 225515,79636 253 | 251443,131300 254 | 207210,6533 255 | 267010,90621 256 | 223769,84409 257 | 207307,124997 258 | 298993,5505 259 | 251755,57161 260 | 288549,21147 261 | 285478,109728 262 | 207147,58475 263 | 207288,12564 264 | 274553,154636 265 | 291403,6533 266 | 304141,160208 267 | 235372,75877 268 | 207174,47193 269 | 223125,84389 270 | 285597,149641 271 | 263525,111225 272 | 207493,68767 273 | 240822,135409 274 | 255317,83144 275 | 238320,152478 276 | 275634,154636 277 | 207259,63006 278 | 207336,164318 279 | 212813,52343 280 | 221414,164252 281 | 227236,5505 282 | 255365,135409 283 | 289428,18412 284 | 207464,149641 285 | 257898,128988 286 | 267625,3067 287 | 207390,18412 288 | 224144,119979 289 | 241196,166707 290 | 249370,154636 291 | 260185,171182 292 | 287186,44854 293 | 225855,57161 294 | 256534,116489 295 | 294977,107012 296 | 222915,149854 297 | 231687,6533 298 | 207220,110279 299 | 245751,32465 300 | 221992,52343 301 | 273237,126906 302 | 284471,107090 303 | 201669,124997 304 | 207215,32774 305 | 207274,69355 306 | 207277,5505 307 | 213691,63006 308 | 217575,12564 309 | 230974,9702 310 | 236950,62351 311 | 248713,38955 312 | 299569,57161 313 | 206484,57018 314 | 207145,63006 315 | 206967,80462 316 | -------------------------------------------------------------------------------- /submission/submit_20170518_sw_0.09218.csv: -------------------------------------------------------------------------------- 1 | user_id," sku_id" 2 | 200288,18412 3 | 200585,65520 4 | 200808,154636 5 | 201342,39425 6 | 201409,103652 7 | 201568,12564 8 | 201646,110697 9 | 201669,124997 10 | 201995,126146 11 | 202319,47693 12 | 202340,25783 13 | 203029,75877 14 | 203309,15106 15 | 
203624,62351 16 | 204535,154636 17 | 205066,97025 18 | 205607,138778 19 | 205632,69355 20 | 205644,32465 21 | 205855,95850 22 | 206136,168651 23 | 206143,154636 24 | 206276,44854 25 | 206484,57018 26 | 206534,24371 27 | 206544,18412 28 | 206593,100589 29 | 206606,56792 30 | 206863,134289 31 | 206879,169819 32 | 206921,103234 33 | 206923,5825 34 | 206967,80462 35 | 206978,145946 36 | 206980,154636 37 | 206995,115851 38 | 207029,24371 39 | 207047,154636 40 | 207048,111999 41 | 207073,152478 42 | 207079,154636 43 | 207089,146704 44 | 207097,153645 45 | 207138,164252 46 | 207145,63006 47 | 207147,58475 48 | 207155,18412 49 | 207157,46911 50 | 207172,31662 51 | 207174,47193 52 | 207210,6533 53 | 207215,32774 54 | 207220,110279 55 | 207236,153551 56 | 207237,124997 57 | 207240,147796 58 | 207259,63006 59 | 207274,69355 60 | 207276,32465 61 | 207277,5505 62 | 207288,12564 63 | 207289,21147 64 | 207307,124997 65 | 207334,154636 66 | 207336,164318 67 | 207341,84409 68 | 207351,12564 69 | 207357,154636 70 | 207363,128988 71 | 207378,25783 72 | 207385,164318 73 | 207390,18412 74 | 207391,93295 75 | 207433,126535 76 | 207450,18412 77 | 207458,107012 78 | 207464,149641 79 | 207468,5505 80 | 207473,134084 81 | 207483,32465 82 | 207493,68767 83 | 207494,152478 84 | 207511,86942 85 | 207513,15106 86 | 207519,128988 87 | 207522,12564 88 | 207526,63006 89 | 207534,154636 90 | 207545,57018 91 | 207552,44854 92 | 207554,88295 93 | 207601,81708 94 | 207900,8466 95 | 207982,14163 96 | 208000,79520 97 | 208048,147796 98 | 208812,60230 99 | 208902,98253 100 | 208945,65520 101 | 209032,149641 102 | 209101,57018 103 | 209113,169819 104 | 209357,146704 105 | 210325,63006 106 | 210339,125110 107 | 210412,24371 108 | 210646,24371 109 | 210801,109083 110 | 211309,123773 111 | 211454,144267 112 | 211633,18412 113 | 211928,69355 114 | 212145,144267 115 | 212752,5825 116 | 212813,52343 117 | 212864,63006 118 | 212996,31662 119 | 213485,149641 120 | 213691,63006 121 | 213709,142140 122 | 
213743,65520 123 | 213968,103234 124 | 214052,12564 125 | 214064,116489 126 | 214096,31662 127 | 214233,31662 128 | 214645,31662 129 | 214839,31662 130 | 215023,154636 131 | 215329,84389 132 | 215512,78694 133 | 215525,48895 134 | 215604,25783 135 | 215986,68615 136 | 216389,21457 137 | 217041,6533 138 | 217575,12564 139 | 217763,126146 140 | 217924,154636 141 | 218012,32465 142 | 218204,63006 143 | 218460,37638 144 | 218540,62351 145 | 218616,21147 146 | 218664,65520 147 | 218769,164258 148 | 218801,133477 149 | 219060,124997 150 | 219543,61531 151 | 219786,5505 152 | 219825,47193 153 | 219964,5825 154 | 219973,24371 155 | 220224,39625 156 | 220282,154732 157 | 220362,166876 158 | 220724,69209 159 | 221414,164252 160 | 221488,63006 161 | 221703,63229 162 | 221992,52343 163 | 222037,13843 164 | 222138,157827 165 | 222386,154636 166 | 222436,62351 167 | 222559,75836 168 | 222683,114640 169 | 222777,145974 170 | 222830,63006 171 | 222915,149854 172 | 223125,84389 173 | 223486,36371 174 | 223578,28295 175 | 223700,19762 176 | 223769,84409 177 | 224144,119979 178 | 224533,88295 179 | 224811,39425 180 | 224906,26372 181 | 225249,111741 182 | 225390,37957 183 | 225515,79636 184 | 225780,154636 185 | 225855,88295 186 | 225886,145974 187 | 225890,92909 188 | 226388,152872 189 | 226401,39625 190 | 226546,116489 191 | 226679,75525 192 | 226862,97948 193 | 227236,5505 194 | 227245,83144 195 | 227363,153551 196 | 227840,14163 197 | 227841,22876 198 | 227849,31493 199 | 227959,47107 200 | 228419,52343 201 | 228718,103652 202 | 228774,63006 203 | 229060,128747 204 | 229250,154636 205 | 229435,63006 206 | 229671,113305 207 | 230224,154636 208 | 230285,169819 209 | 230471,57161 210 | 230571,63006 211 | 230761,70352 212 | 230974,9702 213 | 230989,44854 214 | 231269,145946 215 | 231425,128747 216 | 231462,131300 217 | 231687,6533 218 | 231770,75527 219 | 231803,164544 220 | 231980,147796 221 | 232814,64467 222 | 232984,31662 223 | 232996,88295 224 | 233957,31662 225 | 234139,133029 
226 | 234185,38222 227 | 234379,164211 228 | 234872,52343 229 | 235372,75877 230 | 235566,144762 231 | 235887,81708 232 | 236190,161594 233 | 236261,164252 234 | 236338,31662 235 | 236598,12564 236 | 236950,62351 237 | 237153,5505 238 | 237453,18412 239 | 237523,21147 240 | 238004,56792 241 | 238320,152478 242 | 238419,133477 243 | 238606,14163 244 | 238689,123016 245 | 238714,153698 246 | 239113,79520 247 | 239360,63006 248 | 239680,149641 249 | 240666,52343 250 | 240760,281 251 | 240822,135409 252 | 240871,60230 253 | 240924,77571 254 | 241196,166707 255 | 241384,152179 256 | 241651,18412 257 | 241653,31662 258 | 242020,14433 259 | 242259,31662 260 | 242512,116489 261 | 242598,69355 262 | 243466,154636 263 | 243686,166707 264 | 243771,131300 265 | 244001,162658 266 | 244109,59820 267 | 244431,44854 268 | 244552,36307 269 | 244998,160750 270 | 245036,31662 271 | 245751,32465 272 | 245993,44854 273 | 246036,166876 274 | 246211,12564 275 | 246349,117452 276 | 246639,63006 277 | 246664,154636 278 | 246723,169819 279 | 246825,57161 280 | 246893,95850 281 | 246953,31662 282 | 247034,12564 283 | 247346,116489 284 | 247720,103132 285 | 248451,74517 286 | 248490,167310 287 | 248572,36307 288 | 248574,12564 289 | 248604,154636 290 | 248713,38955 291 | 248935,131300 292 | 249030,154636 293 | 249048,13334 294 | 249163,9702 295 | 249370,154636 296 | 249668,12564 297 | 249736,84409 298 | 249987,15259 299 | 250261,18412 300 | 250305,24371 301 | 250609,31662 302 | 250652,30827 303 | 250674,128747 304 | 250690,5825 305 | 251262,121950 306 | 251309,154636 307 | 251443,131300 308 | 251498,70352 309 | 251511,14433 310 | 251560,12564 311 | 251712,14433 312 | 251755,57161 313 | 251861,165814 314 | 252238,65657 315 | 252923,152478 316 | 253233,90521 317 | 253732,48895 318 | 254179,73469 319 | 254454,31662 320 | 254643,25409 321 | 254797,149641 322 | 255053,161641 323 | 255317,83144 324 | 255365,135409 325 | 255701,52343 326 | 255953,153698 327 | 256534,116489 328 | 256571,166707 329 | 
256681,31662 330 | 256833,47895 331 | 256862,111225 332 | 257052,100744 333 | 257134,52343 334 | 257138,116489 335 | 257305,62351 336 | 257599,15259 337 | 257621,154636 338 | 257898,128988 339 | 258197,36685 340 | 258422,126146 341 | 258529,113486 342 | 258735,57018 343 | 258784,31662 344 | 259260,63006 345 | 260074,12564 346 | 260185,171182 347 | 260329,117452 348 | 260954,12564 349 | 262167,69209 350 | 262208,154636 351 | 262513,63006 352 | 262884,154636 353 | 263056,18412 354 | 263525,111225 355 | 263572,154636 356 | 263598,68615 357 | 264189,36307 358 | 264197,65520 359 | 264402,154636 360 | 264616,138151 361 | 264647,18412 362 | 264872,32465 363 | 265049,57161 364 | 265571,149641 365 | 265610,57161 366 | 265666,31662 367 | 265690,285 368 | 266014,109728 369 | 266966,73842 370 | 267010,90621 371 | 267625,3067 372 | 267651,124507 373 | 268196,154636 374 | 268485,65520 375 | 268487,69209 376 | 268564,123350 377 | 268649,126146 378 | 269328,138778 379 | 269595,20308 380 | 269682,31662 381 | 269781,47193 382 | 269984,138151 383 | 270311,63006 384 | 270348,151327 385 | 270552,125756 386 | 270654,164215 387 | 270664,152478 388 | 270787,154636 389 | 271234,18412 390 | 271419,59820 391 | 271628,47193 392 | 271989,124997 393 | 272123,62863 394 | 272420,101181 395 | 272618,44854 396 | 273237,138778 397 | 273417,154636 398 | 273530,140807 399 | 274051,154636 400 | 274276,65520 401 | 274289,142667 402 | 274370,5825 403 | 274553,154636 404 | 274616,63006 405 | 275077,24371 406 | 275119,24371 407 | 275414,12564 408 | 275530,145946 409 | 275634,154636 410 | 275769,40336 411 | 275815,68615 412 | 276310,63006 413 | 276380,63006 414 | 276549,32865 415 | 276682,109083 416 | 277164,150496 417 | 277327,32465 418 | 277618,63006 419 | 277732,31662 420 | 277966,47895 421 | 278405,52343 422 | 278446,162658 423 | 278989,89802 424 | 279040,131930 425 | 279079,86942 426 | 279198,126146 427 | 279436,32465 428 | 279866,32465 429 | 279883,145990 430 | 279943,60861 431 | 280250,65520 432 | 
280664,160476 433 | 280777,69355 434 | 280786,31662 435 | 280800,38222 436 | 281404,31628 437 | 281411,126146 438 | 281649,117008 439 | 281876,154636 440 | 282091,128988 441 | 282244,47193 442 | 282670,166707 443 | 282731,149641 444 | 283014,67241 445 | 283182,149854 446 | 283335,154636 447 | 283477,12564 448 | 283601,5505 449 | 284242,13785 450 | 284432,18412 451 | 284447,44854 452 | 284471,107090 453 | 284663,31662 454 | 284735,154636 455 | 285168,156646 456 | 285478,109728 457 | 285529,166707 458 | 285586,154636 459 | 285597,149641 460 | 285747,47193 461 | 286050,18412 462 | 286338,75877 463 | 286741,84389 464 | 286766,164258 465 | 286861,12564 466 | 287180,155707 467 | 287186,44854 468 | 287249,65520 469 | 287718,69355 470 | 288131,6486 471 | 288172,126146 472 | 288205,93295 473 | 288416,5825 474 | 288549,21147 475 | 289273,154636 476 | 289375,13334 477 | 289428,18412 478 | 289558,69355 479 | 290074,12564 480 | 290076,151327 481 | 290259,152478 482 | 290525,147796 483 | 290923,63006 484 | 291193,24371 485 | 291403,6533 486 | 291621,65520 487 | 291838,24371 488 | 291885,169819 489 | 292534,108399 490 | 292592,79520 491 | 292766,154636 492 | 293521,17242 493 | 293609,81462 494 | 294224,154636 495 | 294251,169819 496 | 294508,12564 497 | 294578,106622 498 | 294696,32465 499 | 294792,58475 500 | 294803,52343 501 | 294887,128747 502 | 294977,107012 503 | 295210,154636 504 | 295940,128988 505 | 296199,156064 506 | 296220,60586 507 | 296479,47193 508 | 296699,154636 509 | 296717,76959 510 | 296877,63006 511 | 296926,166549 512 | 297250,65233 513 | 297264,57018 514 | 297622,154636 515 | 298037,154636 516 | 298241,88295 517 | 298444,112513 518 | 298461,65520 519 | 298507,97025 520 | 298513,63006 521 | 298529,142140 522 | 298551,37995 523 | 298592,154636 524 | 298633,44854 525 | 298697,47895 526 | 298795,63006 527 | 298993,5505 528 | 299117,62863 529 | 299277,57062 530 | 299343,75489 531 | 299569,57161 532 | 299721,164258 533 | 299781,154636 534 | 299869,128747 535 | 
300118,9702 536 | 300258,57161 537 | 300259,57018 538 | 300275,57018 539 | 301106,84409 540 | 301809,154636 541 | 301987,7746 542 | 302029,63006 543 | 302115,68767 544 | 302166,95850 545 | 302241,154636 546 | 302853,149854 547 | 302865,131300 548 | 302912,36307 549 | 303590,61531 550 | 303799,157120 551 | 303874,52195 552 | 304114,142477 553 | 304141,160208 554 | 304214,166707 555 | 304248,57018 556 | 304478,101181 557 | 304731,154636 558 | 305131,65520 559 | 305275,63006 560 | -------------------------------------------------------------------------------- /submission/submit_20170519_sw_0.09512.csv: -------------------------------------------------------------------------------- 1 | user_id," sku_id" 2 | 200288,18412 3 | 200585,65520 4 | 200808,154636 5 | 201342,39425 6 | 201409,103652 7 | 201568,12564 8 | 201646,110697 9 | 201669,124997 10 | 201995,126146 11 | 202319,47693 12 | 202340,25783 13 | 203029,75877 14 | 203309,15106 15 | 203327,124997 16 | 203624,62351 17 | 204399,44854 18 | 204535,154636 19 | 204915,154636 20 | 205066,97025 21 | 205607,138778 22 | 205632,69355 23 | 205644,32465 24 | 205855,95850 25 | 206136,168651 26 | 206143,154636 27 | 206276,44854 28 | 206336,14163 29 | 206484,57018 30 | 206534,24371 31 | 206544,18412 32 | 206593,100589 33 | 206606,56792 34 | 206863,134289 35 | 206879,169819 36 | 206921,103234 37 | 206923,5825 38 | 206933,164258 39 | 206934,63006 40 | 206943,44854 41 | 206960,12564 42 | 206967,80462 43 | 206974,12564 44 | 206978,145946 45 | 206980,154636 46 | 206987,57018 47 | 206989,151992 48 | 206994,57018 49 | 206995,115851 50 | 206996,57018 51 | 206997,107012 52 | 207000,32465 53 | 207009,78335 54 | 207012,32465 55 | 207015,12564 56 | 207016,154636 57 | 207017,5825 58 | 207018,90294 59 | 207021,75877 60 | 207023,63844 61 | 207026,31662 62 | 207029,24371 63 | 207047,154636 64 | 207048,111999 65 | 207055,117013 66 | 207061,154636 67 | 207069,88295 68 | 207070,154636 69 | 207073,152478 70 | 207079,154636 71 | 207084,62863 72 | 
207088,162658 73 | 207089,146704 74 | 207095,154636 75 | 207096,21147 76 | 207097,153645 77 | 207099,63006 78 | 207107,63006 79 | 207121,15259 80 | 207127,152478 81 | 207138,164252 82 | 207142,152478 83 | 207145,63006 84 | 207147,58475 85 | 207155,18412 86 | 207157,46911 87 | 207172,31662 88 | 207174,47193 89 | 207176,154636 90 | 207202,154636 91 | 207210,6533 92 | 207215,32774 93 | 207220,110279 94 | 207224,77723 95 | 207233,154636 96 | 207236,153551 97 | 207237,124997 98 | 207240,147796 99 | 207243,132417 100 | 207253,74141 101 | 207259,63006 102 | 207268,75877 103 | 207274,69355 104 | 207276,32465 105 | 207277,5505 106 | 207279,18412 107 | 207286,49126 108 | 207288,12564 109 | 207289,21147 110 | 207295,145946 111 | 207299,52343 112 | 207301,44854 113 | 207307,124997 114 | 207310,62351 115 | 207317,63006 116 | 207334,154636 117 | 207336,164318 118 | 207338,63006 119 | 207340,63006 120 | 207341,84409 121 | 207344,128988 122 | 207351,12564 123 | 207357,154636 124 | 207358,57018 125 | 207363,128988 126 | 207374,52343 127 | 207377,154636 128 | 207378,25783 129 | 207385,164318 130 | 207390,18412 131 | 207391,93295 132 | 207398,154636 133 | 207400,44854 134 | 207401,128747 135 | 207427,154636 136 | 207433,126535 137 | 207450,18412 138 | 207455,74517 139 | 207457,65520 140 | 207458,107012 141 | 207464,149641 142 | 207468,5505 143 | 207473,134084 144 | 207483,32465 145 | 207493,68767 146 | 207494,152478 147 | 207504,154636 148 | 207511,86942 149 | 207513,15106 150 | 207519,128988 151 | 207522,12564 152 | 207526,63006 153 | 207534,154636 154 | 207544,19233 155 | 207545,57018 156 | 207552,44854 157 | 207554,88295 158 | 207601,81708 159 | 207680,14808 160 | 207789,14163 161 | 207851,149854 162 | 207900,8466 163 | 207982,14163 164 | 208000,79520 165 | 208048,147796 166 | 208482,160476 167 | 208707,60293 168 | 208812,60230 169 | 208902,98253 170 | 208945,65520 171 | 209032,149641 172 | 209087,84409 173 | 209101,57018 174 | 209113,169819 175 | 209357,146704 176 | 210325,63006 
177 | 210339,125110 178 | 210412,24371 179 | 210646,24371 180 | 210801,109083 181 | 211309,123773 182 | 211365,101181 183 | 211454,144267 184 | 211633,18412 185 | 211840,15106 186 | 211928,69355 187 | 212145,144267 188 | 212752,5825 189 | 212813,52343 190 | 212864,63006 191 | 212981,24371 192 | 212996,31662 193 | 213298,18412 194 | 213314,116489 195 | 213485,149641 196 | 213691,63006 197 | 213709,142140 198 | 213743,65520 199 | 213968,103234 200 | 213999,113319 201 | 214052,12564 202 | 214064,116489 203 | 214096,31662 204 | 214233,31662 205 | 214645,31662 206 | 214839,31662 207 | 215023,154636 208 | 215329,84389 209 | 215512,78694 210 | 215525,48895 211 | 215573,128988 212 | 215585,21147 213 | 215604,25783 214 | 215986,68615 215 | 216389,21457 216 | 217001,9702 217 | 217041,6533 218 | 217119,75525 219 | 217393,69355 220 | 217560,18412 221 | 217575,12564 222 | 217763,126146 223 | 217924,154636 224 | 218012,32465 225 | 218204,63006 226 | 218460,37638 227 | 218540,62351 228 | 218616,21147 229 | 218664,65520 230 | 218769,164258 231 | 218801,133477 232 | 219060,124997 233 | 219543,61531 234 | 219674,123773 235 | 219786,5505 236 | 219825,47193 237 | 219964,5825 238 | 219973,24371 239 | 220047,154636 240 | 220224,39625 241 | 220282,154732 242 | 220362,166876 243 | 220724,69209 244 | 221414,164252 245 | 221415,134084 246 | 221488,63006 247 | 221703,63229 248 | 221716,21147 249 | 221992,52343 250 | 222037,13843 251 | 222138,157827 252 | 222386,154636 253 | 222436,62351 254 | 222559,75836 255 | 222683,114640 256 | 222777,145974 257 | 222830,63006 258 | 222915,149854 259 | 223125,84389 260 | 223486,36371 261 | 223578,28295 262 | 223700,19762 263 | 223769,84409 264 | 224144,119979 265 | 224533,88295 266 | 224811,39425 267 | 224906,26372 268 | 225249,111741 269 | 225387,147796 270 | 225390,37957 271 | 225515,79636 272 | 225780,154636 273 | 225855,88295 274 | 225886,145974 275 | 225890,92909 276 | 226019,62085 277 | 226388,152872 278 | 226401,39625 279 | 226546,116489 280 | 
226679,75525 281 | 226862,97948 282 | 227236,5505 283 | 227245,83144 284 | 227363,153551 285 | 227840,14163 286 | 227841,22876 287 | 227849,31493 288 | 227959,47107 289 | 228419,52343 290 | 228689,10321 291 | 228718,103652 292 | 228774,63006 293 | 229060,128747 294 | 229250,154636 295 | 229435,63006 296 | 229671,113305 297 | 229688,149641 298 | 229857,11640 299 | 230224,154636 300 | 230285,169819 301 | 230307,83144 302 | 230367,57018 303 | 230471,57161 304 | 230571,63006 305 | 230761,70352 306 | 230974,9702 307 | 230989,44854 308 | 231269,145946 309 | 231425,128747 310 | 231462,131300 311 | 231687,6533 312 | 231770,75527 313 | 231803,164544 314 | 231980,147796 315 | 232573,63006 316 | 232814,64467 317 | 232984,31662 318 | 232996,88295 319 | 233193,44854 320 | 233957,31662 321 | 234139,133029 322 | 234185,38222 323 | 234364,154636 324 | 234379,164211 325 | 234740,168651 326 | 234872,52343 327 | 234987,144170 328 | 235372,75877 329 | 235566,144762 330 | 235732,116489 331 | 235887,81708 332 | 236190,161594 333 | 236261,164252 334 | 236338,31662 335 | 236598,12564 336 | 236950,62351 337 | 237153,5505 338 | 237364,5825 339 | 237453,18412 340 | 237523,21147 341 | 237864,69209 342 | 238004,56792 343 | 238320,152478 344 | 238419,133477 345 | 238512,84409 346 | 238606,14163 347 | 238671,169819 348 | 238689,123016 349 | 238714,153698 350 | 239113,79520 351 | 239360,63006 352 | 239680,149641 353 | 240666,52343 354 | 240760,281 355 | 240822,135409 356 | 240871,60230 357 | 240900,125756 358 | 240924,77571 359 | 241196,166707 360 | 241384,152179 361 | 241651,18412 362 | 241653,31662 363 | 242020,14433 364 | 242259,31662 365 | 242512,116489 366 | 242598,69355 367 | 243466,154636 368 | 243686,166707 369 | 243771,131300 370 | 244001,162658 371 | 244109,59820 372 | 244431,44854 373 | 244552,36307 374 | 244785,12564 375 | 244998,160750 376 | 245036,31662 377 | 245751,32465 378 | 245993,44854 379 | 246036,166876 380 | 246211,12564 381 | 246349,117452 382 | 246435,55901 383 | 
246639,63006 384 | 246664,154636 385 | 246723,169819 386 | 246825,57161 387 | 246893,95850 388 | 246953,31662 389 | 247034,12564 390 | 247346,116489 391 | 247720,103132 392 | 248065,63006 393 | 248451,74517 394 | 248484,126146 395 | 248490,167310 396 | 248498,57161 397 | 248572,36307 398 | 248574,12564 399 | 248604,154636 400 | 248713,38955 401 | 248914,68767 402 | 248935,131300 403 | 249030,154636 404 | 249048,13334 405 | 249163,9702 406 | 249370,154636 407 | 249668,12564 408 | 249736,84409 409 | 249987,15259 410 | 250261,18412 411 | 250305,24371 412 | 250586,65520 413 | 250609,31662 414 | 250652,30827 415 | 250674,128747 416 | 250690,5825 417 | 251262,121950 418 | 251309,154636 419 | 251443,131300 420 | 251498,70352 421 | 251511,14433 422 | 251560,12564 423 | 251712,14433 424 | 251755,57161 425 | 251861,165814 426 | 252238,65657 427 | 252923,152478 428 | 253233,90521 429 | 253732,48895 430 | 254072,154636 431 | 254179,73469 432 | 254398,154636 433 | 254454,31662 434 | 254643,25409 435 | 254797,149641 436 | 255053,161641 437 | 255068,154636 438 | 255317,83144 439 | 255365,135409 440 | 255683,109093 441 | 255701,52343 442 | 255796,62807 443 | 255816,74049 444 | 255953,153698 445 | 256534,116489 446 | 256571,166707 447 | 256681,31662 448 | 256823,63006 449 | 256833,47895 450 | 256862,111225 451 | 257052,100744 452 | 257134,52343 453 | 257138,116489 454 | 257305,62351 455 | 257562,119979 456 | 257599,15259 457 | 257621,154636 458 | 257898,128988 459 | 258197,36685 460 | 258422,126146 461 | 258529,113486 462 | 258735,57018 463 | 258784,31662 464 | 259260,63006 465 | 260074,12564 466 | 260185,171182 467 | 260274,44854 468 | 260329,117452 469 | 260954,12564 470 | 261525,18103 471 | 262167,69209 472 | 262208,154636 473 | 262513,63006 474 | 262884,154636 475 | 262896,68615 476 | 263036,31662 477 | 263056,18412 478 | 263525,111225 479 | 263572,154636 480 | 263598,68615 481 | 264189,36307 482 | 264197,65520 483 | 264402,154636 484 | 264616,138151 485 | 264647,18412 486 | 
264862,128747 487 | 264872,32465 488 | 265049,57161 489 | 265571,149641 490 | 265610,57161 491 | 265666,31662 492 | 265690,285 493 | 265946,65520 494 | 266014,109728 495 | 266196,108824 496 | 266966,73842 497 | 267010,90621 498 | 267625,3067 499 | 267651,124507 500 | 268196,154636 501 | 268485,65520 502 | 268487,69209 503 | 268564,123350 504 | 268649,126146 505 | 269328,138778 506 | 269595,20308 507 | 269682,31662 508 | 269781,47193 509 | 269909,147785 510 | 269984,138151 511 | 270311,63006 512 | 270348,151327 513 | 270552,125756 514 | 270654,164215 515 | 270664,152478 516 | 270787,154636 517 | 271234,18412 518 | 271309,55092 519 | 271419,59820 520 | 271628,47193 521 | 271674,68767 522 | 271839,12564 523 | 271922,32465 524 | 271989,124997 525 | 272123,62863 526 | 272420,101181 527 | 272618,44854 528 | 273237,138778 529 | 273417,154636 530 | 273438,63006 531 | 273463,94561 532 | 273530,140807 533 | 274051,154636 534 | 274142,154636 535 | 274276,65520 536 | 274289,142667 537 | 274370,5825 538 | 274553,154636 539 | 274616,63006 540 | 274629,11090 541 | 274645,48895 542 | 275077,24371 543 | 275119,24371 544 | 275380,142140 545 | 275414,12564 546 | 275530,145946 547 | 275634,154636 548 | 275769,40336 549 | 275815,68615 550 | 276281,14163 551 | 276310,63006 552 | 276380,63006 553 | 276549,32865 554 | 276682,109083 555 | 276901,158020 556 | 277164,150496 557 | 277327,32465 558 | 277414,126146 559 | 277618,63006 560 | 277732,31662 561 | 277966,47895 562 | 277983,90521 563 | 278405,52343 564 | 278446,162658 565 | 278452,63006 566 | 278747,156064 567 | 278989,89802 568 | 279040,131930 569 | 279079,86942 570 | 279198,126146 571 | 279436,32465 572 | 279866,32465 573 | 279883,145990 574 | 279943,60861 575 | 280250,65520 576 | 280664,160476 577 | 280777,69355 578 | 280786,31662 579 | 280800,38222 580 | 281404,31628 581 | 281411,126146 582 | 281649,117008 583 | 281698,142140 584 | 281876,154636 585 | 282091,128988 586 | 282244,47193 587 | 282670,166707 588 | 282731,149641 589 | 
283014,67241 590 | 283182,149854 591 | 283335,154636 592 | 283477,12564 593 | 283564,131300 594 | 283601,5505 595 | 283628,21147 596 | 283825,63006 597 | 284242,13785 598 | 284432,18412 599 | 284437,61226 600 | 284447,44854 601 | 284471,107090 602 | 284663,31662 603 | 284735,154636 604 | 285168,156646 605 | 285478,109728 606 | 285529,166707 607 | 285586,154636 608 | 285597,149641 609 | 285747,47193 610 | 285935,57018 611 | 286050,18412 612 | 286338,75877 613 | 286741,84389 614 | 286766,164258 615 | 286861,12564 616 | 287180,155707 617 | 287186,44854 618 | 287249,65520 619 | 287718,69355 620 | 288131,6486 621 | 288172,126146 622 | 288205,93295 623 | 288416,5825 624 | 288549,21147 625 | 288720,128988 626 | 289273,154636 627 | 289375,13334 628 | 289428,18412 629 | 289558,69355 630 | 289742,36371 631 | 289844,154636 632 | 290074,12564 633 | 290076,151327 634 | 290259,152478 635 | 290525,147796 636 | 290923,63006 637 | 291193,24371 638 | 291403,6533 639 | 291621,65520 640 | 291838,24371 641 | 291885,169819 642 | 291988,81462 643 | 292534,108399 644 | 292592,79520 645 | 292766,154636 646 | 293521,17242 647 | 293609,81462 648 | 294224,154636 649 | 294251,169819 650 | 294508,12564 651 | 294578,106622 652 | 294696,32465 653 | 294792,58475 654 | 294803,52343 655 | 294887,128747 656 | 294977,107012 657 | 295210,154636 658 | 295909,63006 659 | 295940,128988 660 | 295982,107012 661 | 296059,68767 662 | 296199,156064 663 | 296220,60586 664 | 296479,47193 665 | 296699,154636 666 | 296717,76959 667 | 296877,63006 668 | 296926,166549 669 | 297025,154636 670 | 297143,140121 671 | 297250,65233 672 | 297264,57018 673 | 297622,154636 674 | 298037,154636 675 | 298228,52343 676 | 298241,88295 677 | 298444,112513 678 | 298461,65520 679 | 298507,97025 680 | 298513,63006 681 | 298529,142140 682 | 298551,37995 683 | 298592,154636 684 | 298633,44854 685 | 298697,47895 686 | 298795,63006 687 | 298973,166707 688 | 298993,5505 689 | 299117,62863 690 | 299277,57062 691 | 299343,75489 692 | 
299569,57161 693 | 299721,164258 694 | 299781,154636 695 | 299869,128747 696 | 300118,9702 697 | 300258,57161 698 | 300259,57018 699 | 300275,57018 700 | 300305,149641 701 | 300629,63006 702 | 301106,84409 703 | 301809,154636 704 | 301987,7746 705 | 302029,63006 706 | 302115,68767 707 | 302166,95850 708 | 302241,154636 709 | 302853,149854 710 | 302865,131300 711 | 302912,36307 712 | 302982,63006 713 | 303261,138778 714 | 303590,61531 715 | 303799,157120 716 | 303874,52195 717 | 304114,142477 718 | 304141,160208 719 | 304214,166707 720 | 304248,57018 721 | 304478,101181 722 | 304531,154636 723 | 304661,169819 724 | 304720,18412 725 | 304731,154636 726 | 305131,65520 727 | 305275,63006 728 | -------------------------------------------------------------------------------- /submission/提交结果: -------------------------------------------------------------------------------- 1 | 2 | --------------------------------------------------------------------------------