├── README.md
├── 代码
│   ├── 模型训练
│   │   ├── XGB.txt
│   │   └── 输入特征.txt
│   ├── 特征工程
│   │   ├── 1_先还原成原始销量.txt
│   │   ├── 2_分省市车型按时间统计销量.txt
│   │   ├── 3_0_生成oc特征.txt
│   │   ├── 3_1_拼上oc特征.txt
│   │   ├── 3_2_将车型部分连续特征离散化.txt
│   │   ├── 4_1_省市车型时间.txt
│   │   ├── 4_2_ssm_车型.txt
│   │   ├── 4_3_fdfrfir_车型.txt
│   │   ├── 5_0_分出轿车suvmpv等类别.txt
│   │   ├── 5_1_2_ssm_车类型.txt
│   │   ├── 5_1_3_fdfrfir_车类型.txt
│   │   ├── 5_1_省市车类型时间.txt
│   │   ├── 5_2_0_按排量分.txt
│   │   ├── 5_2_1_省市排量时间.txt
│   │   ├── 5_2_2_ssm_排量.txt
│   │   ├── 5_2_3_fdfrfir_排量.txt
│   │   ├── 5_3_0_按新能源类别分.txt
│   │   ├── 5_3_1_省市新能源时间.txt
│   │   ├── 5_3_2_ssm_新能源.txt
│   │   ├── 5_3_3_fdfrfir_新能源.txt
│   │   ├── 6_0_1_省市时间开窗之车型.txt
│   │   ├── 6_0_2_省市时间开窗之车类型.txt
│   │   ├── 6_0_3_省市时间开窗之排量.txt
│   │   ├── 6_0_4_省市时间开窗之新能源.txt
│   │   ├── 7_1_省市车型历史交叉.txt
│   │   ├── 7_2_省市车类型交叉.txt
│   │   ├── 7_3_省市排量交叉.txt
│   │   ├── 7_4_省市新能源交叉.txt
│   │   ├── 8_0_时间之年月.txt
│   │   └── 选出二月的作为训练集.txt
│   └── 预处理
│       └── 分车型看乘的倍数.txt
├── 代码说明.md
├── 初赛相关
│   └── src
│       ├── feature_extraction
│       │   ├── EDA.ipynb
│       │   ├── xiao_utils.py
│       │   ├── 单纯车型信息特征.ipynb
│       │   ├── 历史品牌及车型销量相关特征.ipynb
│       │   ├── 历史品牌销量相关特征.ipynb
│       │   ├── 历史车型销量相关特征.ipynb
│       │   ├── 拼出最终特征.ipynb
│       │   ├── 时间特征.ipynb
│       │   └── 草稿.ipynb
│       └── train
│           └── train-v0.5.ipynb
├── 截图
│   ├── 5份数据集拆分.png
│   ├── JOIN_before_f8.png
│   ├── XGB结果处理.png
│   └── 融合.png
└── 解题思路.md

/README.md:
--------------------------------------------------------------------------------
# yancheng-sales
I finished 14th on leaderboard B of the semifinal round of the [Tianchi 印象盐城 (Impression Yancheng) passenger car sales forecasting competition](https://tianchi.aliyun.com/competition/information.htm?spm=5176.100068.5678.2.45005e866tTWxC&raceId=231640). This repo shares my rough approach and equally rough code, in the hope that it helps people who are even newer to this than I am.

## Semifinal round (复赛)

- For the approach, see `解题思路.md`;
- the `截图` directory holds a few screenshots from the PAI platform;
- the code is in the `代码` directory;
- notes on the code are in `代码说明.md`.

## Preliminary round (初赛)

The preliminary round was done offline; the semifinal round was done on the online platform. The screenshots and code mentioned above all belong to the semifinal round.
The preliminary-round features are similar to the semifinal ones, except that some features that could be built in the semifinal cannot be built from the preliminary-round data, so they are simply absent. To save effort, I won't describe the preliminary-round features in detail.

The code is under `初赛相关/src/`. The notebooks in `feature_extraction` extract each group of features, and the comments explain what each feature means. Extract each group of features first, then run the code in `拼出最终特征.ipynb` to get the complete dataset with all features, and finally run the code in `src/train/train-v0.5.ipynb` to produce the results. For the rest of the approach, see the notes in the `.ipynb` files and the semifinal write-up.

**However**, many of these features should be dropped. The feature set used for training in this version is probably not the best one.

--------------------------------------------------------------------------------
/代码/模型训练/XGB.txt:
--------------------------------------------------------------------------------
drop table if exists xiao_xgb_pred;
DROP OFFLINEMODEL IF EXISTS xiao_xgb_model;

-- train
PAI
-name xgboost
-project algo_public
-Deta="${eta}"
-Dobjective="reg:linear"
-DitemDelimiter=","
-Dseed="0"
-Dnum_round="${round}"
-DlabelColName="real_sale"
-- training set
-DinputTableName="${train_X}"
-DenableSparse="false"
-Dmax_depth="8"
-Dsubsample="0.4"
-Dcolsample_bytree="0.6"
-DmodelName="xiao_xgb_model" -- name the model is saved under
-Dgamma="0"
-Dlambda="50"
-DfeatureColNames="${features}"
-Dbase_score="0.11"
-Dmin_child_weight="100"
-DkvDelimiter=":";


-- predict
PAI
-name prediction
-project algo_public
-DdetailColName="prediction_detail"
-DappendColNames="predict_date,province_id,city_id,class_id,times" -- times is carried along so the predicted score can be multiplied back by the multiplier
-DmodelName="xiao_xgb_model" -- name the model is saved under
-DitemDelimiter=","
-DresultColName="prediction_result"
-Dlifecycle="28"
-DoutputTableName="xiao_xgb_pred" -- output table
-DscoreColName="prediction_score"
-DkvDelimiter=":"
-DfeatureColNames="${features}"
-- test set
-DinputTableName="${test_X}"
-DenableSparse="false";


-- ensemble
-- alter table yc_result_submit_a rename to yc_result_submit_a_9257;
-- alter table yc_result_submit_a_9257 change column predict_quantity rename to p_q_9257;

drop table if exists xgb_for_ensemble;
create table xgb_for_ensemble as
select predict_date,province_id,city_id,class_id, if(prediction_score<0, 1, prediction_score) as prediction_score, times
from xiao_xgb_pred;

-- create table table_ensemble as
-- select a.predict_date, a.province_id, a.city_id, a.class_id, a.p_q_9257, b.predict_quantity as p_q_xgb from yc_result_submit_a_9257 a
-- left outer join xgb_for_ensemble b on a.province_id=b.province_id and a.city_id=b.city_id and a.class_id=b.class_id;

-- create table yc_result_submit_a as
-- select predict_date,province_id,city_id,class_id, (p_q_9257 + p_q_xgb)*0.5 as predict_quantity
-- from table_ensemble;

--------------------------------------------------------------------------------
/代码/特征工程/1_先还原成原始销量.txt:
--------------------------------------------------------------------------------
-- alter table yc_sales_with_times_and_real_sale change column sale_quantity rename to real_sale;
-- alter table yc_sales_with_times_and_real_sale change column twisted_sale_qty rename to sale_quantity;

DROP TABLE IF EXISTS yc_sales_with_times_and_real_sale;

CREATE TABLE yc_sales_with_times_and_real_sale
AS
SELECT a.*, b.times
    , CASE
        WHEN b.times IS NOT NULL THEN a.sale_quantity / b.times
        ELSE a.sale_quantity
    END AS real_sale
FROM yc_passenger_car_sales a
LEFT OUTER JOIN times b
ON a.class_id = b.class_id;

DROP TABLE IF EXISTS sale_of_class;

CREATE TABLE sale_of_class
AS
SELECT class_id, SUM(real_sale) AS real_sale, AVG(times) as times , COUNT(*) as count_of_record
FROM yc_sales_with_times_and_real_sale
GROUP BY class_id;

-- SELECT *
-- FROM sale_of_class
-- ORDER BY count_of_record DESC
-- LIMIT 15000000;

select times, sum(count_of_record) as count from
sale_of_class group by times;

--------------------------------------------------------------------------------
/代码/特征工程/2_分省市车型按时间统计销量.txt:
--------------------------------------------------------------------------------
-- aggregate sales
DROP TABLE IF EXISTS sale_only_train;

CREATE TABLE sale_only_train
AS
SELECT sale_date, province_id, city_id, class_id
    , SUM(sale_quantity) AS sale_quantity, SUM(real_sale) AS real_sale
    , AVG(times) AS times
FROM yc_sales_with_times_and_real_sale
GROUP BY province_id,
    city_id,
    class_id,
    sale_date;

-- add the times column to the test set
drop table if exists test_b_with_times;
create table test_b_with_times as
SELECT a.*, b.times
FROM yc_result_sample_b a
LEFT OUTER JOIN times b
ON a.class_id = b.class_id;
-- append the test set to the training rows
DROP TABLE IF EXISTS whole_f2_step1;

CREATE TABLE whole_f2_step1
AS
SELECT *
FROM sale_only_train
UNION ALL
SELECT predict_date AS sale_date, province_id, city_id, class_id, predict_quantity AS sale_quantity
    , NULL AS real_sale, times
FROM test_b_with_times;

alter table whole_f2_step1 rename to whole_f2;

--------------------------------------------------------------------------------
/代码/特征工程/3_0_生成oc特征.txt:
--------------------------------------------------------------------------------
drop table if exists helper_for_features_on_class;
create table helper_for_features_on_class as
SELECT class_id, brand_id, compartment
    , displacement, if_charging, driven_type_id, fuel_type_id
    , newenergy_type_id, emission_standards_id, if_mpv_id, if_luxurious_id, power
    , car_length, car_width, car_height, rated_passenger
FROM prj_tc_231640_121545_n6t109.yc_passenger_car_sales;

drop table if exists oc_origin;
create table oc_origin as
select class_id,brand_id,compartment,driven_type_id,fuel_type_id,newenergy_type_id,emission_standards_id,if_mpv_id,if_luxurious_id,car_length,car_width,car_height
    , cast(substring(displacement, 1, 3) as double) as displacement
    , case if_charging when 'L' then 0 when 'T' then 1 else 2 end as if_charging
    , cast(substring(rated_passenger, -1, 1) as bigint) as rated_passenger
    , case when power is not null then cast(split_part(power, '/', 1) as double) else null end as power
from helper_for_features_on_class;

-- group by class_id to build the oc table
drop table if exists oc;
create table oc as
select class_id
    , median(brand_id) as brand_id_median
    , avg(compartment) as compartment_avg, median(compartment) as compartment_median, max(compartment) as compartment_max, min(compartment) as compartment_min
    , median(driven_type_id) as driven_type_id_median, max(driven_type_id) as driven_type_id_max, min(driven_type_id) as driven_type_id_min
    , avg(fuel_type_id) as fuel_type_id_avg, median(fuel_type_id) as fuel_type_id_median, max(fuel_type_id) as fuel_type_id_max, min(fuel_type_id) as fuel_type_id_min
    , avg(newenergy_type_id) as newenergy_type_id_avg, median(newenergy_type_id) as newenergy_type_id_median, max(newenergy_type_id) as newenergy_type_id_max, min(newenergy_type_id) as newenergy_type_id_min
    , avg(emission_standards_id) as emission_standards_id_avg, median(emission_standards_id) as emission_standards_id_median, max(emission_standards_id) as emission_standards_id_max, min(emission_standards_id) as emission_standards_id_min
    , median(if_mpv_id) as if_mpv_id_median
    , median(if_luxurious_id) as if_luxurious_id_median
    , avg(car_length) as car_length_avg, median(car_length) as car_length_median, max(car_length) as car_length_max, min(car_length) as car_length_min
    , avg(car_width) as car_width_avg, median(car_width) as car_width_median, max(car_width) as car_width_max, min(car_width) as car_width_min
    , avg(car_height) as car_height_avg, median(car_height) as car_height_median, max(car_height) as car_height_max, min(car_height) as car_height_min
    , median(displacement) as displacement
    , median(if_charging) as if_charging
    , median(rated_passenger) as rated_passenger
    , median(power) as power
from oc_origin group by class_id;

--------------------------------------------------------------------------------
/代码/特征工程/3_1_拼上oc特征.txt:
--------------------------------------------------------------------------------
DROP TABLE IF EXISTS whole_f3_step1;

CREATE TABLE whole_f3_step1
AS
SELECT a.*, b.brand_id_median, b.compartment_avg, b.compartment_median, b.compartment_max
    , b.compartment_min, b.driven_type_id_median, b.driven_type_id_max, b.driven_type_id_min, b.fuel_type_id_avg
    , b.fuel_type_id_median, b.fuel_type_id_max, b.fuel_type_id_min, b.newenergy_type_id_avg, b.newenergy_type_id_median
    , b.newenergy_type_id_max, b.newenergy_type_id_min, b.emission_standards_id_avg, b.emission_standards_id_median, b.emission_standards_id_max
    , b.emission_standards_id_min, b.if_mpv_id_median, b.if_luxurious_id_median, b.car_length_avg, b.car_length_median
    , b.car_length_max, b.car_length_min, b.car_width_avg, b.car_width_median, b.car_width_max
    , b.car_width_min, b.car_height_avg, b.car_height_median, b.car_height_max, b.car_height_min
    , b.displacement, b.if_charging, b.rated_passenger, b.power
FROM whole_f2 a
LEFT OUTER JOIN oc b
ON a.class_id = b.class_id;

--------------------------------------------------------------------------------
/代码/特征工程/3_2_将车型部分连续特征离散化.txt:
--------------------------------------------------------------------------------
drop table if exists whole_f3_step2 ;
create table whole_f3_step2 as
select *
    , case when compartment_median = 2 and car_length_median >= 1600 then 1 else 0 end as if_suv
    , case when displacement < 1 then 1
        when displacement >=1
        and displacement < 1.3 then 2
        when displacement >=1.3 and displacement < 1.6 then 3
        when displacement >=1.6 and displacement <2.4 then 4
        else 5 end as displacement_level
from whole_f3_step1;

drop table if exists whole_f3_step3;
create table whole_f3_step3 as
select *
    , case when if_mpv_id_median = 0 and if_suv = 0 then 1 else 0 end as if_saloon
from whole_f3_step2;

drop table if exists whole_f3_step4;
create table whole_f3_step4 as
select *,
    case when if_saloon = 1 then 1
        when if_suv = 1 then 2
        when if_mpv_id_median = 1 then 3
        else 0 end as type_saloon_suv_mpv
from whole_f3_step3;

drop table if exists whole_f3;
alter table whole_f3_step4 rename to whole_f3;

--------------------------------------------------------------------------------
/代码/特征工程/4_1_省市车型时间.txt:
--------------------------------------------------------------------------------
drop table if exists whole_f4_step2;
create table whole_f4_step2 as
select *, to_date(sale_date, 'yyyymm') as sale_date_dt
from whole_f3;



create table d__has_c_real_som2 as select a.*, b.real_sale as c_real_som2 from whole_f4_step2 a left outer join whole_f4_step2 b on a.sale_date_dt = dateadd(b.sale_date_dt, 2, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som3 as select a.*, b.real_sale as c_real_som3 from d__has_c_real_som2 a left outer join d__has_c_real_som2 b on a.sale_date_dt = dateadd(b.sale_date_dt, 3, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som4 as select a.*, b.real_sale as c_real_som4 from d__has_c_real_som3 a left outer join d__has_c_real_som3 b on a.sale_date_dt = dateadd(b.sale_date_dt, 4, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som5 as select a.*, b.real_sale as c_real_som5 from d__has_c_real_som4 a left outer join d__has_c_real_som4 b on a.sale_date_dt = dateadd(b.sale_date_dt, 5, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som6 as select a.*, b.real_sale as c_real_som6 from d__has_c_real_som5 a left outer join d__has_c_real_som5 b on a.sale_date_dt = dateadd(b.sale_date_dt, 6, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som7 as select a.*, b.real_sale as c_real_som7 from d__has_c_real_som6 a left outer join d__has_c_real_som6 b on a.sale_date_dt = dateadd(b.sale_date_dt, 7, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som8 as select a.*, b.real_sale as c_real_som8 from d__has_c_real_som7 a left outer join d__has_c_real_som7 b on a.sale_date_dt = dateadd(b.sale_date_dt, 8, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som9 as select a.*, b.real_sale as c_real_som9 from d__has_c_real_som8 a left outer join d__has_c_real_som8 b on a.sale_date_dt = dateadd(b.sale_date_dt, 9, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som10 as select a.*, b.real_sale as c_real_som10 from d__has_c_real_som9 a left outer join d__has_c_real_som9 b on a.sale_date_dt = dateadd(b.sale_date_dt, 10, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som11 as select a.*, b.real_sale as c_real_som11 from d__has_c_real_som10 a left outer join d__has_c_real_som10 b on a.sale_date_dt = dateadd(b.sale_date_dt, 11, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som12 as select a.*, b.real_sale as c_real_som12 from d__has_c_real_som11 a left outer join d__has_c_real_som11 b on a.sale_date_dt = dateadd(b.sale_date_dt, 12, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som13 as select a.*, b.real_sale as c_real_som13 from d__has_c_real_som12 a left outer join d__has_c_real_som12 b on a.sale_date_dt = dateadd(b.sale_date_dt, 13, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som14 as select a.*, b.real_sale as c_real_som14 from d__has_c_real_som13 a left outer join d__has_c_real_som13 b on a.sale_date_dt = dateadd(b.sale_date_dt, 14, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som15 as select a.*, b.real_sale as c_real_som15 from d__has_c_real_som14 a left outer join d__has_c_real_som14 b on a.sale_date_dt = dateadd(b.sale_date_dt, 15, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som16 as select a.*, b.real_sale as c_real_som16 from d__has_c_real_som15 a left outer join d__has_c_real_som15 b on a.sale_date_dt = dateadd(b.sale_date_dt, 16, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som17 as select a.*, b.real_sale as c_real_som17 from d__has_c_real_som16 a left outer join d__has_c_real_som16 b on a.sale_date_dt = dateadd(b.sale_date_dt, 17, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som18 as select a.*, b.real_sale as c_real_som18 from d__has_c_real_som17 a left outer join d__has_c_real_som17 b on a.sale_date_dt = dateadd(b.sale_date_dt, 18, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som19 as select a.*, b.real_sale as c_real_som19 from d__has_c_real_som18 a left outer join d__has_c_real_som18 b on a.sale_date_dt = dateadd(b.sale_date_dt, 19, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som20 as select a.*, b.real_sale as c_real_som20 from d__has_c_real_som19 a left outer join d__has_c_real_som19 b on a.sale_date_dt = dateadd(b.sale_date_dt, 20, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som21 as select a.*, b.real_sale as c_real_som21 from d__has_c_real_som20 a left outer join d__has_c_real_som20 b on a.sale_date_dt = dateadd(b.sale_date_dt, 21, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som22 as select a.*, b.real_sale as c_real_som22 from d__has_c_real_som21 a left outer join d__has_c_real_som21 b on a.sale_date_dt = dateadd(b.sale_date_dt, 22, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som23 as select a.*, b.real_sale as c_real_som23 from d__has_c_real_som22 a left outer join d__has_c_real_som22 b on a.sale_date_dt = dateadd(b.sale_date_dt, 23, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som24 as select a.*, b.real_sale as c_real_som24 from d__has_c_real_som23 a left outer join d__has_c_real_som23 b on a.sale_date_dt = dateadd(b.sale_date_dt, 24, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som25 as select a.*, b.real_sale as c_real_som25 from d__has_c_real_som24 a left outer join d__has_c_real_som24 b on a.sale_date_dt = dateadd(b.sale_date_dt, 25, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som26 as select a.*, b.real_sale as c_real_som26 from d__has_c_real_som25 a left outer join d__has_c_real_som25 b on a.sale_date_dt = dateadd(b.sale_date_dt, 26, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som27 as select a.*, b.real_sale as c_real_som27 from d__has_c_real_som26 a left outer join d__has_c_real_som26 b on a.sale_date_dt = dateadd(b.sale_date_dt, 27, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som28 as select a.*, b.real_sale as c_real_som28 from d__has_c_real_som27 a left outer join d__has_c_real_som27 b on a.sale_date_dt = dateadd(b.sale_date_dt, 28, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som29 as select a.*, b.real_sale as c_real_som29 from d__has_c_real_som28 a left outer join d__has_c_real_som28 b on a.sale_date_dt = dateadd(b.sale_date_dt, 29, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som30 as select a.*, b.real_sale as c_real_som30 from d__has_c_real_som29 a left outer join d__has_c_real_som29 b on a.sale_date_dt = dateadd(b.sale_date_dt, 30, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som31 as select a.*, b.real_sale as c_real_som31 from d__has_c_real_som30 a left outer join d__has_c_real_som30 b on a.sale_date_dt = dateadd(b.sale_date_dt, 31, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som32 as select a.*, b.real_sale as c_real_som32 from d__has_c_real_som31 a left outer join d__has_c_real_som31 b on a.sale_date_dt = dateadd(b.sale_date_dt, 32, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som33 as select a.*, b.real_sale as c_real_som33 from d__has_c_real_som32 a left outer join d__has_c_real_som32 b on a.sale_date_dt = dateadd(b.sale_date_dt, 33, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som34 as select a.*, b.real_sale as c_real_som34 from d__has_c_real_som33 a left outer join d__has_c_real_som33 b on a.sale_date_dt = dateadd(b.sale_date_dt, 34, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som35 as select a.*, b.real_sale as c_real_som35 from d__has_c_real_som34 a left outer join d__has_c_real_som34 b on a.sale_date_dt = dateadd(b.sale_date_dt, 35, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som36 as select a.*, b.real_sale as c_real_som36 from d__has_c_real_som35 a left outer join d__has_c_real_som35 b on a.sale_date_dt = dateadd(b.sale_date_dt, 36, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som37 as select a.*, b.real_sale as c_real_som37 from d__has_c_real_som36 a left outer join d__has_c_real_som36 b on a.sale_date_dt = dateadd(b.sale_date_dt, 37, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som38 as select a.*, b.real_sale as c_real_som38 from d__has_c_real_som37 a left outer join d__has_c_real_som37 b on a.sale_date_dt = dateadd(b.sale_date_dt, 38, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som39 as select a.*, b.real_sale as c_real_som39 from d__has_c_real_som38 a left outer join d__has_c_real_som38 b on a.sale_date_dt = dateadd(b.sale_date_dt, 39, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;
create table d__has_c_real_som40 as select a.*, b.real_sale as c_real_som40 from d__has_c_real_som39 a left outer join d__has_c_real_som39 b on a.sale_date_dt = dateadd(b.sale_date_dt, 40, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.class_id = b.class_id;

alter table d__has_c_real_som40 rename to whole_f4_step3;

--------------------------------------------------------------------------------
/代码/特征工程/4_2_ssm_车型.txt:
--------------------------------------------------------------------------------
set odps.sql.type.system.odps2=true;


create table dr_c_real_ssm_2_3 as select *, c_real_som_2 + c_real_som_3 as c_real_ssm_2_3 from draft_has_c_real_som_37; -- note: this first statement differs from the rest
create table dr_c_real_ssm_2_4 as select *, c_real_ssm_2_3 + c_real_som_4 as c_real_ssm_2_4 from dr_c_real_ssm_2_3;
create table dr_c_real_ssm_2_5 as select *, c_real_ssm_2_4 + c_real_som_5 as c_real_ssm_2_5 from dr_c_real_ssm_2_4;
create table dr_c_real_ssm_2_6 as select *, c_real_ssm_2_5 + c_real_som_6 as c_real_ssm_2_6 from dr_c_real_ssm_2_5;
create table dr_c_real_ssm_2_7 as select *, c_real_ssm_2_6 + c_real_som_7 as c_real_ssm_2_7 from dr_c_real_ssm_2_6;
create table dr_c_real_ssm_2_8 as select *, c_real_ssm_2_7 + c_real_som_8 as c_real_ssm_2_8 from dr_c_real_ssm_2_7;
create table dr_c_real_ssm_2_9 as select *, c_real_ssm_2_8 + c_real_som_9 as c_real_ssm_2_9 from dr_c_real_ssm_2_8;
create table dr_c_real_ssm_2_10 as select *, c_real_ssm_2_9 + c_real_som_10 as c_real_ssm_2_10 from dr_c_real_ssm_2_9;
create table dr_c_real_ssm_2_11 as select *, c_real_ssm_2_10 + c_real_som_11 as c_real_ssm_2_11 from dr_c_real_ssm_2_10;
create table dr_c_real_ssm_2_12 as select *, c_real_ssm_2_11 + c_real_som_12 as c_real_ssm_2_12 from dr_c_real_ssm_2_11;
create table dr_c_real_ssm_2_13 as select *, c_real_ssm_2_12 + c_real_som_13
as c_real_ssm_2_13 from dr_c_real_ssm_2_12;
create table dr_c_real_ssm_2_14 as select *, c_real_ssm_2_13 + c_real_som_14 as c_real_ssm_2_14 from dr_c_real_ssm_2_13;
create table dr_c_real_ssm_2_15 as select *, c_real_ssm_2_14 + c_real_som_15 as c_real_ssm_2_15 from dr_c_real_ssm_2_14;
create table dr_c_real_ssm_2_16 as select *, c_real_ssm_2_15 + c_real_som_16 as c_real_ssm_2_16 from dr_c_real_ssm_2_15;
create table dr_c_real_ssm_2_17 as select *, c_real_ssm_2_16 + c_real_som_17 as c_real_ssm_2_17 from dr_c_real_ssm_2_16;
create table dr_c_real_ssm_2_18 as select *, c_real_ssm_2_17 + c_real_som_18 as c_real_ssm_2_18 from dr_c_real_ssm_2_17;
create table dr_c_real_ssm_2_19 as select *, c_real_ssm_2_18 + c_real_som_19 as c_real_ssm_2_19 from dr_c_real_ssm_2_18;
create table dr_c_real_ssm_2_20 as select *, c_real_ssm_2_19 + c_real_som_20 as c_real_ssm_2_20 from dr_c_real_ssm_2_19;
create table dr_c_real_ssm_2_21 as select *, c_real_ssm_2_20 + c_real_som_21 as c_real_ssm_2_21 from dr_c_real_ssm_2_20;
create table dr_c_real_ssm_2_22 as select *, c_real_ssm_2_21 + c_real_som_22 as c_real_ssm_2_22 from dr_c_real_ssm_2_21;
create table dr_c_real_ssm_2_23 as select *, c_real_ssm_2_22 + c_real_som_23 as c_real_ssm_2_23 from dr_c_real_ssm_2_22;
create table dr_c_real_ssm_2_24 as select *, c_real_ssm_2_23 + c_real_som_24 as c_real_ssm_2_24 from dr_c_real_ssm_2_23;
create table dr_c_real_ssm_2_25 as select *, c_real_ssm_2_24 + c_real_som_25 as c_real_ssm_2_25 from dr_c_real_ssm_2_24;
create table dr_c_real_ssm_2_26 as select *, c_real_ssm_2_25 + c_real_som_26 as c_real_ssm_2_26 from dr_c_real_ssm_2_25;
create table dr_c_real_ssm_2_27 as select *, c_real_ssm_2_26 + c_real_som_27 as c_real_ssm_2_27 from dr_c_real_ssm_2_26;
create table dr_c_real_ssm_2_28 as select *, c_real_ssm_2_27 + c_real_som_28 as c_real_ssm_2_28 from dr_c_real_ssm_2_27;
create table dr_c_real_ssm_2_29 as select *, c_real_ssm_2_28 + c_real_som_29 as c_real_ssm_2_29 from dr_c_real_ssm_2_28;
create table dr_c_real_ssm_2_30 as select *, c_real_ssm_2_29 + c_real_som_30 as c_real_ssm_2_30 from dr_c_real_ssm_2_29;
create table dr_c_real_ssm_2_31 as select *, c_real_ssm_2_30 + c_real_som_31 as c_real_ssm_2_31 from dr_c_real_ssm_2_30;
create table dr_c_real_ssm_2_32 as select *, c_real_ssm_2_31 + c_real_som_32 as c_real_ssm_2_32 from dr_c_real_ssm_2_31;
create table dr_c_real_ssm_2_33 as select *, c_real_ssm_2_32 + c_real_som_33 as c_real_ssm_2_33 from dr_c_real_ssm_2_32;
create table dr_c_real_ssm_2_34 as select *, c_real_ssm_2_33 + c_real_som_34 as c_real_ssm_2_34 from dr_c_real_ssm_2_33;
create table dr_c_real_ssm_2_35 as select *, c_real_ssm_2_34 + c_real_som_35 as c_real_ssm_2_35 from dr_c_real_ssm_2_34;
create table dr_c_real_ssm_2_36 as select *, c_real_ssm_2_35 + c_real_som_36 as c_real_ssm_2_36 from dr_c_real_ssm_2_35;

--------------------------------------------------------------------------------
/代码/特征工程/4_3_fdfrfir_车型.txt:
--------------------------------------------------------------------------------
DROP TABLE IF EXISTS whole_f4_3;

DROP TABLE IF EXISTS dr_c_real_fd_real_fr;

CREATE TABLE dr_c_real_fd_real_fr
AS
SELECT *, c_real_som_2 - c_real_som_3 AS c_real_fd_2, c_real_som_3 - c_real_som_4 AS c_real_fd_3
    , c_real_som_12 - c_real_som_13 AS c_real_fd_12, c_real_som_24 - c_real_som_25 AS c_real_fd_24
    , c_real_som_36 - c_real_som_37 AS c_real_fd_36, c_real_som_2 / c_real_som_3 AS c_real_fr_2
    , c_real_som_3 / c_real_som_4 AS c_real_fr_3, c_real_som_12 / c_real_som_13 AS c_real_fr_12
    , c_real_som_24 / c_real_som_25 AS c_real_fr_24, c_real_som_36 / c_real_som_37 AS c_real_fr_36
    , c_real_som_2 - c_real_som_14 as c_real_fd_2_14, c_real_som_3 - c_real_som_15 as c_real_fd_3_15
    , c_real_som_2 / c_real_som_14 as c_real_fr_2_14, c_real_som_3 / c_real_som_15 as c_real_fr_3_15
FROM dr_c_real_ssm_2_36;
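
-- Sketch only (an assumption, not part of the original pipeline): the ratio
-- features above divide one lag column by another. A lag is NULL when the left
-- outer join found no matching month, and it may also be 0. The preview query
-- below shows one way to make that case explicit for c_real_fr_2.
-- c_real_fr_2_safe is a hypothetical column name used only for illustration.
SELECT c_real_som_2, c_real_som_3
    , CASE
        WHEN c_real_som_3 IS NULL OR c_real_som_3 = 0 THEN NULL
        ELSE c_real_som_2 / c_real_som_3
    END AS c_real_fr_2_safe
FROM dr_c_real_ssm_2_36
LIMIT 100;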
| 16 | -- inc_rate 增长率 c_fir 17 | CREATE TABLE whole_f4_3 18 | AS 19 | SELECT *, c_real_fr_2 - 1 AS c_real_fir_2, c_real_fr_3 - 1 AS c_real_fir_3 20 | , c_real_fr_12 - 1 AS c_real_fir_12, c_real_fr_24 - 1 AS c_real_fir_24 21 | , c_real_fr_36 - 1 AS c_real_fir_36 22 | , c_real_fr_2_14 - 1 as c_real_fir_2_14 23 | , c_real_fr_3_15 - 1 as c_real_fir_3_15 24 | FROM dr_c_real_fd_real_fr; -------------------------------------------------------------------------------- /代码/特征工程/5_0_分出轿车suvmpv等类别.txt: -------------------------------------------------------------------------------- 1 | 2 | drop table if exists whole_f5_step1; 3 | CREATE TABLE whole_f5_step1 4 | AS 5 | SELECT sale_date, province_id, city_id, type_saloon_suv_mpv 6 | , SUM(sale_quantity) AS sale_quantity, SUM(real_sale) AS real_sale 7 | FROM whole_f3 8 | GROUP BY province_id, 9 | city_id, 10 | type_saloon_suv_mpv, 11 | sale_date; 12 | 13 | select * from whole_f5_step1; -------------------------------------------------------------------------------- /代码/特征工程/5_1_2_ssm_车类型.txt: -------------------------------------------------------------------------------- 1 | set odps.sql.type.system.odps2=true; 2 | 3 | 4 | create table dr_c_real_ssm_type_2_3 as select *, c_real_som_type_2 + c_real_som_type_3 as c_real_ssm_type_2_3 from d__has_c_real_som_type_37;--首句不一样哦 5 | create table dr_c_real_ssm_type_2_4 as select *, c_real_ssm_type_2_3 + c_real_som_type_4 as c_real_ssm_type_2_4 from dr_c_real_ssm_type_2_3; 6 | create table dr_c_real_ssm_type_2_5 as select *, c_real_ssm_type_2_4 + c_real_som_type_5 as c_real_ssm_type_2_5 from dr_c_real_ssm_type_2_4; 7 | create table dr_c_real_ssm_type_2_6 as select *, c_real_ssm_type_2_5 + c_real_som_type_6 as c_real_ssm_type_2_6 from dr_c_real_ssm_type_2_5; 8 | create table dr_c_real_ssm_type_2_7 as select *, c_real_ssm_type_2_6 + c_real_som_type_7 as c_real_ssm_type_2_7 from dr_c_real_ssm_type_2_6; 9 | create table dr_c_real_ssm_type_2_8 as select *, c_real_ssm_type_2_7 + 
c_real_som_type_8 as c_real_ssm_type_2_8 from dr_c_real_ssm_type_2_7; 10 | create table dr_c_real_ssm_type_2_9 as select *, c_real_ssm_type_2_8 + c_real_som_type_9 as c_real_ssm_type_2_9 from dr_c_real_ssm_type_2_8; 11 | create table dr_c_real_ssm_type_2_10 as select *, c_real_ssm_type_2_9 + c_real_som_type_10 as c_real_ssm_type_2_10 from dr_c_real_ssm_type_2_9; 12 | create table dr_c_real_ssm_type_2_11 as select *, c_real_ssm_type_2_10 + c_real_som_type_11 as c_real_ssm_type_2_11 from dr_c_real_ssm_type_2_10; 13 | create table dr_c_real_ssm_type_2_12 as select *, c_real_ssm_type_2_11 + c_real_som_type_12 as c_real_ssm_type_2_12 from dr_c_real_ssm_type_2_11; 14 | create table dr_c_real_ssm_type_2_13 as select *, c_real_ssm_type_2_12 + c_real_som_type_13 as c_real_ssm_type_2_13 from dr_c_real_ssm_type_2_12; 15 | create table dr_c_real_ssm_type_2_14 as select *, c_real_ssm_type_2_13 + c_real_som_type_14 as c_real_ssm_type_2_14 from dr_c_real_ssm_type_2_13; 16 | create table dr_c_real_ssm_type_2_15 as select *, c_real_ssm_type_2_14 + c_real_som_type_15 as c_real_ssm_type_2_15 from dr_c_real_ssm_type_2_14; 17 | create table dr_c_real_ssm_type_2_16 as select *, c_real_ssm_type_2_15 + c_real_som_type_16 as c_real_ssm_type_2_16 from dr_c_real_ssm_type_2_15; 18 | create table dr_c_real_ssm_type_2_17 as select *, c_real_ssm_type_2_16 + c_real_som_type_17 as c_real_ssm_type_2_17 from dr_c_real_ssm_type_2_16; 19 | create table dr_c_real_ssm_type_2_18 as select *, c_real_ssm_type_2_17 + c_real_som_type_18 as c_real_ssm_type_2_18 from dr_c_real_ssm_type_2_17; 20 | create table dr_c_real_ssm_type_2_19 as select *, c_real_ssm_type_2_18 + c_real_som_type_19 as c_real_ssm_type_2_19 from dr_c_real_ssm_type_2_18; 21 | create table dr_c_real_ssm_type_2_20 as select *, c_real_ssm_type_2_19 + c_real_som_type_20 as c_real_ssm_type_2_20 from dr_c_real_ssm_type_2_19; 22 | create table dr_c_real_ssm_type_2_21 as select *, c_real_ssm_type_2_20 + c_real_som_type_21 as c_real_ssm_type_2_21 
from dr_c_real_ssm_type_2_20; 23 | create table dr_c_real_ssm_type_2_22 as select *, c_real_ssm_type_2_21 + c_real_som_type_22 as c_real_ssm_type_2_22 from dr_c_real_ssm_type_2_21; 24 | create table dr_c_real_ssm_type_2_23 as select *, c_real_ssm_type_2_22 + c_real_som_type_23 as c_real_ssm_type_2_23 from dr_c_real_ssm_type_2_22; 25 | create table dr_c_real_ssm_type_2_24 as select *, c_real_ssm_type_2_23 + c_real_som_type_24 as c_real_ssm_type_2_24 from dr_c_real_ssm_type_2_23; 26 | create table dr_c_real_ssm_type_2_25 as select *, c_real_ssm_type_2_24 + c_real_som_type_25 as c_real_ssm_type_2_25 from dr_c_real_ssm_type_2_24; 27 | create table dr_c_real_ssm_type_2_26 as select *, c_real_ssm_type_2_25 + c_real_som_type_26 as c_real_ssm_type_2_26 from dr_c_real_ssm_type_2_25; 28 | create table dr_c_real_ssm_type_2_27 as select *, c_real_ssm_type_2_26 + c_real_som_type_27 as c_real_ssm_type_2_27 from dr_c_real_ssm_type_2_26; 29 | create table dr_c_real_ssm_type_2_28 as select *, c_real_ssm_type_2_27 + c_real_som_type_28 as c_real_ssm_type_2_28 from dr_c_real_ssm_type_2_27; 30 | create table dr_c_real_ssm_type_2_29 as select *, c_real_ssm_type_2_28 + c_real_som_type_29 as c_real_ssm_type_2_29 from dr_c_real_ssm_type_2_28; 31 | create table dr_c_real_ssm_type_2_30 as select *, c_real_ssm_type_2_29 + c_real_som_type_30 as c_real_ssm_type_2_30 from dr_c_real_ssm_type_2_29; 32 | create table dr_c_real_ssm_type_2_31 as select *, c_real_ssm_type_2_30 + c_real_som_type_31 as c_real_ssm_type_2_31 from dr_c_real_ssm_type_2_30; 33 | create table dr_c_real_ssm_type_2_32 as select *, c_real_ssm_type_2_31 + c_real_som_type_32 as c_real_ssm_type_2_32 from dr_c_real_ssm_type_2_31; 34 | create table dr_c_real_ssm_type_2_33 as select *, c_real_ssm_type_2_32 + c_real_som_type_33 as c_real_ssm_type_2_33 from dr_c_real_ssm_type_2_32; 35 | create table dr_c_real_ssm_type_2_34 as select *, c_real_ssm_type_2_33 + c_real_som_type_34 as c_real_ssm_type_2_34 from dr_c_real_ssm_type_2_33; 36 | 
create table dr_c_real_ssm_type_2_35 as select *, c_real_ssm_type_2_34 + c_real_som_type_35 as c_real_ssm_type_2_35 from dr_c_real_ssm_type_2_34; 37 | create table dr_c_real_ssm_type_2_36 as select *, c_real_ssm_type_2_35 + c_real_som_type_36 as c_real_ssm_type_2_36 from dr_c_real_ssm_type_2_35; 38 | create table dr_c_real_ssm_type_2_37 as select *, c_real_ssm_type_2_36 + c_real_som_type_37 as c_real_ssm_type_2_37 from dr_c_real_ssm_type_2_36; -------------------------------------------------------------------------------- /代码/特征工程/5_1_3_fdfrfir_车类型.txt: -------------------------------------------------------------------------------- 1 | DROP TABLE IF EXISTS whole_f5_1_3; 2 | 3 | DROP TABLE IF EXISTS whole_f5_1_3_fdfr; 4 | 5 | CREATE TABLE whole_f5_1_3_fdfr 6 | AS 7 | SELECT *, c_real_som_type_2 - c_real_som_type_3 AS c_real_fd_type_2, c_real_som_type_3 - c_real_som_type_4 AS c_real_fd_type_3 8 | , c_real_som_type_12 - c_real_som_type_13 AS c_real_fd_type_12, c_real_som_type_24 - c_real_som_type_25 AS c_real_fd_type_24 9 | , c_real_som_type_36 - c_real_som_type_37 AS c_real_fd_type_36, c_real_som_type_2 / c_real_som_type_3 AS c_real_fr_type_2 10 | , c_real_som_type_3 / c_real_som_type_4 AS c_real_fr_type_3, c_real_som_type_12 / c_real_som_type_13 AS c_real_fr_type_12 11 | , c_real_som_type_24 / c_real_som_type_25 AS c_real_fr_type_24, c_real_som_type_36 / c_real_som_type_37 AS c_real_fr_type_36 12 | , c_real_som_type_2 - c_real_som_type_14 as c_real_fd_type_2_14, c_real_som_type_3 - c_real_som_type_15 as c_real_fd_type_3_15 13 | , c_real_som_type_2 / c_real_som_type_14 as c_real_fr_type_2_14, c_real_som_type_3 / c_real_som_type_15 as c_real_fr_type_3_15 14 | FROM dr_c_real_ssm_type_2_36; 15 | 16 | -- inc_rate: growth rate (c_fir) 17 | CREATE TABLE whole_f5_1_3 18 | AS 19 | SELECT *, c_real_fr_type_2 - 1 AS c_real_fir_type_2, c_real_fr_type_3 - 1 AS c_real_fir_type_3 20 | , c_real_fr_type_12 - 1 AS c_real_fir_type_12, c_real_fr_type_24 - 1 AS c_real_fir_type_24 21 | ,
c_real_fr_type_36 - 1 AS c_real_fir_type_36 22 | , c_real_fr_type_2_14 - 1 as c_real_fir_type_2_14 23 | , c_real_fr_type_3_15 - 1 as c_real_fir_type_3_15 24 | FROM whole_f5_1_3_fdfr; -------------------------------------------------------------------------------- /代码/特征工程/5_1_省市车类型时间.txt: -------------------------------------------------------------------------------- 1 | drop table if exists whole_f5_step2; 2 | create table whole_f5_step2 as 3 | select *, to_date(sale_date, 'yyyymm') as sale_date_dt 4 | from whole_f5_step1; 5 | 6 | 7 | 8 | create table d__has_c_real_som_type_2 as select a.*, b.real_sale as c_real_som_type_2 from whole_f5_step2 a left outer join whole_f5_step2 b on a.sale_date_dt = dateadd(b.sale_date_dt, 2, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 9 | create table d__has_c_real_som_type_3 as select a.*, b.real_sale as c_real_som_type_3 from d__has_c_real_som_type_2 a left outer join d__has_c_real_som_type_2 b on a.sale_date_dt = dateadd(b.sale_date_dt, 3, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 10 | create table d__has_c_real_som_type_4 as select a.*, b.real_sale as c_real_som_type_4 from d__has_c_real_som_type_3 a left outer join d__has_c_real_som_type_3 b on a.sale_date_dt = dateadd(b.sale_date_dt, 4, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 11 | create table d__has_c_real_som_type_5 as select a.*, b.real_sale as c_real_som_type_5 from d__has_c_real_som_type_4 a left outer join d__has_c_real_som_type_4 b on a.sale_date_dt = dateadd(b.sale_date_dt, 5, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 12 | create table d__has_c_real_som_type_6 as select a.*, b.real_sale as c_real_som_type_6 from d__has_c_real_som_type_5 a left outer join 
d__has_c_real_som_type_5 b on a.sale_date_dt = dateadd(b.sale_date_dt, 6, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 13 | create table d__has_c_real_som_type_7 as select a.*, b.real_sale as c_real_som_type_7 from d__has_c_real_som_type_6 a left outer join d__has_c_real_som_type_6 b on a.sale_date_dt = dateadd(b.sale_date_dt, 7, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 14 | create table d__has_c_real_som_type_8 as select a.*, b.real_sale as c_real_som_type_8 from d__has_c_real_som_type_7 a left outer join d__has_c_real_som_type_7 b on a.sale_date_dt = dateadd(b.sale_date_dt, 8, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 15 | create table d__has_c_real_som_type_9 as select a.*, b.real_sale as c_real_som_type_9 from d__has_c_real_som_type_8 a left outer join d__has_c_real_som_type_8 b on a.sale_date_dt = dateadd(b.sale_date_dt, 9, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 16 | create table d__has_c_real_som_type_10 as select a.*, b.real_sale as c_real_som_type_10 from d__has_c_real_som_type_9 a left outer join d__has_c_real_som_type_9 b on a.sale_date_dt = dateadd(b.sale_date_dt, 10, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 17 | create table d__has_c_real_som_type_11 as select a.*, b.real_sale as c_real_som_type_11 from d__has_c_real_som_type_10 a left outer join d__has_c_real_som_type_10 b on a.sale_date_dt = dateadd(b.sale_date_dt, 11, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 18 | create table d__has_c_real_som_type_12 as select a.*, b.real_sale as c_real_som_type_12 from d__has_c_real_som_type_11 a left outer join 
d__has_c_real_som_type_11 b on a.sale_date_dt = dateadd(b.sale_date_dt, 12, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 19 | create table d__has_c_real_som_type_13 as select a.*, b.real_sale as c_real_som_type_13 from d__has_c_real_som_type_12 a left outer join d__has_c_real_som_type_12 b on a.sale_date_dt = dateadd(b.sale_date_dt, 13, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 20 | create table d__has_c_real_som_type_14 as select a.*, b.real_sale as c_real_som_type_14 from d__has_c_real_som_type_13 a left outer join d__has_c_real_som_type_13 b on a.sale_date_dt = dateadd(b.sale_date_dt, 14, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 21 | create table d__has_c_real_som_type_15 as select a.*, b.real_sale as c_real_som_type_15 from d__has_c_real_som_type_14 a left outer join d__has_c_real_som_type_14 b on a.sale_date_dt = dateadd(b.sale_date_dt, 15, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 22 | create table d__has_c_real_som_type_16 as select a.*, b.real_sale as c_real_som_type_16 from d__has_c_real_som_type_15 a left outer join d__has_c_real_som_type_15 b on a.sale_date_dt = dateadd(b.sale_date_dt, 16, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 23 | create table d__has_c_real_som_type_17 as select a.*, b.real_sale as c_real_som_type_17 from d__has_c_real_som_type_16 a left outer join d__has_c_real_som_type_16 b on a.sale_date_dt = dateadd(b.sale_date_dt, 17, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 24 | create table d__has_c_real_som_type_18 as select a.*, b.real_sale as c_real_som_type_18 from d__has_c_real_som_type_17 a left 
outer join d__has_c_real_som_type_17 b on a.sale_date_dt = dateadd(b.sale_date_dt, 18, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 25 | create table d__has_c_real_som_type_19 as select a.*, b.real_sale as c_real_som_type_19 from d__has_c_real_som_type_18 a left outer join d__has_c_real_som_type_18 b on a.sale_date_dt = dateadd(b.sale_date_dt, 19, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 26 | create table d__has_c_real_som_type_20 as select a.*, b.real_sale as c_real_som_type_20 from d__has_c_real_som_type_19 a left outer join d__has_c_real_som_type_19 b on a.sale_date_dt = dateadd(b.sale_date_dt, 20, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 27 | create table d__has_c_real_som_type_21 as select a.*, b.real_sale as c_real_som_type_21 from d__has_c_real_som_type_20 a left outer join d__has_c_real_som_type_20 b on a.sale_date_dt = dateadd(b.sale_date_dt, 21, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 28 | create table d__has_c_real_som_type_22 as select a.*, b.real_sale as c_real_som_type_22 from d__has_c_real_som_type_21 a left outer join d__has_c_real_som_type_21 b on a.sale_date_dt = dateadd(b.sale_date_dt, 22, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 29 | create table d__has_c_real_som_type_23 as select a.*, b.real_sale as c_real_som_type_23 from d__has_c_real_som_type_22 a left outer join d__has_c_real_som_type_22 b on a.sale_date_dt = dateadd(b.sale_date_dt, 23, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 30 | create table d__has_c_real_som_type_24 as select a.*, b.real_sale as c_real_som_type_24 from 
d__has_c_real_som_type_23 a left outer join d__has_c_real_som_type_23 b on a.sale_date_dt = dateadd(b.sale_date_dt, 24, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 31 | create table d__has_c_real_som_type_25 as select a.*, b.real_sale as c_real_som_type_25 from d__has_c_real_som_type_24 a left outer join d__has_c_real_som_type_24 b on a.sale_date_dt = dateadd(b.sale_date_dt, 25, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 32 | create table d__has_c_real_som_type_26 as select a.*, b.real_sale as c_real_som_type_26 from d__has_c_real_som_type_25 a left outer join d__has_c_real_som_type_25 b on a.sale_date_dt = dateadd(b.sale_date_dt, 26, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 33 | create table d__has_c_real_som_type_27 as select a.*, b.real_sale as c_real_som_type_27 from d__has_c_real_som_type_26 a left outer join d__has_c_real_som_type_26 b on a.sale_date_dt = dateadd(b.sale_date_dt, 27, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 34 | create table d__has_c_real_som_type_28 as select a.*, b.real_sale as c_real_som_type_28 from d__has_c_real_som_type_27 a left outer join d__has_c_real_som_type_27 b on a.sale_date_dt = dateadd(b.sale_date_dt, 28, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 35 | create table d__has_c_real_som_type_29 as select a.*, b.real_sale as c_real_som_type_29 from d__has_c_real_som_type_28 a left outer join d__has_c_real_som_type_28 b on a.sale_date_dt = dateadd(b.sale_date_dt, 29, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 36 | create table d__has_c_real_som_type_30 as select a.*, b.real_sale as 
c_real_som_type_30 from d__has_c_real_som_type_29 a left outer join d__has_c_real_som_type_29 b on a.sale_date_dt = dateadd(b.sale_date_dt, 30, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 37 | create table d__has_c_real_som_type_31 as select a.*, b.real_sale as c_real_som_type_31 from d__has_c_real_som_type_30 a left outer join d__has_c_real_som_type_30 b on a.sale_date_dt = dateadd(b.sale_date_dt, 31, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 38 | create table d__has_c_real_som_type_32 as select a.*, b.real_sale as c_real_som_type_32 from d__has_c_real_som_type_31 a left outer join d__has_c_real_som_type_31 b on a.sale_date_dt = dateadd(b.sale_date_dt, 32, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 39 | create table d__has_c_real_som_type_33 as select a.*, b.real_sale as c_real_som_type_33 from d__has_c_real_som_type_32 a left outer join d__has_c_real_som_type_32 b on a.sale_date_dt = dateadd(b.sale_date_dt, 33, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 40 | create table d__has_c_real_som_type_34 as select a.*, b.real_sale as c_real_som_type_34 from d__has_c_real_som_type_33 a left outer join d__has_c_real_som_type_33 b on a.sale_date_dt = dateadd(b.sale_date_dt, 34, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 41 | create table d__has_c_real_som_type_35 as select a.*, b.real_sale as c_real_som_type_35 from d__has_c_real_som_type_34 a left outer join d__has_c_real_som_type_34 b on a.sale_date_dt = dateadd(b.sale_date_dt, 35, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 42 | create table d__has_c_real_som_type_36 as select a.*, 
b.real_sale as c_real_som_type_36 from d__has_c_real_som_type_35 a left outer join d__has_c_real_som_type_35 b on a.sale_date_dt = dateadd(b.sale_date_dt, 36, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 43 | create table d__has_c_real_som_type_37 as select a.*, b.real_sale as c_real_som_type_37 from d__has_c_real_som_type_36 a left outer join d__has_c_real_som_type_36 b on a.sale_date_dt = dateadd(b.sale_date_dt, 37, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 44 | create table d__has_c_real_som_type_38 as select a.*, b.real_sale as c_real_som_type_38 from d__has_c_real_som_type_37 a left outer join d__has_c_real_som_type_37 b on a.sale_date_dt = dateadd(b.sale_date_dt, 38, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 45 | create table d__has_c_real_som_type_39 as select a.*, b.real_sale as c_real_som_type_39 from d__has_c_real_som_type_38 a left outer join d__has_c_real_som_type_38 b on a.sale_date_dt = dateadd(b.sale_date_dt, 39, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 46 | create table d__has_c_real_som_type_40 as select a.*, b.real_sale as c_real_som_type_40 from d__has_c_real_som_type_39 a left outer join d__has_c_real_som_type_39 b on a.sale_date_dt = dateadd(b.sale_date_dt, 40, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.type_saloon_suv_mpv = b.type_saloon_suv_mpv; 47 | 48 | alter table d__has_c_real_som_type_40 rename to whole_f5_step3; -------------------------------------------------------------------------------- /代码/特征工程/5_2_0_按排量分.txt: -------------------------------------------------------------------------------- 1 | drop table if exists whole_f5_2_step1; 2 | CREATE TABLE whole_f5_2_step1 3 | AS 4 | SELECT sale_date, province_id, 
city_id, displacement_level 5 | , SUM(sale_quantity) AS sale_quantity, SUM(real_sale) AS real_sale 6 | FROM whole_f3 7 | GROUP BY province_id, 8 | city_id, 9 | displacement_level, 10 | sale_date; 11 | 12 | select * from whole_f5_2_step1; -------------------------------------------------------------------------------- /代码/特征工程/5_2_1_省市排量时间.txt: -------------------------------------------------------------------------------- 1 | drop table if exists whole_f5_2_step2; 2 | create table whole_f5_2_step2 as 3 | select *, to_date(sale_date, 'yyyymm') as sale_date_dt 4 | from whole_f5_2_step1; 5 | 6 | 7 | 8 | create table d__has_c_real_som_displace_level_2 as select a.*, b.real_sale as c_real_som_displace_level_2 from whole_f5_2_step2 a left outer join whole_f5_2_step2 b on a.sale_date_dt = dateadd(b.sale_date_dt, 2, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 9 | create table d__has_c_real_som_displace_level_3 as select a.*, b.real_sale as c_real_som_displace_level_3 from d__has_c_real_som_displace_level_2 a left outer join d__has_c_real_som_displace_level_2 b on a.sale_date_dt = dateadd(b.sale_date_dt, 3, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 10 | create table d__has_c_real_som_displace_level_4 as select a.*, b.real_sale as c_real_som_displace_level_4 from d__has_c_real_som_displace_level_3 a left outer join d__has_c_real_som_displace_level_3 b on a.sale_date_dt = dateadd(b.sale_date_dt, 4, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 11 | create table d__has_c_real_som_displace_level_5 as select a.*, b.real_sale as c_real_som_displace_level_5 from d__has_c_real_som_displace_level_4 a left outer join d__has_c_real_som_displace_level_4 b on a.sale_date_dt = dateadd(b.sale_date_dt, 5, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and 
a.displacement_level = b.displacement_level; 12 | create table d__has_c_real_som_displace_level_6 as select a.*, b.real_sale as c_real_som_displace_level_6 from d__has_c_real_som_displace_level_5 a left outer join d__has_c_real_som_displace_level_5 b on a.sale_date_dt = dateadd(b.sale_date_dt, 6, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 13 | create table d__has_c_real_som_displace_level_7 as select a.*, b.real_sale as c_real_som_displace_level_7 from d__has_c_real_som_displace_level_6 a left outer join d__has_c_real_som_displace_level_6 b on a.sale_date_dt = dateadd(b.sale_date_dt, 7, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 14 | create table d__has_c_real_som_displace_level_8 as select a.*, b.real_sale as c_real_som_displace_level_8 from d__has_c_real_som_displace_level_7 a left outer join d__has_c_real_som_displace_level_7 b on a.sale_date_dt = dateadd(b.sale_date_dt, 8, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 15 | create table d__has_c_real_som_displace_level_9 as select a.*, b.real_sale as c_real_som_displace_level_9 from d__has_c_real_som_displace_level_8 a left outer join d__has_c_real_som_displace_level_8 b on a.sale_date_dt = dateadd(b.sale_date_dt, 9, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 16 | create table d__has_c_real_som_displace_level_10 as select a.*, b.real_sale as c_real_som_displace_level_10 from d__has_c_real_som_displace_level_9 a left outer join d__has_c_real_som_displace_level_9 b on a.sale_date_dt = dateadd(b.sale_date_dt, 10, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 17 | create table d__has_c_real_som_displace_level_11 as select a.*, b.real_sale as 
c_real_som_displace_level_11 from d__has_c_real_som_displace_level_10 a left outer join d__has_c_real_som_displace_level_10 b on a.sale_date_dt = dateadd(b.sale_date_dt, 11, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 18 | create table d__has_c_real_som_displace_level_12 as select a.*, b.real_sale as c_real_som_displace_level_12 from d__has_c_real_som_displace_level_11 a left outer join d__has_c_real_som_displace_level_11 b on a.sale_date_dt = dateadd(b.sale_date_dt, 12, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 19 | create table d__has_c_real_som_displace_level_13 as select a.*, b.real_sale as c_real_som_displace_level_13 from d__has_c_real_som_displace_level_12 a left outer join d__has_c_real_som_displace_level_12 b on a.sale_date_dt = dateadd(b.sale_date_dt, 13, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 20 | create table d__has_c_real_som_displace_level_14 as select a.*, b.real_sale as c_real_som_displace_level_14 from d__has_c_real_som_displace_level_13 a left outer join d__has_c_real_som_displace_level_13 b on a.sale_date_dt = dateadd(b.sale_date_dt, 14, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 21 | create table d__has_c_real_som_displace_level_15 as select a.*, b.real_sale as c_real_som_displace_level_15 from d__has_c_real_som_displace_level_14 a left outer join d__has_c_real_som_displace_level_14 b on a.sale_date_dt = dateadd(b.sale_date_dt, 15, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 22 | create table d__has_c_real_som_displace_level_16 as select a.*, b.real_sale as c_real_som_displace_level_16 from d__has_c_real_som_displace_level_15 a left outer join d__has_c_real_som_displace_level_15 b on 
a.sale_date_dt = dateadd(b.sale_date_dt, 16, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 23 | create table d__has_c_real_som_displace_level_17 as select a.*, b.real_sale as c_real_som_displace_level_17 from d__has_c_real_som_displace_level_16 a left outer join d__has_c_real_som_displace_level_16 b on a.sale_date_dt = dateadd(b.sale_date_dt, 17, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 24 | create table d__has_c_real_som_displace_level_18 as select a.*, b.real_sale as c_real_som_displace_level_18 from d__has_c_real_som_displace_level_17 a left outer join d__has_c_real_som_displace_level_17 b on a.sale_date_dt = dateadd(b.sale_date_dt, 18, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 25 | create table d__has_c_real_som_displace_level_19 as select a.*, b.real_sale as c_real_som_displace_level_19 from d__has_c_real_som_displace_level_18 a left outer join d__has_c_real_som_displace_level_18 b on a.sale_date_dt = dateadd(b.sale_date_dt, 19, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 26 | create table d__has_c_real_som_displace_level_20 as select a.*, b.real_sale as c_real_som_displace_level_20 from d__has_c_real_som_displace_level_19 a left outer join d__has_c_real_som_displace_level_19 b on a.sale_date_dt = dateadd(b.sale_date_dt, 20, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 27 | create table d__has_c_real_som_displace_level_21 as select a.*, b.real_sale as c_real_som_displace_level_21 from d__has_c_real_som_displace_level_20 a left outer join d__has_c_real_som_displace_level_20 b on a.sale_date_dt = dateadd(b.sale_date_dt, 21, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and 
a.displacement_level = b.displacement_level; 28 | create table d__has_c_real_som_displace_level_22 as select a.*, b.real_sale as c_real_som_displace_level_22 from d__has_c_real_som_displace_level_21 a left outer join d__has_c_real_som_displace_level_21 b on a.sale_date_dt = dateadd(b.sale_date_dt, 22, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 29 | create table d__has_c_real_som_displace_level_23 as select a.*, b.real_sale as c_real_som_displace_level_23 from d__has_c_real_som_displace_level_22 a left outer join d__has_c_real_som_displace_level_22 b on a.sale_date_dt = dateadd(b.sale_date_dt, 23, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 30 | create table d__has_c_real_som_displace_level_24 as select a.*, b.real_sale as c_real_som_displace_level_24 from d__has_c_real_som_displace_level_23 a left outer join d__has_c_real_som_displace_level_23 b on a.sale_date_dt = dateadd(b.sale_date_dt, 24, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 31 | create table d__has_c_real_som_displace_level_25 as select a.*, b.real_sale as c_real_som_displace_level_25 from d__has_c_real_som_displace_level_24 a left outer join d__has_c_real_som_displace_level_24 b on a.sale_date_dt = dateadd(b.sale_date_dt, 25, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 32 | create table d__has_c_real_som_displace_level_26 as select a.*, b.real_sale as c_real_som_displace_level_26 from d__has_c_real_som_displace_level_25 a left outer join d__has_c_real_som_displace_level_25 b on a.sale_date_dt = dateadd(b.sale_date_dt, 26, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 33 | create table d__has_c_real_som_displace_level_27 as select a.*, b.real_sale as 
c_real_som_displace_level_27 from d__has_c_real_som_displace_level_26 a left outer join d__has_c_real_som_displace_level_26 b on a.sale_date_dt = dateadd(b.sale_date_dt, 27, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 34 | create table d__has_c_real_som_displace_level_28 as select a.*, b.real_sale as c_real_som_displace_level_28 from d__has_c_real_som_displace_level_27 a left outer join d__has_c_real_som_displace_level_27 b on a.sale_date_dt = dateadd(b.sale_date_dt, 28, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 35 | create table d__has_c_real_som_displace_level_29 as select a.*, b.real_sale as c_real_som_displace_level_29 from d__has_c_real_som_displace_level_28 a left outer join d__has_c_real_som_displace_level_28 b on a.sale_date_dt = dateadd(b.sale_date_dt, 29, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 36 | create table d__has_c_real_som_displace_level_30 as select a.*, b.real_sale as c_real_som_displace_level_30 from d__has_c_real_som_displace_level_29 a left outer join d__has_c_real_som_displace_level_29 b on a.sale_date_dt = dateadd(b.sale_date_dt, 30, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 37 | create table d__has_c_real_som_displace_level_31 as select a.*, b.real_sale as c_real_som_displace_level_31 from d__has_c_real_som_displace_level_30 a left outer join d__has_c_real_som_displace_level_30 b on a.sale_date_dt = dateadd(b.sale_date_dt, 31, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 38 | create table d__has_c_real_som_displace_level_32 as select a.*, b.real_sale as c_real_som_displace_level_32 from d__has_c_real_som_displace_level_31 a left outer join d__has_c_real_som_displace_level_31 b on 
a.sale_date_dt = dateadd(b.sale_date_dt, 32, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 39 | create table d__has_c_real_som_displace_level_33 as select a.*, b.real_sale as c_real_som_displace_level_33 from d__has_c_real_som_displace_level_32 a left outer join d__has_c_real_som_displace_level_32 b on a.sale_date_dt = dateadd(b.sale_date_dt, 33, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 40 | create table d__has_c_real_som_displace_level_34 as select a.*, b.real_sale as c_real_som_displace_level_34 from d__has_c_real_som_displace_level_33 a left outer join d__has_c_real_som_displace_level_33 b on a.sale_date_dt = dateadd(b.sale_date_dt, 34, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 41 | create table d__has_c_real_som_displace_level_35 as select a.*, b.real_sale as c_real_som_displace_level_35 from d__has_c_real_som_displace_level_34 a left outer join d__has_c_real_som_displace_level_34 b on a.sale_date_dt = dateadd(b.sale_date_dt, 35, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 42 | create table d__has_c_real_som_displace_level_36 as select a.*, b.real_sale as c_real_som_displace_level_36 from d__has_c_real_som_displace_level_35 a left outer join d__has_c_real_som_displace_level_35 b on a.sale_date_dt = dateadd(b.sale_date_dt, 36, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 43 | create table d__has_c_real_som_displace_level_37 as select a.*, b.real_sale as c_real_som_displace_level_37 from d__has_c_real_som_displace_level_36 a left outer join d__has_c_real_som_displace_level_36 b on a.sale_date_dt = dateadd(b.sale_date_dt, 37, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and 
a.displacement_level = b.displacement_level; 44 | create table d__has_c_real_som_displace_level_38 as select a.*, b.real_sale as c_real_som_displace_level_38 from d__has_c_real_som_displace_level_37 a left outer join d__has_c_real_som_displace_level_37 b on a.sale_date_dt = dateadd(b.sale_date_dt, 38, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 45 | create table d__has_c_real_som_displace_level_39 as select a.*, b.real_sale as c_real_som_displace_level_39 from d__has_c_real_som_displace_level_38 a left outer join d__has_c_real_som_displace_level_38 b on a.sale_date_dt = dateadd(b.sale_date_dt, 39, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 46 | create table d__has_c_real_som_displace_level_40 as select a.*, b.real_sale as c_real_som_displace_level_40 from d__has_c_real_som_displace_level_39 a left outer join d__has_c_real_som_displace_level_39 b on a.sale_date_dt = dateadd(b.sale_date_dt, 40, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.displacement_level = b.displacement_level; 47 | 48 | alter table d__has_c_real_som_displace_level_40 rename to whole_f5_2_step3; -------------------------------------------------------------------------------- /代码/特征工程/5_2_2_ssm_排量.txt: -------------------------------------------------------------------------------- 1 | set odps.sql.type.system.odps2=true; 2 | 3 | 4 | create table dr_c_real_ssm_displace_level_2_3 as select *, c_real_som_displace_level_2 + c_real_som_displace_level_3 as c_real_ssm_displace_level_2_3 from d__has_c_real_som_displace_level_37;-- note: this first statement differs from the rest (it reads from the lag table) 5 | create table dr_c_real_ssm_displace_level_2_4 as select *, c_real_ssm_displace_level_2_3 + c_real_som_displace_level_4 as c_real_ssm_displace_level_2_4 from dr_c_real_ssm_displace_level_2_3; 6 | create table dr_c_real_ssm_displace_level_2_5 as select *, c_real_ssm_displace_level_2_4 +
c_real_som_displace_level_5 as c_real_ssm_displace_level_2_5 from dr_c_real_ssm_displace_level_2_4; 7 | create table dr_c_real_ssm_displace_level_2_6 as select *, c_real_ssm_displace_level_2_5 + c_real_som_displace_level_6 as c_real_ssm_displace_level_2_6 from dr_c_real_ssm_displace_level_2_5; 8 | create table dr_c_real_ssm_displace_level_2_7 as select *, c_real_ssm_displace_level_2_6 + c_real_som_displace_level_7 as c_real_ssm_displace_level_2_7 from dr_c_real_ssm_displace_level_2_6; 9 | create table dr_c_real_ssm_displace_level_2_8 as select *, c_real_ssm_displace_level_2_7 + c_real_som_displace_level_8 as c_real_ssm_displace_level_2_8 from dr_c_real_ssm_displace_level_2_7; 10 | create table dr_c_real_ssm_displace_level_2_9 as select *, c_real_ssm_displace_level_2_8 + c_real_som_displace_level_9 as c_real_ssm_displace_level_2_9 from dr_c_real_ssm_displace_level_2_8; 11 | create table dr_c_real_ssm_displace_level_2_10 as select *, c_real_ssm_displace_level_2_9 + c_real_som_displace_level_10 as c_real_ssm_displace_level_2_10 from dr_c_real_ssm_displace_level_2_9; 12 | create table dr_c_real_ssm_displace_level_2_11 as select *, c_real_ssm_displace_level_2_10 + c_real_som_displace_level_11 as c_real_ssm_displace_level_2_11 from dr_c_real_ssm_displace_level_2_10; 13 | create table dr_c_real_ssm_displace_level_2_12 as select *, c_real_ssm_displace_level_2_11 + c_real_som_displace_level_12 as c_real_ssm_displace_level_2_12 from dr_c_real_ssm_displace_level_2_11; 14 | create table dr_c_real_ssm_displace_level_2_13 as select *, c_real_ssm_displace_level_2_12 + c_real_som_displace_level_13 as c_real_ssm_displace_level_2_13 from dr_c_real_ssm_displace_level_2_12; 15 | create table dr_c_real_ssm_displace_level_2_14 as select *, c_real_ssm_displace_level_2_13 + c_real_som_displace_level_14 as c_real_ssm_displace_level_2_14 from dr_c_real_ssm_displace_level_2_13; 16 | create table dr_c_real_ssm_displace_level_2_15 as select *, c_real_ssm_displace_level_2_14 + 
c_real_som_displace_level_15 as c_real_ssm_displace_level_2_15 from dr_c_real_ssm_displace_level_2_14; 17 | create table dr_c_real_ssm_displace_level_2_16 as select *, c_real_ssm_displace_level_2_15 + c_real_som_displace_level_16 as c_real_ssm_displace_level_2_16 from dr_c_real_ssm_displace_level_2_15; 18 | create table dr_c_real_ssm_displace_level_2_17 as select *, c_real_ssm_displace_level_2_16 + c_real_som_displace_level_17 as c_real_ssm_displace_level_2_17 from dr_c_real_ssm_displace_level_2_16; 19 | create table dr_c_real_ssm_displace_level_2_18 as select *, c_real_ssm_displace_level_2_17 + c_real_som_displace_level_18 as c_real_ssm_displace_level_2_18 from dr_c_real_ssm_displace_level_2_17; 20 | create table dr_c_real_ssm_displace_level_2_19 as select *, c_real_ssm_displace_level_2_18 + c_real_som_displace_level_19 as c_real_ssm_displace_level_2_19 from dr_c_real_ssm_displace_level_2_18; 21 | create table dr_c_real_ssm_displace_level_2_20 as select *, c_real_ssm_displace_level_2_19 + c_real_som_displace_level_20 as c_real_ssm_displace_level_2_20 from dr_c_real_ssm_displace_level_2_19; 22 | create table dr_c_real_ssm_displace_level_2_21 as select *, c_real_ssm_displace_level_2_20 + c_real_som_displace_level_21 as c_real_ssm_displace_level_2_21 from dr_c_real_ssm_displace_level_2_20; 23 | create table dr_c_real_ssm_displace_level_2_22 as select *, c_real_ssm_displace_level_2_21 + c_real_som_displace_level_22 as c_real_ssm_displace_level_2_22 from dr_c_real_ssm_displace_level_2_21; 24 | create table dr_c_real_ssm_displace_level_2_23 as select *, c_real_ssm_displace_level_2_22 + c_real_som_displace_level_23 as c_real_ssm_displace_level_2_23 from dr_c_real_ssm_displace_level_2_22; 25 | create table dr_c_real_ssm_displace_level_2_24 as select *, c_real_ssm_displace_level_2_23 + c_real_som_displace_level_24 as c_real_ssm_displace_level_2_24 from dr_c_real_ssm_displace_level_2_23; 26 | create table dr_c_real_ssm_displace_level_2_25 as select *, 
c_real_ssm_displace_level_2_24 + c_real_som_displace_level_25 as c_real_ssm_displace_level_2_25 from dr_c_real_ssm_displace_level_2_24; 27 | create table dr_c_real_ssm_displace_level_2_26 as select *, c_real_ssm_displace_level_2_25 + c_real_som_displace_level_26 as c_real_ssm_displace_level_2_26 from dr_c_real_ssm_displace_level_2_25; 28 | create table dr_c_real_ssm_displace_level_2_27 as select *, c_real_ssm_displace_level_2_26 + c_real_som_displace_level_27 as c_real_ssm_displace_level_2_27 from dr_c_real_ssm_displace_level_2_26; 29 | create table dr_c_real_ssm_displace_level_2_28 as select *, c_real_ssm_displace_level_2_27 + c_real_som_displace_level_28 as c_real_ssm_displace_level_2_28 from dr_c_real_ssm_displace_level_2_27; 30 | create table dr_c_real_ssm_displace_level_2_29 as select *, c_real_ssm_displace_level_2_28 + c_real_som_displace_level_29 as c_real_ssm_displace_level_2_29 from dr_c_real_ssm_displace_level_2_28; 31 | create table dr_c_real_ssm_displace_level_2_30 as select *, c_real_ssm_displace_level_2_29 + c_real_som_displace_level_30 as c_real_ssm_displace_level_2_30 from dr_c_real_ssm_displace_level_2_29; 32 | create table dr_c_real_ssm_displace_level_2_31 as select *, c_real_ssm_displace_level_2_30 + c_real_som_displace_level_31 as c_real_ssm_displace_level_2_31 from dr_c_real_ssm_displace_level_2_30; 33 | create table dr_c_real_ssm_displace_level_2_32 as select *, c_real_ssm_displace_level_2_31 + c_real_som_displace_level_32 as c_real_ssm_displace_level_2_32 from dr_c_real_ssm_displace_level_2_31; 34 | create table dr_c_real_ssm_displace_level_2_33 as select *, c_real_ssm_displace_level_2_32 + c_real_som_displace_level_33 as c_real_ssm_displace_level_2_33 from dr_c_real_ssm_displace_level_2_32; 35 | create table dr_c_real_ssm_displace_level_2_34 as select *, c_real_ssm_displace_level_2_33 + c_real_som_displace_level_34 as c_real_ssm_displace_level_2_34 from dr_c_real_ssm_displace_level_2_33; 36 | create table dr_c_real_ssm_displace_level_2_35 as 
select *, c_real_ssm_displace_level_2_34 + c_real_som_displace_level_35 as c_real_ssm_displace_level_2_35 from dr_c_real_ssm_displace_level_2_34; 37 | create table dr_c_real_ssm_displace_level_2_36 as select *, c_real_ssm_displace_level_2_35 + c_real_som_displace_level_36 as c_real_ssm_displace_level_2_36 from dr_c_real_ssm_displace_level_2_35; 38 | create table dr_c_real_ssm_displace_level_2_37 as select *, c_real_ssm_displace_level_2_36 + c_real_som_displace_level_37 as c_real_ssm_displace_level_2_37 from dr_c_real_ssm_displace_level_2_36; -------------------------------------------------------------------------------- /代码/特征工程/5_2_3_fdfrfir_排量.txt: -------------------------------------------------------------------------------- 1 | DROP TABLE IF EXISTS whole_f5_2_3; 2 | 3 | DROP TABLE IF EXISTS whole_f5_2_3_fdfr; 4 | 5 | CREATE TABLE whole_f5_2_3_fdfr 6 | AS 7 | SELECT *, c_real_som_displace_level_2 - c_real_som_displace_level_3 AS c_real_fd_displace_level_2, c_real_som_displace_level_3 - c_real_som_displace_level_4 AS c_real_fd_displace_level_3 8 | , c_real_som_displace_level_12 - c_real_som_displace_level_13 AS c_real_fd_displace_level_12, c_real_som_displace_level_24 - c_real_som_displace_level_25 AS c_real_fd_displace_level_24 9 | , c_real_som_displace_level_36 - c_real_som_displace_level_37 AS c_real_fd_displace_level_36, c_real_som_displace_level_2 / c_real_som_displace_level_3 AS c_real_fr_displace_level_2 10 | , c_real_som_displace_level_3 / c_real_som_displace_level_4 AS c_real_fr_displace_level_3, c_real_som_displace_level_12 / c_real_som_displace_level_13 AS c_real_fr_displace_level_12 11 | , c_real_som_displace_level_24 / c_real_som_displace_level_25 AS c_real_fr_displace_level_24, c_real_som_displace_level_36 / c_real_som_displace_level_37 AS c_real_fr_displace_level_36 12 | , c_real_som_displace_level_2 - c_real_som_displace_level_14 as c_real_fd_displace_level_2_14, c_real_som_displace_level_3 - c_real_som_displace_level_15 as 
c_real_fd_displace_level_3_15 13 | , c_real_som_displace_level_2 / c_real_som_displace_level_14 as c_real_fr_displace_level_2_14, c_real_som_displace_level_3 / c_real_som_displace_level_15 as c_real_fr_displace_level_3_15 14 | FROM dr_c_real_ssm_displace_level_2_36; 15 | 16 | -- inc_rate: growth rate (c_fir) 17 | CREATE TABLE whole_f5_2_3 18 | AS 19 | SELECT *, c_real_fr_displace_level_2 - 1 AS c_real_fir_displace_level_2, c_real_fr_displace_level_3 - 1 AS c_real_fir_displace_level_3 20 | , c_real_fr_displace_level_12 - 1 AS c_real_fir_displace_level_12, c_real_fr_displace_level_24 - 1 AS c_real_fir_displace_level_24 21 | , c_real_fr_displace_level_36 - 1 AS c_real_fir_displace_level_36 22 | , c_real_fr_displace_level_2_14 - 1 as c_real_fir_displace_level_2_14 23 | , c_real_fr_displace_level_3_15 - 1 as c_real_fir_displace_level_3_15 24 | FROM whole_f5_2_3_fdfr; -------------------------------------------------------------------------------- /代码/特征工程/5_3_0_按新能源类别分.txt: -------------------------------------------------------------------------------- 1 | drop table if exists whole_f5_3_step1; 2 | CREATE TABLE whole_f5_3_step1 3 | AS 4 | SELECT sale_date, province_id, city_id, newenergy_type_id_min 5 | , SUM(sale_quantity) AS sale_quantity, SUM(real_sale) AS real_sale 6 | FROM whole_f3 7 | GROUP BY province_id, 8 | city_id, 9 | newenergy_type_id_min, 10 | sale_date; 11 | 12 | select * from whole_f5_3_step1; -------------------------------------------------------------------------------- /代码/特征工程/5_3_1_省市新能源时间.txt: -------------------------------------------------------------------------------- 1 | drop table if exists whole_f5_3_step2; 2 | create table whole_f5_3_step2 as 3 | select *, to_date(sale_date, 'yyyymm') as sale_date_dt 4 | from whole_f5_3_step1; 5 | 6 | 7 | 8 | create table d__has_c_real_som_ne_type_2 as select a.*, b.real_sale as c_real_som_ne_type_2 from whole_f5_3_step2 a left outer join whole_f5_3_step2 b on a.sale_date_dt = dateadd(b.sale_date_dt, 2, 'mm') and 
a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 9 | create table d__has_c_real_som_ne_type_3 as select a.*, b.real_sale as c_real_som_ne_type_3 from d__has_c_real_som_ne_type_2 a left outer join d__has_c_real_som_ne_type_2 b on a.sale_date_dt = dateadd(b.sale_date_dt, 3, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 10 | create table d__has_c_real_som_ne_type_4 as select a.*, b.real_sale as c_real_som_ne_type_4 from d__has_c_real_som_ne_type_3 a left outer join d__has_c_real_som_ne_type_3 b on a.sale_date_dt = dateadd(b.sale_date_dt, 4, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 11 | create table d__has_c_real_som_ne_type_5 as select a.*, b.real_sale as c_real_som_ne_type_5 from d__has_c_real_som_ne_type_4 a left outer join d__has_c_real_som_ne_type_4 b on a.sale_date_dt = dateadd(b.sale_date_dt, 5, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 12 | create table d__has_c_real_som_ne_type_6 as select a.*, b.real_sale as c_real_som_ne_type_6 from d__has_c_real_som_ne_type_5 a left outer join d__has_c_real_som_ne_type_5 b on a.sale_date_dt = dateadd(b.sale_date_dt, 6, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 13 | create table d__has_c_real_som_ne_type_7 as select a.*, b.real_sale as c_real_som_ne_type_7 from d__has_c_real_som_ne_type_6 a left outer join d__has_c_real_som_ne_type_6 b on a.sale_date_dt = dateadd(b.sale_date_dt, 7, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 14 | create table d__has_c_real_som_ne_type_8 as select a.*, b.real_sale as c_real_som_ne_type_8 from d__has_c_real_som_ne_type_7 a left outer join d__has_c_real_som_ne_type_7 b on a.sale_date_dt 
= dateadd(b.sale_date_dt, 8, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 15 | create table d__has_c_real_som_ne_type_9 as select a.*, b.real_sale as c_real_som_ne_type_9 from d__has_c_real_som_ne_type_8 a left outer join d__has_c_real_som_ne_type_8 b on a.sale_date_dt = dateadd(b.sale_date_dt, 9, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 16 | create table d__has_c_real_som_ne_type_10 as select a.*, b.real_sale as c_real_som_ne_type_10 from d__has_c_real_som_ne_type_9 a left outer join d__has_c_real_som_ne_type_9 b on a.sale_date_dt = dateadd(b.sale_date_dt, 10, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 17 | create table d__has_c_real_som_ne_type_11 as select a.*, b.real_sale as c_real_som_ne_type_11 from d__has_c_real_som_ne_type_10 a left outer join d__has_c_real_som_ne_type_10 b on a.sale_date_dt = dateadd(b.sale_date_dt, 11, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 18 | create table d__has_c_real_som_ne_type_12 as select a.*, b.real_sale as c_real_som_ne_type_12 from d__has_c_real_som_ne_type_11 a left outer join d__has_c_real_som_ne_type_11 b on a.sale_date_dt = dateadd(b.sale_date_dt, 12, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 19 | create table d__has_c_real_som_ne_type_13 as select a.*, b.real_sale as c_real_som_ne_type_13 from d__has_c_real_som_ne_type_12 a left outer join d__has_c_real_som_ne_type_12 b on a.sale_date_dt = dateadd(b.sale_date_dt, 13, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 20 | create table d__has_c_real_som_ne_type_14 as select a.*, b.real_sale as c_real_som_ne_type_14 from d__has_c_real_som_ne_type_13 a 
left outer join d__has_c_real_som_ne_type_13 b on a.sale_date_dt = dateadd(b.sale_date_dt, 14, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 21 | create table d__has_c_real_som_ne_type_15 as select a.*, b.real_sale as c_real_som_ne_type_15 from d__has_c_real_som_ne_type_14 a left outer join d__has_c_real_som_ne_type_14 b on a.sale_date_dt = dateadd(b.sale_date_dt, 15, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 22 | create table d__has_c_real_som_ne_type_16 as select a.*, b.real_sale as c_real_som_ne_type_16 from d__has_c_real_som_ne_type_15 a left outer join d__has_c_real_som_ne_type_15 b on a.sale_date_dt = dateadd(b.sale_date_dt, 16, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 23 | create table d__has_c_real_som_ne_type_17 as select a.*, b.real_sale as c_real_som_ne_type_17 from d__has_c_real_som_ne_type_16 a left outer join d__has_c_real_som_ne_type_16 b on a.sale_date_dt = dateadd(b.sale_date_dt, 17, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 24 | create table d__has_c_real_som_ne_type_18 as select a.*, b.real_sale as c_real_som_ne_type_18 from d__has_c_real_som_ne_type_17 a left outer join d__has_c_real_som_ne_type_17 b on a.sale_date_dt = dateadd(b.sale_date_dt, 18, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 25 | create table d__has_c_real_som_ne_type_19 as select a.*, b.real_sale as c_real_som_ne_type_19 from d__has_c_real_som_ne_type_18 a left outer join d__has_c_real_som_ne_type_18 b on a.sale_date_dt = dateadd(b.sale_date_dt, 19, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 26 | create table d__has_c_real_som_ne_type_20 as select a.*, 
b.real_sale as c_real_som_ne_type_20 from d__has_c_real_som_ne_type_19 a left outer join d__has_c_real_som_ne_type_19 b on a.sale_date_dt = dateadd(b.sale_date_dt, 20, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 27 | create table d__has_c_real_som_ne_type_21 as select a.*, b.real_sale as c_real_som_ne_type_21 from d__has_c_real_som_ne_type_20 a left outer join d__has_c_real_som_ne_type_20 b on a.sale_date_dt = dateadd(b.sale_date_dt, 21, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 28 | create table d__has_c_real_som_ne_type_22 as select a.*, b.real_sale as c_real_som_ne_type_22 from d__has_c_real_som_ne_type_21 a left outer join d__has_c_real_som_ne_type_21 b on a.sale_date_dt = dateadd(b.sale_date_dt, 22, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 29 | create table d__has_c_real_som_ne_type_23 as select a.*, b.real_sale as c_real_som_ne_type_23 from d__has_c_real_som_ne_type_22 a left outer join d__has_c_real_som_ne_type_22 b on a.sale_date_dt = dateadd(b.sale_date_dt, 23, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 30 | create table d__has_c_real_som_ne_type_24 as select a.*, b.real_sale as c_real_som_ne_type_24 from d__has_c_real_som_ne_type_23 a left outer join d__has_c_real_som_ne_type_23 b on a.sale_date_dt = dateadd(b.sale_date_dt, 24, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 31 | create table d__has_c_real_som_ne_type_25 as select a.*, b.real_sale as c_real_som_ne_type_25 from d__has_c_real_som_ne_type_24 a left outer join d__has_c_real_som_ne_type_24 b on a.sale_date_dt = dateadd(b.sale_date_dt, 25, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = 
b.newenergy_type_id_min; 32 | create table d__has_c_real_som_ne_type_26 as select a.*, b.real_sale as c_real_som_ne_type_26 from d__has_c_real_som_ne_type_25 a left outer join d__has_c_real_som_ne_type_25 b on a.sale_date_dt = dateadd(b.sale_date_dt, 26, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 33 | create table d__has_c_real_som_ne_type_27 as select a.*, b.real_sale as c_real_som_ne_type_27 from d__has_c_real_som_ne_type_26 a left outer join d__has_c_real_som_ne_type_26 b on a.sale_date_dt = dateadd(b.sale_date_dt, 27, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 34 | create table d__has_c_real_som_ne_type_28 as select a.*, b.real_sale as c_real_som_ne_type_28 from d__has_c_real_som_ne_type_27 a left outer join d__has_c_real_som_ne_type_27 b on a.sale_date_dt = dateadd(b.sale_date_dt, 28, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 35 | create table d__has_c_real_som_ne_type_29 as select a.*, b.real_sale as c_real_som_ne_type_29 from d__has_c_real_som_ne_type_28 a left outer join d__has_c_real_som_ne_type_28 b on a.sale_date_dt = dateadd(b.sale_date_dt, 29, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 36 | create table d__has_c_real_som_ne_type_30 as select a.*, b.real_sale as c_real_som_ne_type_30 from d__has_c_real_som_ne_type_29 a left outer join d__has_c_real_som_ne_type_29 b on a.sale_date_dt = dateadd(b.sale_date_dt, 30, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 37 | create table d__has_c_real_som_ne_type_31 as select a.*, b.real_sale as c_real_som_ne_type_31 from d__has_c_real_som_ne_type_30 a left outer join d__has_c_real_som_ne_type_30 b on a.sale_date_dt = dateadd(b.sale_date_dt, 31, 'mm') and 
a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 38 | create table d__has_c_real_som_ne_type_32 as select a.*, b.real_sale as c_real_som_ne_type_32 from d__has_c_real_som_ne_type_31 a left outer join d__has_c_real_som_ne_type_31 b on a.sale_date_dt = dateadd(b.sale_date_dt, 32, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 39 | create table d__has_c_real_som_ne_type_33 as select a.*, b.real_sale as c_real_som_ne_type_33 from d__has_c_real_som_ne_type_32 a left outer join d__has_c_real_som_ne_type_32 b on a.sale_date_dt = dateadd(b.sale_date_dt, 33, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 40 | create table d__has_c_real_som_ne_type_34 as select a.*, b.real_sale as c_real_som_ne_type_34 from d__has_c_real_som_ne_type_33 a left outer join d__has_c_real_som_ne_type_33 b on a.sale_date_dt = dateadd(b.sale_date_dt, 34, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 41 | create table d__has_c_real_som_ne_type_35 as select a.*, b.real_sale as c_real_som_ne_type_35 from d__has_c_real_som_ne_type_34 a left outer join d__has_c_real_som_ne_type_34 b on a.sale_date_dt = dateadd(b.sale_date_dt, 35, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 42 | create table d__has_c_real_som_ne_type_36 as select a.*, b.real_sale as c_real_som_ne_type_36 from d__has_c_real_som_ne_type_35 a left outer join d__has_c_real_som_ne_type_35 b on a.sale_date_dt = dateadd(b.sale_date_dt, 36, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 43 | create table d__has_c_real_som_ne_type_37 as select a.*, b.real_sale as c_real_som_ne_type_37 from d__has_c_real_som_ne_type_36 a left outer join 
d__has_c_real_som_ne_type_36 b on a.sale_date_dt = dateadd(b.sale_date_dt, 37, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 44 | create table d__has_c_real_som_ne_type_38 as select a.*, b.real_sale as c_real_som_ne_type_38 from d__has_c_real_som_ne_type_37 a left outer join d__has_c_real_som_ne_type_37 b on a.sale_date_dt = dateadd(b.sale_date_dt, 38, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 45 | create table d__has_c_real_som_ne_type_39 as select a.*, b.real_sale as c_real_som_ne_type_39 from d__has_c_real_som_ne_type_38 a left outer join d__has_c_real_som_ne_type_38 b on a.sale_date_dt = dateadd(b.sale_date_dt, 39, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 46 | create table d__has_c_real_som_ne_type_40 as select a.*, b.real_sale as c_real_som_ne_type_40 from d__has_c_real_som_ne_type_39 a left outer join d__has_c_real_som_ne_type_39 b on a.sale_date_dt = dateadd(b.sale_date_dt, 40, 'mm') and a.province_id = b.province_id and a.city_id = b.city_id and a.newenergy_type_id_min = b.newenergy_type_id_min; 47 | 48 | alter table d__has_c_real_som_ne_type_40 rename to whole_f5_3_step3; -------------------------------------------------------------------------------- /代码/特征工程/5_3_2_ssm_新能源.txt: -------------------------------------------------------------------------------- 1 | set odps.sql.type.system.odps2=true; 2 | 3 | 4 | create table dr_c_real_ssm_ne_type_2_3 as select *, c_real_som_ne_type_2 + c_real_som_ne_type_3 as c_real_ssm_ne_type_2_3 from d__has_c_real_som_ne_type_37;-- note: this first statement reads from a different source table than the ones that follow 5 | create table dr_c_real_ssm_ne_type_2_4 as select *, c_real_ssm_ne_type_2_3 + c_real_som_ne_type_4 as c_real_ssm_ne_type_2_4 from dr_c_real_ssm_ne_type_2_3; 6 | create table dr_c_real_ssm_ne_type_2_5 as select *, c_real_ssm_ne_type_2_4 + c_real_som_ne_type_5 as 
c_real_ssm_ne_type_2_5 from dr_c_real_ssm_ne_type_2_4; 7 | create table dr_c_real_ssm_ne_type_2_6 as select *, c_real_ssm_ne_type_2_5 + c_real_som_ne_type_6 as c_real_ssm_ne_type_2_6 from dr_c_real_ssm_ne_type_2_5; 8 | create table dr_c_real_ssm_ne_type_2_7 as select *, c_real_ssm_ne_type_2_6 + c_real_som_ne_type_7 as c_real_ssm_ne_type_2_7 from dr_c_real_ssm_ne_type_2_6; 9 | create table dr_c_real_ssm_ne_type_2_8 as select *, c_real_ssm_ne_type_2_7 + c_real_som_ne_type_8 as c_real_ssm_ne_type_2_8 from dr_c_real_ssm_ne_type_2_7; 10 | create table dr_c_real_ssm_ne_type_2_9 as select *, c_real_ssm_ne_type_2_8 + c_real_som_ne_type_9 as c_real_ssm_ne_type_2_9 from dr_c_real_ssm_ne_type_2_8; 11 | create table dr_c_real_ssm_ne_type_2_10 as select *, c_real_ssm_ne_type_2_9 + c_real_som_ne_type_10 as c_real_ssm_ne_type_2_10 from dr_c_real_ssm_ne_type_2_9; 12 | create table dr_c_real_ssm_ne_type_2_11 as select *, c_real_ssm_ne_type_2_10 + c_real_som_ne_type_11 as c_real_ssm_ne_type_2_11 from dr_c_real_ssm_ne_type_2_10; 13 | create table dr_c_real_ssm_ne_type_2_12 as select *, c_real_ssm_ne_type_2_11 + c_real_som_ne_type_12 as c_real_ssm_ne_type_2_12 from dr_c_real_ssm_ne_type_2_11; 14 | create table dr_c_real_ssm_ne_type_2_13 as select *, c_real_ssm_ne_type_2_12 + c_real_som_ne_type_13 as c_real_ssm_ne_type_2_13 from dr_c_real_ssm_ne_type_2_12; 15 | create table dr_c_real_ssm_ne_type_2_14 as select *, c_real_ssm_ne_type_2_13 + c_real_som_ne_type_14 as c_real_ssm_ne_type_2_14 from dr_c_real_ssm_ne_type_2_13; 16 | create table dr_c_real_ssm_ne_type_2_15 as select *, c_real_ssm_ne_type_2_14 + c_real_som_ne_type_15 as c_real_ssm_ne_type_2_15 from dr_c_real_ssm_ne_type_2_14; 17 | create table dr_c_real_ssm_ne_type_2_16 as select *, c_real_ssm_ne_type_2_15 + c_real_som_ne_type_16 as c_real_ssm_ne_type_2_16 from dr_c_real_ssm_ne_type_2_15; 18 | create table dr_c_real_ssm_ne_type_2_17 as select *, c_real_ssm_ne_type_2_16 + c_real_som_ne_type_17 as c_real_ssm_ne_type_2_17 from 
dr_c_real_ssm_ne_type_2_16; 19 | create table dr_c_real_ssm_ne_type_2_18 as select *, c_real_ssm_ne_type_2_17 + c_real_som_ne_type_18 as c_real_ssm_ne_type_2_18 from dr_c_real_ssm_ne_type_2_17; 20 | create table dr_c_real_ssm_ne_type_2_19 as select *, c_real_ssm_ne_type_2_18 + c_real_som_ne_type_19 as c_real_ssm_ne_type_2_19 from dr_c_real_ssm_ne_type_2_18; 21 | create table dr_c_real_ssm_ne_type_2_20 as select *, c_real_ssm_ne_type_2_19 + c_real_som_ne_type_20 as c_real_ssm_ne_type_2_20 from dr_c_real_ssm_ne_type_2_19; 22 | create table dr_c_real_ssm_ne_type_2_21 as select *, c_real_ssm_ne_type_2_20 + c_real_som_ne_type_21 as c_real_ssm_ne_type_2_21 from dr_c_real_ssm_ne_type_2_20; 23 | create table dr_c_real_ssm_ne_type_2_22 as select *, c_real_ssm_ne_type_2_21 + c_real_som_ne_type_22 as c_real_ssm_ne_type_2_22 from dr_c_real_ssm_ne_type_2_21; 24 | create table dr_c_real_ssm_ne_type_2_23 as select *, c_real_ssm_ne_type_2_22 + c_real_som_ne_type_23 as c_real_ssm_ne_type_2_23 from dr_c_real_ssm_ne_type_2_22; 25 | create table dr_c_real_ssm_ne_type_2_24 as select *, c_real_ssm_ne_type_2_23 + c_real_som_ne_type_24 as c_real_ssm_ne_type_2_24 from dr_c_real_ssm_ne_type_2_23; 26 | create table dr_c_real_ssm_ne_type_2_25 as select *, c_real_ssm_ne_type_2_24 + c_real_som_ne_type_25 as c_real_ssm_ne_type_2_25 from dr_c_real_ssm_ne_type_2_24; 27 | create table dr_c_real_ssm_ne_type_2_26 as select *, c_real_ssm_ne_type_2_25 + c_real_som_ne_type_26 as c_real_ssm_ne_type_2_26 from dr_c_real_ssm_ne_type_2_25; 28 | create table dr_c_real_ssm_ne_type_2_27 as select *, c_real_ssm_ne_type_2_26 + c_real_som_ne_type_27 as c_real_ssm_ne_type_2_27 from dr_c_real_ssm_ne_type_2_26; 29 | create table dr_c_real_ssm_ne_type_2_28 as select *, c_real_ssm_ne_type_2_27 + c_real_som_ne_type_28 as c_real_ssm_ne_type_2_28 from dr_c_real_ssm_ne_type_2_27; 30 | create table dr_c_real_ssm_ne_type_2_29 as select *, c_real_ssm_ne_type_2_28 + c_real_som_ne_type_29 as c_real_ssm_ne_type_2_29 from 
dr_c_real_ssm_ne_type_2_28; 31 | create table dr_c_real_ssm_ne_type_2_30 as select *, c_real_ssm_ne_type_2_29 + c_real_som_ne_type_30 as c_real_ssm_ne_type_2_30 from dr_c_real_ssm_ne_type_2_29; 32 | create table dr_c_real_ssm_ne_type_2_31 as select *, c_real_ssm_ne_type_2_30 + c_real_som_ne_type_31 as c_real_ssm_ne_type_2_31 from dr_c_real_ssm_ne_type_2_30; 33 | create table dr_c_real_ssm_ne_type_2_32 as select *, c_real_ssm_ne_type_2_31 + c_real_som_ne_type_32 as c_real_ssm_ne_type_2_32 from dr_c_real_ssm_ne_type_2_31; 34 | create table dr_c_real_ssm_ne_type_2_33 as select *, c_real_ssm_ne_type_2_32 + c_real_som_ne_type_33 as c_real_ssm_ne_type_2_33 from dr_c_real_ssm_ne_type_2_32; 35 | create table dr_c_real_ssm_ne_type_2_34 as select *, c_real_ssm_ne_type_2_33 + c_real_som_ne_type_34 as c_real_ssm_ne_type_2_34 from dr_c_real_ssm_ne_type_2_33; 36 | create table dr_c_real_ssm_ne_type_2_35 as select *, c_real_ssm_ne_type_2_34 + c_real_som_ne_type_35 as c_real_ssm_ne_type_2_35 from dr_c_real_ssm_ne_type_2_34; 37 | create table dr_c_real_ssm_ne_type_2_36 as select *, c_real_ssm_ne_type_2_35 + c_real_som_ne_type_36 as c_real_ssm_ne_type_2_36 from dr_c_real_ssm_ne_type_2_35; 38 | create table dr_c_real_ssm_ne_type_2_37 as select *, c_real_ssm_ne_type_2_36 + c_real_som_ne_type_37 as c_real_ssm_ne_type_2_37 from dr_c_real_ssm_ne_type_2_36; -------------------------------------------------------------------------------- /代码/特征工程/5_3_3_fdfrfir_新能源.txt: -------------------------------------------------------------------------------- 1 | DROP TABLE IF EXISTS whole_f5_3_3; 2 | 3 | DROP TABLE IF EXISTS whole_f5_3_3_fdfr; 4 | 5 | CREATE TABLE whole_f5_3_3_fdfr 6 | AS 7 | SELECT *, c_real_som_ne_type_2 - c_real_som_ne_type_3 AS c_real_fd_ne_type_2, c_real_som_ne_type_3 - c_real_som_ne_type_4 AS c_real_fd_ne_type_3 8 | , c_real_som_ne_type_12 - c_real_som_ne_type_13 AS c_real_fd_ne_type_12, c_real_som_ne_type_24 - c_real_som_ne_type_25 AS c_real_fd_ne_type_24 9 | , 
c_real_som_ne_type_36 - c_real_som_ne_type_37 AS c_real_fd_ne_type_36, c_real_som_ne_type_2 / c_real_som_ne_type_3 AS c_real_fr_ne_type_2 10 | , c_real_som_ne_type_3 / c_real_som_ne_type_4 AS c_real_fr_ne_type_3, c_real_som_ne_type_12 / c_real_som_ne_type_13 AS c_real_fr_ne_type_12 11 | , c_real_som_ne_type_24 / c_real_som_ne_type_25 AS c_real_fr_ne_type_24, c_real_som_ne_type_36 / c_real_som_ne_type_37 AS c_real_fr_ne_type_36 12 | , c_real_som_ne_type_2 - c_real_som_ne_type_14 as c_real_fd_ne_type_2_14, c_real_som_ne_type_3 - c_real_som_ne_type_15 as c_real_fd_ne_type_3_15 13 | , c_real_som_ne_type_2 / c_real_som_ne_type_14 as c_real_fr_ne_type_2_14, c_real_som_ne_type_3 / c_real_som_ne_type_15 as c_real_fr_ne_type_3_15 14 | FROM dr_c_real_ssm_ne_type_2_36; 15 | 16 | -- inc_rate: growth rate (c_fir) 17 | CREATE TABLE whole_f5_3_3 18 | AS 19 | SELECT *, c_real_fr_ne_type_2 - 1 AS c_real_fir_ne_type_2, c_real_fr_ne_type_3 - 1 AS c_real_fir_ne_type_3 20 | , c_real_fr_ne_type_12 - 1 AS c_real_fir_ne_type_12, c_real_fr_ne_type_24 - 1 AS c_real_fir_ne_type_24 21 | , c_real_fr_ne_type_36 - 1 AS c_real_fir_ne_type_36 22 | , c_real_fr_ne_type_2_14 - 1 as c_real_fir_ne_type_2_14 23 | , c_real_fr_ne_type_3_15 - 1 as c_real_fir_ne_type_3_15 24 | FROM whole_f5_3_3_fdfr; -------------------------------------------------------------------------------- /代码/特征工程/6_0_1_省市时间开窗之车型.txt: -------------------------------------------------------------------------------- 1 | drop table if exists whole_f601__prv_city_month; 2 | create table whole_f601__prv_city_month as 3 | SELECT province_id, city_id, sale_date, class_id --, c_real_som_2 4 | , dense_rank() OVER (PARTITION BY province_id, city_id, sale_date ORDER BY c_real_som_2 ASC) AS rank_within_prv_city_month__2_mm_ago 5 | , COUNT(c_real_som_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS count_of_record_within_prv_city_month__2_mm_ago 6 | , AVG(c_real_som_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS 
avg_within_prv_city_month__2_mm_ago 7 | , MAX(c_real_som_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS max_within_prv_city_month__2_mm_ago 8 | , MIN(c_real_som_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS min_within_prv_city_month__2_mm_ago 9 | , median(c_real_som_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS median_within_prv_city_month__2_mm_ago 10 | , STDDEV(c_real_som_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS stddev_within_prv_city_month__2_mm_ago 11 | , stddev_samp(c_real_som_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS stddev_samp_within_prv_city_month__2_mm_ago 12 | 13 | , dense_rank() OVER (PARTITION BY province_id, city_id, sale_date ORDER BY c_real_som_12 ASC) AS rank_within_prv_city_month__12_mm_ago 14 | , COUNT(c_real_som_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS count_of_record_within_prv_city_month__12_mm_ago 15 | , AVG(c_real_som_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS avg_within_prv_city_month__12_mm_ago 16 | , MAX(c_real_som_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS max_within_prv_city_month__12_mm_ago 17 | , MIN(c_real_som_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS min_within_prv_city_month__12_mm_ago 18 | , median(c_real_som_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS median_within_prv_city_month__12_mm_ago 19 | , STDDEV(c_real_som_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS stddev_within_prv_city_month__12_mm_ago 20 | , stddev_samp(c_real_som_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS stddev_samp_within_prv_city_month__12_mm_ago 21 | FROM dr_c_real_ssm_2_36; -- whole_f3 is the combined train + test table that carries each car model's own categories (by displacement, etc.) 22 | 23 | -- drop table if exists whole_f601__prv_city_month; 24 | -- CREATE TABLE whole_f601__prv_city_month 25 | -- AS 26 | -- SELECT a.* 27 | -- , b.rank_within_prv_city_month AS rank_within_prv_city_month__2_mm_ago 28 | -- , b.count_of_class_within_prv_city_month as 
count_of_class_within_prv_city_month__2_mm_ago 29 | -- , b.avg_within_prv_city_month as avg_within_prv_city_month__2_mm_ago 30 | -- , b.max_within_prv_city_month as max_within_prv_city_month__2_mm_ago 31 | -- , b.min_within_prv_city_month as min_within_prv_city_month__2_mm_ago 32 | -- , b.median_within_prv_city_month as median_within_prv_city_month__2_mm_ago 33 | -- , b.stddev_within_prv_city_month as stddev_within_prv_city_month__2_mm_ago 34 | -- , b.stddev_samp_within_prv_city_month as stddev_samp_within_prv_city_month__2_mm_ago 35 | -- FROM whole_f3 a 36 | -- LEFT OUTER JOIN helper_f601_step1 b 37 | -- ON to_date(a.sale_date,'yyyymm') = dateadd(to_date(b.sale_date,'yyyymm'), 2, 'mm') 38 | -- AND a.province_id = b.province_id 39 | -- AND a.class_id = b.class_id 40 | -- AND a.city_id = b.city_id; -------------------------------------------------------------------------------- /代码/特征工程/6_0_2_省市时间开窗之车类型.txt: -------------------------------------------------------------------------------- 1 | drop table if exists whole_f602__prv_city_month; 2 | create table whole_f602__prv_city_month as 3 | SELECT province_id, city_id, sale_date, type_saloon_suv_mpv --, c_real_som_type_2 4 | , dense_rank() OVER (PARTITION BY province_id, city_id, sale_date ORDER BY c_real_som_type_2 ASC) AS rank_real_type_within_prv_city_month__2_mm_ago 5 | , COUNT(c_real_som_type_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS count_of_record_real_type_within_prv_city_month__2_mm_ago 6 | , AVG(c_real_som_type_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS avg_real_type_within_prv_city_month__2_mm_ago 7 | , MAX(c_real_som_type_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS max_real_type_within_prv_city_month__2_mm_ago 8 | , MIN(c_real_som_type_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS min_real_type_within_prv_city_month__2_mm_ago 9 | , median(c_real_som_type_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS 
median_real_type_within_prv_city_month__2_mm_ago 10 | , STDDEV(c_real_som_type_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS stddev_real_type_within_prv_city_month__2_mm_ago 11 | , stddev_samp(c_real_som_type_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS stddev_samp_real_type_within_prv_city_month__2_mm_ago 12 | 13 | , dense_rank() OVER (PARTITION BY province_id, city_id, sale_date ORDER BY c_real_som_type_12 ASC) AS rank_real_type_within_prv_city_month__12_mm_ago 14 | , COUNT(c_real_som_type_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS count_of_record_real_type_within_prv_city_month__12_mm_ago 15 | , AVG(c_real_som_type_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS avg_real_type_within_prv_city_month__12_mm_ago 16 | , MAX(c_real_som_type_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS max_real_type_within_prv_city_month__12_mm_ago 17 | , MIN(c_real_som_type_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS min_real_type_within_prv_city_month__12_mm_ago 18 | , median(c_real_som_type_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS median_real_type_within_prv_city_month__12_mm_ago 19 | , STDDEV(c_real_som_type_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS stddev_real_type_within_prv_city_month__12_mm_ago 20 | , stddev_samp(c_real_som_type_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS stddev_samp_real_type_within_prv_city_month__12_mm_ago 21 | FROM dr_c_real_ssm_type_2_36; -------------------------------------------------------------------------------- /代码/特征工程/6_0_3_省市时间开窗之排量.txt: -------------------------------------------------------------------------------- 1 | drop table if exists whole_f603__prv_city_month; 2 | create table whole_f603__prv_city_month as 3 | SELECT province_id, city_id, sale_date, displacement_level --, c_real_som_displace_level_2 4 | , dense_rank() OVER (PARTITION BY province_id, city_id, sale_date ORDER BY 
c_real_som_displace_level_2 ASC) AS rank_real_displace_level_within_prv_city_month__2_mm_ago 5 | , COUNT(c_real_som_displace_level_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS count_of_record_real_displace_level_within_prv_city_month__2_mm_ago 6 | , AVG(c_real_som_displace_level_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS avg_real_displace_level_within_prv_city_month__2_mm_ago 7 | , MAX(c_real_som_displace_level_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS max_real_displace_level_within_prv_city_month__2_mm_ago 8 | , MIN(c_real_som_displace_level_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS min_real_displace_level_within_prv_city_month__2_mm_ago 9 | , median(c_real_som_displace_level_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS median_real_displace_level_within_prv_city_month__2_mm_ago 10 | , STDDEV(c_real_som_displace_level_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS stddev_real_displace_level_within_prv_city_month__2_mm_ago 11 | , stddev_samp(c_real_som_displace_level_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS stddev_samp_real_displace_level_within_prv_city_month__2_mm_ago 12 | 13 | , dense_rank() OVER (PARTITION BY province_id, city_id, sale_date ORDER BY c_real_som_displace_level_12 ASC) AS rank_real_displace_level_within_prv_city_month__12_mm_ago 14 | , COUNT(c_real_som_displace_level_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS count_of_record_real_displace_level_within_prv_city_month__12_mm_ago 15 | , AVG(c_real_som_displace_level_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS avg_real_displace_level_within_prv_city_month__12_mm_ago 16 | , MAX(c_real_som_displace_level_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS max_real_displace_level_within_prv_city_month__12_mm_ago 17 | , MIN(c_real_som_displace_level_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS 
min_real_displace_level_within_prv_city_month__12_mm_ago 18 | , median(c_real_som_displace_level_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS median_real_displace_level_within_prv_city_month__12_mm_ago 19 | , STDDEV(c_real_som_displace_level_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS stddev_real_displace_level_within_prv_city_month__12_mm_ago 20 | , stddev_samp(c_real_som_displace_level_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS stddev_samp_real_displace_level_within_prv_city_month__12_mm_ago 21 | FROM dr_c_real_ssm_displace_level_2_36; -------------------------------------------------------------------------------- /代码/特征工程/6_0_4_省市时间开窗之新能源.txt: -------------------------------------------------------------------------------- 1 | drop table if exists whole_f604__prv_city_month; 2 | create table whole_f604__prv_city_month as 3 | SELECT province_id, city_id, sale_date, newenergy_type_id_min --, c_real_som_ne_type_2 4 | , dense_rank() OVER (PARTITION BY province_id, city_id, sale_date ORDER BY c_real_som_ne_type_2 ASC) AS rank_real_ne_type_within_prv_city_month__2_mm_ago 5 | , COUNT(c_real_som_ne_type_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS count_of_record_real_ne_type_within_prv_city_month__2_mm_ago 6 | , AVG(c_real_som_ne_type_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS avg_real_ne_type_within_prv_city_month__2_mm_ago 7 | , MAX(c_real_som_ne_type_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS max_real_ne_type_within_prv_city_month__2_mm_ago 8 | , MIN(c_real_som_ne_type_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS min_real_ne_type_within_prv_city_month__2_mm_ago 9 | , median(c_real_som_ne_type_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS median_real_ne_type_within_prv_city_month__2_mm_ago 10 | , STDDEV(c_real_som_ne_type_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS stddev_real_ne_type_within_prv_city_month__2_mm_ago 11 | , 
stddev_samp(c_real_som_ne_type_2) OVER (PARTITION BY province_id, city_id, sale_date ) AS stddev_samp_real_ne_type_within_prv_city_month__2_mm_ago 12 | 13 | , dense_rank() OVER (PARTITION BY province_id, city_id, sale_date ORDER BY c_real_som_ne_type_12 ASC) AS rank_real_ne_type_within_prv_city_month__12_mm_ago 14 | , COUNT(c_real_som_ne_type_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS count_of_record_real_ne_type_within_prv_city_month__12_mm_ago 15 | , AVG(c_real_som_ne_type_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS avg_real_ne_type_within_prv_city_month__12_mm_ago 16 | , MAX(c_real_som_ne_type_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS max_real_ne_type_within_prv_city_month__12_mm_ago 17 | , MIN(c_real_som_ne_type_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS min_real_ne_type_within_prv_city_month__12_mm_ago 18 | , median(c_real_som_ne_type_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS median_real_ne_type_within_prv_city_month__12_mm_ago 19 | , STDDEV(c_real_som_ne_type_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS stddev_real_ne_type_within_prv_city_month__12_mm_ago 20 | , stddev_samp(c_real_som_ne_type_12) OVER (PARTITION BY province_id, city_id, sale_date ) AS stddev_samp_real_ne_type_within_prv_city_month__12_mm_ago 21 | FROM dr_c_real_ssm_ne_type_2_36; -------------------------------------------------------------------------------- /代码/特征工程/7_1_省市车型历史交叉.txt: -------------------------------------------------------------------------------- 1 | -- province/city/model: mean/min/max/median of sales over the past 2 years / past year / past half year / past three months 2 | DROP TABLE IF EXISTS whole_f7_1_step1; 3 | 4 | DROP TABLE IF EXISTS whole_f7_1_step2; 5 | 6 | CREATE TABLE whole_f7_1_step1 7 | AS 8 | -- SELECT province_id, city_id, class_id, sale_date, real_sale 9 | -- , c_real_som_2, c_real_som_12, c_real_ssm_2_3, c_real_ssm_2_7, c_real_ssm_2_13 10 | -- , c_real_ssm_2_25 -- median/max/min/avg over the last X months 11 | select * 12 | -- median 13 | , 
ordinal(2, c_real_som_2, c_real_som_3, c_real_som_4) AS median_2_4 14 | , (ordinal(3, c_real_som_2, c_real_som_3, c_real_som_4, c_real_som_5, c_real_som_6, c_real_som_7) + ordinal(4, c_real_som_2, c_real_som_3, c_real_som_4, c_real_som_5, c_real_som_6, c_real_som_7)) / 2 AS median_2_7 15 | , (ordinal(6, c_real_som_2, c_real_som_3, c_real_som_4, c_real_som_5, c_real_som_6, c_real_som_7, c_real_som_8, c_real_som_9, c_real_som_10, c_real_som_11, c_real_som_12, c_real_som_13) + ordinal(7, c_real_som_2, c_real_som_3, c_real_som_4, c_real_som_5, c_real_som_6, c_real_som_7, c_real_som_8, c_real_som_9, c_real_som_10, c_real_som_11, c_real_som_12, c_real_som_13)) / 2 AS median_2_13 16 | , (ordinal(12, c_real_som_2, c_real_som_3, c_real_som_4, c_real_som_5, c_real_som_6, c_real_som_7, c_real_som_8, c_real_som_9, c_real_som_10, c_real_som_11, c_real_som_12, c_real_som_13, c_real_som_14, c_real_som_15, c_real_som_16, c_real_som_17, c_real_som_18, c_real_som_19, c_real_som_20, c_real_som_21, c_real_som_22, c_real_som_23, c_real_som_24, c_real_som_25) + ordinal(13, c_real_som_2, c_real_som_3, c_real_som_4, c_real_som_5, c_real_som_6, c_real_som_7, c_real_som_8, c_real_som_9, c_real_som_10, c_real_som_11, c_real_som_12, c_real_som_13, c_real_som_14, c_real_som_15, c_real_som_16, c_real_som_17, c_real_som_18, c_real_som_19, c_real_som_20, c_real_som_21, c_real_som_22, c_real_som_23, c_real_som_24, c_real_som_25)) / 2 AS median_2_25 17 | -- max 18 | , GREATEST(c_real_som_2, c_real_som_3) AS max_2_3, GREATEST(c_real_som_2, c_real_som_3, c_real_som_4) AS max_2_4 19 | , GREATEST(c_real_som_2, c_real_som_3, c_real_som_4, c_real_som_5, c_real_som_6, c_real_som_7) AS max_2_7 20 | , GREATEST(c_real_som_2, c_real_som_3, c_real_som_4, c_real_som_5, c_real_som_6, c_real_som_7, c_real_som_8, c_real_som_9, c_real_som_10, c_real_som_11, c_real_som_12, c_real_som_13) AS max_2_13 21 | , greatest(c_real_som_2, c_real_som_3, c_real_som_4, c_real_som_5, c_real_som_6, c_real_som_7, c_real_som_8, 
c_real_som_9, c_real_som_10, c_real_som_11, c_real_som_12, c_real_som_13, c_real_som_14, c_real_som_15, c_real_som_16, c_real_som_17, c_real_som_18, c_real_som_19, c_real_som_20, c_real_som_21, c_real_som_22, c_real_som_23, c_real_som_24, c_real_som_25) as max_2_25 22 | -- min 23 | , least(c_real_som_2, c_real_som_3) AS min_2_3, least(c_real_som_2, c_real_som_3, c_real_som_4) AS min_2_4 24 | , least(c_real_som_2, c_real_som_3, c_real_som_4, c_real_som_5, c_real_som_6, c_real_som_7) AS min_2_7 25 | , least(c_real_som_2, c_real_som_3, c_real_som_4, c_real_som_5, c_real_som_6, c_real_som_7, c_real_som_8, c_real_som_9, c_real_som_10, c_real_som_11, c_real_som_12, c_real_som_13) AS min_2_13 26 | , least(c_real_som_2, c_real_som_3, c_real_som_4, c_real_som_5, c_real_som_6, c_real_som_7, c_real_som_8, c_real_som_9, c_real_som_10, c_real_som_11, c_real_som_12, c_real_som_13, c_real_som_14, c_real_som_15, c_real_som_16, c_real_som_17, c_real_som_18, c_real_som_19, c_real_som_20, c_real_som_21, c_real_som_22, c_real_som_23, c_real_som_24, c_real_som_25) as min_2_25 27 | -- avg 28 | , (c_real_som_2 + c_real_som_3)/2 AS avg_2_3, (c_real_som_2 + c_real_som_3 + c_real_som_4)/3 AS avg_2_4 29 | , (c_real_som_2 + c_real_som_3 + c_real_som_4 + c_real_som_5 + c_real_som_6 + c_real_som_7)/6 AS avg_2_7 30 | , (c_real_som_2 + c_real_som_3 + c_real_som_4 + c_real_som_5 + c_real_som_6 + c_real_som_7 + c_real_som_8 + c_real_som_9 + c_real_som_10 + c_real_som_11 + c_real_som_12 + c_real_som_13)/12 AS avg_2_13 31 | , (c_real_som_2 + c_real_som_3 + c_real_som_4 + c_real_som_5 + c_real_som_6 + c_real_som_7 + c_real_som_8 + c_real_som_9 + c_real_som_10 + c_real_som_11 + c_real_som_12 + c_real_som_13 + c_real_som_14 + c_real_som_15 + c_real_som_16 + c_real_som_17 + c_real_som_18 + c_real_som_19 + c_real_som_20 + c_real_som_21 + c_real_som_22 + c_real_som_23 + c_real_som_24 + c_real_som_25)/24 as avg_2_25 32 | 33 | -- previous year 34 | -- median 35 | , ordinal(2, c_real_som_14, c_real_som_15, 
c_real_som_16) AS median_14_16 36 | , (ordinal(3, c_real_som_14, c_real_som_15, c_real_som_16, c_real_som_17, c_real_som_18, c_real_som_19) + ordinal(4, c_real_som_14, c_real_som_15, c_real_som_16, c_real_som_17, c_real_som_18, c_real_som_19)) / 2 AS median_14_19 37 | , (ordinal(6, c_real_som_14, c_real_som_15, c_real_som_16, c_real_som_17, c_real_som_18, c_real_som_19, c_real_som_20, c_real_som_21, c_real_som_22, c_real_som_23, c_real_som_24, c_real_som_25) + ordinal(7, c_real_som_14, c_real_som_15, c_real_som_16, c_real_som_17, c_real_som_18, c_real_som_19, c_real_som_20, c_real_som_21, c_real_som_22, c_real_som_23, c_real_som_24, c_real_som_25)) / 2 AS median_14_25 38 | , (ordinal(12, c_real_som_14, c_real_som_15, c_real_som_16, c_real_som_17, c_real_som_18, c_real_som_19, c_real_som_20, c_real_som_21, c_real_som_22, c_real_som_23, c_real_som_24, c_real_som_25, c_real_som_26, c_real_som_27, c_real_som_28, c_real_som_29, c_real_som_30, c_real_som_31, c_real_som_32, c_real_som_33, c_real_som_34, c_real_som_35, c_real_som_36, c_real_som_37) + ordinal(13, c_real_som_14, c_real_som_15, c_real_som_16, c_real_som_17, c_real_som_18, c_real_som_19, c_real_som_20, c_real_som_21, c_real_som_22, c_real_som_23, c_real_som_24, c_real_som_25, c_real_som_26, c_real_som_27, c_real_som_28, c_real_som_29, c_real_som_30, c_real_som_31, c_real_som_32, c_real_som_33, c_real_som_34, c_real_som_35, c_real_som_36, c_real_som_37)) / 2 AS median_14_37 39 | -- max 40 | , GREATEST(c_real_som_14, c_real_som_15) AS max_14_15, GREATEST(c_real_som_14, c_real_som_15, c_real_som_16) AS max_14_16 41 | , GREATEST(c_real_som_14, c_real_som_15, c_real_som_16, c_real_som_17, c_real_som_18, c_real_som_19) AS max_14_19 42 | , GREATEST(c_real_som_14, c_real_som_15, c_real_som_16, c_real_som_17, c_real_som_18, c_real_som_19, c_real_som_20, c_real_som_21, c_real_som_22, c_real_som_23, c_real_som_24, c_real_som_25) AS max_14_25 43 | , greatest(c_real_som_14, c_real_som_15, c_real_som_16, c_real_som_17, 
c_real_som_18, c_real_som_19, c_real_som_20, c_real_som_21, c_real_som_22, c_real_som_23, c_real_som_24, c_real_som_25, c_real_som_26, c_real_som_27, c_real_som_28, c_real_som_29, c_real_som_30, c_real_som_31, c_real_som_32, c_real_som_33, c_real_som_34, c_real_som_35, c_real_som_36, c_real_som_37) as max_14_37 44 | -- min 45 | , least(c_real_som_14, c_real_som_15) AS min_14_15, least(c_real_som_14, c_real_som_15, c_real_som_16) AS min_14_16 46 | , least(c_real_som_14, c_real_som_15, c_real_som_16, c_real_som_17, c_real_som_18, c_real_som_19) AS min_14_19 47 | , least(c_real_som_14, c_real_som_15, c_real_som_16, c_real_som_17, c_real_som_18, c_real_som_19, c_real_som_20, c_real_som_21, c_real_som_22, c_real_som_23, c_real_som_24, c_real_som_25) AS min_14_25 48 | , least(c_real_som_14, c_real_som_15, c_real_som_16, c_real_som_17, c_real_som_18, c_real_som_19, c_real_som_20, c_real_som_21, c_real_som_22, c_real_som_23, c_real_som_24, c_real_som_25, c_real_som_26, c_real_som_27, c_real_som_28, c_real_som_29, c_real_som_30, c_real_som_31, c_real_som_32, c_real_som_33, c_real_som_34, c_real_som_35, c_real_som_36, c_real_som_37) as min_14_37 49 | -- avg 50 | , (c_real_som_14 + c_real_som_15)/2 AS avg_14_15, (c_real_som_14 + c_real_som_15 + c_real_som_16)/3 AS avg_14_16 51 | , (c_real_som_14 + c_real_som_15 + c_real_som_16 + c_real_som_17 + c_real_som_18 + c_real_som_19)/6 AS avg_14_19 52 | , (c_real_som_14 + c_real_som_15 + c_real_som_16 + c_real_som_17 + c_real_som_18 + c_real_som_19 + c_real_som_20 + c_real_som_21 + c_real_som_22 + c_real_som_23 + c_real_som_24 + c_real_som_25)/12 AS avg_14_25 53 | , (c_real_som_14 + c_real_som_15 + c_real_som_16 + c_real_som_17 + c_real_som_18 + c_real_som_19 + c_real_som_20 + c_real_som_21 + c_real_som_22 + c_real_som_23 + c_real_som_24 + c_real_som_25 + c_real_som_26 + c_real_som_27 + c_real_som_28 + c_real_som_29 + c_real_som_30 + c_real_som_31 + c_real_som_32 + c_real_som_33 + c_real_som_34 + c_real_som_35 + c_real_som_36 + 
c_real_som_37)/24 as avg_14_37 54 | 55 | FROM dr_c_real_ssm_2_36; 56 | 57 | -- 【】 58 | -- the above features for the same month last year 59 | -- last month's sales divided by the mean of the last three/four months / last half year / last year / last 2 years 60 | -- same, with the mean replaced by median/max/min 61 | -- max minus min for each period 62 | -- ------------- 63 | -- sales in this month last year divided by the mean\min\max\median over 12~14/~15/~17/~23 64 | CREATE TABLE whole_f7_1_step2 65 | AS 66 | SELECT * 67 | -- differences within the current year 68 | -- median/max/min/avg over X months minus that over Y months 69 | --median - median 70 | , median_2_4 - median_2_7 as median_2_4__sub__median_2_7 71 | , median_2_4 - median_2_13 as median_2_4__sub__median_2_13 72 | , median_2_4 - median_2_25 as median_2_4__sub__median_2_25 73 | 74 | , median_2_7 - median_2_13 as median_2_7__sub__median_2_13 75 | , median_2_7 - median_2_25 as median_2_7__sub__median_2_25 76 | 77 | , median_2_13 - median_2_25 as median_2_13__sub__median_2_25 78 | 79 | --median - min 80 | , median_2_4 - min_2_7 as median_2_4__sub__min_2_7 81 | , median_2_4 - min_2_13 as median_2_4__sub__min_2_13 82 | , median_2_4 - min_2_25 as median_2_4__sub__min_2_25 83 | 84 | , median_2_7 - min_2_13 as median_2_7__sub__min_2_13 85 | , median_2_7 - min_2_25 as median_2_7__sub__min_2_25 86 | 87 | , median_2_13 - min_2_25 as median_2_13__sub__min_2_25 88 | 89 | --median - avg 90 | , median_2_4 - avg_2_7 as median_2_4__sub__avg_2_7 91 | , median_2_4 - avg_2_13 as median_2_4__sub__avg_2_13 92 | , median_2_4 - avg_2_25 as median_2_4__sub__avg_2_25 93 | 94 | , median_2_7 - avg_2_13 as median_2_7__sub__avg_2_13 95 | , median_2_7 - avg_2_25 as median_2_7__sub__avg_2_25 96 | 97 | , median_2_13 - avg_2_25 as median_2_13__sub__avg_2_25 98 | 99 | --max - max 100 | , max_2_4 - max_2_7 as max_2_4__sub__max_2_7 101 | , max_2_4 - max_2_13 as max_2_4__sub__max_2_13 102 | , max_2_4 - max_2_25 as max_2_4__sub__max_2_25 103 | 104 | , max_2_7 - max_2_13 as max_2_7__sub__max_2_13 105 | , max_2_7 - max_2_25 as max_2_7__sub__max_2_25 106 | 107 | , max_2_13 - max_2_25 as max_2_13__sub__max_2_25 108 | --max - avg 109 | , max_2_4 - avg_2_7 as max_2_4__sub__avg_2_7 110 | , max_2_4 - 
avg_2_13 as max_2_4__sub__avg_2_13 111 | , max_2_4 - avg_2_25 as max_2_4__sub__avg_2_25 112 | 113 | , max_2_7 - avg_2_13 as max_2_7__sub__avg_2_13 114 | , max_2_7 - avg_2_25 as max_2_7__sub__avg_2_25 115 | 116 | , max_2_13 - avg_2_25 as max_2_13__sub__avg_2_25 117 | --max - median 118 | , max_2_4 - median_2_7 as max_2_4__sub__median_2_7 119 | , max_2_4 - median_2_13 as max_2_4__sub__median_2_13 120 | , max_2_4 - median_2_25 as max_2_4__sub__median_2_25 121 | 122 | , max_2_7 - median_2_13 as max_2_7__sub__median_2_13 123 | , max_2_7 - median_2_25 as max_2_7__sub__median_2_25 124 | 125 | , max_2_13 - median_2_25 as max_2_13__sub__median_2_25 126 | --max - min 127 | , max_2_4 - min_2_7 as max_2_4__sub__min_2_7 128 | , max_2_4 - min_2_13 as max_2_4__sub__min_2_13 129 | , max_2_4 - min_2_25 as max_2_4__sub__min_2_25 130 | 131 | , max_2_7 - min_2_13 as max_2_7__sub__min_2_13 132 | , max_2_7 - min_2_25 as max_2_7__sub__min_2_25 133 | 134 | , max_2_13 - min_2_25 as max_2_13__sub__min_2_25 135 | 136 | --avg - min 137 | , avg_2_4 - min_2_7 as avg_2_4__sub__min_2_7 138 | , avg_2_4 - min_2_13 as avg_2_4__sub__min_2_13 139 | , avg_2_4 - min_2_25 as avg_2_4__sub__min_2_25 140 | 141 | , avg_2_7 - min_2_13 as avg_2_7__sub__min_2_13 142 | , avg_2_7 - min_2_25 as avg_2_7__sub__min_2_25 143 | 144 | , avg_2_13 - min_2_25 as avg_2_13__sub__min_2_25 145 | --avg - avg 146 | , avg_2_4 - avg_2_7 as avg_2_4__sub__avg_2_7 147 | , avg_2_4 - avg_2_13 as avg_2_4__sub__avg_2_13 148 | , avg_2_4 - avg_2_25 as avg_2_4__sub__avg_2_25 149 | 150 | , avg_2_7 - avg_2_13 as avg_2_7__sub__avg_2_13 151 | , avg_2_7 - avg_2_25 as avg_2_7__sub__avg_2_25 152 | 153 | , avg_2_13 - avg_2_25 as avg_2_13__sub__avg_2_25 154 | 155 | --min - min 156 | , min_2_4 - min_2_7 as min_2_4__sub__min_2_7 157 | , min_2_4 - min_2_13 as min_2_4__sub__min_2_13 158 | , min_2_4 - min_2_25 as min_2_4__sub__min_2_25 159 | 160 | , min_2_7 - min_2_13 as min_2_7__sub__min_2_13 161 | , min_2_7 - min_2_25 as min_2_7__sub__min_2_25 162 | 
163 | , min_2_13 - min_2_25 as min_2_13__sub__min_2_25 164 | 165 | -- sales in this month last year, relative to the means, medians, etc. above 166 | , c_real_som_12 - median_14_16 as c_real_som_12__sub__median_14_16 167 | , c_real_som_12 - median_14_19 as c_real_som_12__sub__median_14_19 168 | , c_real_som_12 - median_14_25 as c_real_som_12__sub__median_14_25 169 | , c_real_som_12 - median_14_37 as c_real_som_12__sub__median_14_37 170 | 171 | , c_real_som_12 - max_14_16 as c_real_som_12__sub__max_14_16 172 | , c_real_som_12 - max_14_19 as c_real_som_12__sub__max_14_19 173 | , c_real_som_12 - max_14_25 as c_real_som_12__sub__max_14_25 174 | , c_real_som_12 - max_14_37 as c_real_som_12__sub__max_14_37 175 | 176 | , c_real_som_12 - min_14_16 as c_real_som_12__sub__min_14_16 177 | , c_real_som_12 - min_14_19 as c_real_som_12__sub__min_14_19 178 | , c_real_som_12 - min_14_25 as c_real_som_12__sub__min_14_25 179 | , c_real_som_12 - min_14_37 as c_real_som_12__sub__min_14_37 180 | 181 | , c_real_som_12 - avg_14_16 as c_real_som_12__sub__avg_14_16 182 | , c_real_som_12 - avg_14_19 as c_real_som_12__sub__avg_14_19 183 | , c_real_som_12 - avg_14_25 as c_real_som_12__sub__avg_14_25 184 | , c_real_som_12 - avg_14_37 as c_real_som_12__sub__avg_14_37 185 | 186 | -- [smoothed division] 187 | --median +1)/(1+ median 188 | , (median_2_4 +1)/(1+ median_2_7) as median_2_4__smoothdivby__median_2_7 189 | , (median_2_4 +1)/(1+ median_2_13) as median_2_4__smoothdivby__median_2_13 190 | , (median_2_4 +1)/(1+ median_2_25) as median_2_4__smoothdivby__median_2_25 191 | 192 | , (median_2_7 +1)/(1+ median_2_13) as median_2_7__smoothdivby__median_2_13 193 | , (median_2_7 +1)/(1+ median_2_25) as median_2_7__smoothdivby__median_2_25 194 | 195 | , (median_2_13 +1)/(1+ median_2_25) as median_2_13__smoothdivby__median_2_25 196 | 197 | --median +1)/(1+ min 198 | , (median_2_4 +1)/(1+ min_2_7) as median_2_4__smoothdivby__min_2_7 199 | , (median_2_4 +1)/(1+ min_2_13) as median_2_4__smoothdivby__min_2_13 200 | , (median_2_4 +1)/(1+ min_2_25) as 
median_2_4__smoothdivby__min_2_25 201 | 202 | , (median_2_7 +1)/(1+ min_2_13) as median_2_7__smoothdivby__min_2_13 203 | , (median_2_7 +1)/(1+ min_2_25) as median_2_7__smoothdivby__min_2_25 204 | 205 | , (median_2_13 +1)/(1+ min_2_25) as median_2_13__smoothdivby__min_2_25 206 | 207 | --median +1)/(1+ avg 208 | , (median_2_4 +1)/(1+ avg_2_7) as median_2_4__smoothdivby__avg_2_7 209 | , (median_2_4 +1)/(1+ avg_2_13) as median_2_4__smoothdivby__avg_2_13 210 | , (median_2_4 +1)/(1+ avg_2_25) as median_2_4__smoothdivby__avg_2_25 211 | 212 | , (median_2_7 +1)/(1+ avg_2_13) as median_2_7__smoothdivby__avg_2_13 213 | , (median_2_7 +1)/(1+ avg_2_25) as median_2_7__smoothdivby__avg_2_25 214 | 215 | , (median_2_13 +1)/(1+ avg_2_25) as median_2_13__smoothdivby__avg_2_25 216 | 217 | --max +1)/(1+ max 218 | , (max_2_4 +1)/(1+ max_2_7) as max_2_4__smoothdivby__max_2_7 219 | , (max_2_4 +1)/(1+ max_2_13) as max_2_4__smoothdivby__max_2_13 220 | , (max_2_4 +1)/(1+ max_2_25) as max_2_4__smoothdivby__max_2_25 221 | 222 | , (max_2_7 +1)/(1+ max_2_13) as max_2_7__smoothdivby__max_2_13 223 | , (max_2_7 +1)/(1+ max_2_25) as max_2_7__smoothdivby__max_2_25 224 | 225 | , (max_2_13 +1)/(1+ max_2_25) as max_2_13__smoothdivby__max_2_25 226 | --max +1)/(1+ avg 227 | , (max_2_4 +1)/(1+ avg_2_7) as max_2_4__smoothdivby__avg_2_7 228 | , (max_2_4 +1)/(1+ avg_2_13) as max_2_4__smoothdivby__avg_2_13 229 | , (max_2_4 +1)/(1+ avg_2_25) as max_2_4__smoothdivby__avg_2_25 230 | 231 | , (max_2_7 +1)/(1+ avg_2_13) as max_2_7__smoothdivby__avg_2_13 232 | , (max_2_7 +1)/(1+ avg_2_25) as max_2_7__smoothdivby__avg_2_25 233 | 234 | , (max_2_13 +1)/(1+ avg_2_25) as max_2_13__smoothdivby__avg_2_25 235 | --max +1)/(1+ median 236 | , (max_2_4 +1)/(1+ median_2_7) as max_2_4__smoothdivby__median_2_7 237 | , (max_2_4 +1)/(1+ median_2_13) as max_2_4__smoothdivby__median_2_13 238 | , (max_2_4 +1)/(1+ median_2_25) as max_2_4__smoothdivby__median_2_25 239 | 240 | , (max_2_7 +1)/(1+ median_2_13) as 
max_2_7__smoothdivby__median_2_13 241 | , (max_2_7 +1)/(1+ median_2_25) as max_2_7__smoothdivby__median_2_25 242 | 243 | , (max_2_13 +1)/(1+ median_2_25) as max_2_13__smoothdivby__median_2_25 244 | --max +1)/(1+ min 245 | , (max_2_4 +1)/(1+ min_2_7) as max_2_4__smoothdivby__min_2_7 246 | , (max_2_4 +1)/(1+ min_2_13) as max_2_4__smoothdivby__min_2_13 247 | , (max_2_4 +1)/(1+ min_2_25) as max_2_4__smoothdivby__min_2_25 248 | 249 | , (max_2_7 +1)/(1+ min_2_13) as max_2_7__smoothdivby__min_2_13 250 | , (max_2_7 +1)/(1+ min_2_25) as max_2_7__smoothdivby__min_2_25 251 | 252 | , (max_2_13 +1)/(1+ min_2_25) as max_2_13__smoothdivby__min_2_25 253 | 254 | --avg +1)/(1+ min 255 | , (avg_2_4 +1)/(1+ min_2_7) as avg_2_4__smoothdivby__min_2_7 256 | , (avg_2_4 +1)/(1+ min_2_13) as avg_2_4__smoothdivby__min_2_13 257 | , (avg_2_4 +1)/(1+ min_2_25) as avg_2_4__smoothdivby__min_2_25 258 | 259 | , (avg_2_7 +1)/(1+ min_2_13) as avg_2_7__smoothdivby__min_2_13 260 | , (avg_2_7 +1)/(1+ min_2_25) as avg_2_7__smoothdivby__min_2_25 261 | 262 | , (avg_2_13 +1)/(1+ min_2_25) as avg_2_13__smoothdivby__min_2_25 263 | --avg +1)/(1+ avg 264 | , (avg_2_4 +1)/(1+ avg_2_7) as avg_2_4__smoothdivby__avg_2_7 265 | , (avg_2_4 +1)/(1+ avg_2_13) as avg_2_4__smoothdivby__avg_2_13 266 | , (avg_2_4 +1)/(1+ avg_2_25) as avg_2_4__smoothdivby__avg_2_25 267 | 268 | , (avg_2_7 +1)/(1+ avg_2_13) as avg_2_7__smoothdivby__avg_2_13 269 | , (avg_2_7 +1)/(1+ avg_2_25) as avg_2_7__smoothdivby__avg_2_25 270 | 271 | , (avg_2_13 +1)/(1+ avg_2_25) as avg_2_13__smoothdivby__avg_2_25 272 | 273 | --min +1)/(1+ min 274 | , (min_2_4 +1)/(1+ min_2_7) as min_2_4__smoothdivby__min_2_7 275 | , (min_2_4 +1)/(1+ min_2_13) as min_2_4__smoothdivby__min_2_13 276 | , (min_2_4 +1)/(1+ min_2_25) as min_2_4__smoothdivby__min_2_25 277 | 278 | , (min_2_7 +1)/(1+ min_2_13) as min_2_7__smoothdivby__min_2_13 279 | , (min_2_7 +1)/(1+ min_2_25) as min_2_7__smoothdivby__min_2_25 280 | 281 | , (min_2_13 +1)/(1+ min_2_25) as 
min_2_13__smoothdivby__min_2_25 282 | 283 | -- sales in this month last year, relative to the means, medians, etc. above 284 | , (c_real_som_12 +1)/(1+ median_14_16) as c_real_som_12__smoothdivby__median_14_16 285 | , (c_real_som_12 +1)/(1+ median_14_19) as c_real_som_12__smoothdivby__median_14_19 286 | , (c_real_som_12 +1)/(1+ median_14_25) as c_real_som_12__smoothdivby__median_14_25 287 | , (c_real_som_12 +1)/(1+ median_14_37) as c_real_som_12__smoothdivby__median_14_37 288 | 289 | , (c_real_som_12 +1)/(1+ max_14_16) as c_real_som_12__smoothdivby__max_14_16 290 | , (c_real_som_12 +1)/(1+ max_14_19) as c_real_som_12__smoothdivby__max_14_19 291 | , (c_real_som_12 +1)/(1+ max_14_25) as c_real_som_12__smoothdivby__max_14_25 292 | , (c_real_som_12 +1)/(1+ max_14_37) as c_real_som_12__smoothdivby__max_14_37 293 | 294 | , (c_real_som_12 +1)/(1+ min_14_16) as c_real_som_12__smoothdivby__min_14_16 295 | , (c_real_som_12 +1)/(1+ min_14_19) as c_real_som_12__smoothdivby__min_14_19 296 | , (c_real_som_12 +1)/(1+ min_14_25) as c_real_som_12__smoothdivby__min_14_25 297 | , (c_real_som_12 +1)/(1+ min_14_37) as c_real_som_12__smoothdivby__min_14_37 298 | 299 | , (c_real_som_12 +1)/(1+ avg_14_16) as c_real_som_12__smoothdivby__avg_14_16 300 | , (c_real_som_12 +1)/(1+ avg_14_19) as c_real_som_12__smoothdivby__avg_14_19 301 | , (c_real_som_12 +1)/(1+ avg_14_25) as c_real_som_12__smoothdivby__avg_14_25 302 | , (c_real_som_12 +1)/(1+ avg_14_37) as c_real_som_12__smoothdivby__avg_14_37 303 | 304 | FROM whole_f7_1_step1; -------------------------------------------------------------------------------- /代码/特征工程/7_2_省市车类型交叉.txt: -------------------------------------------------------------------------------- 1 | -- province/city/vehicle type: mean/min/max/median of sales over the past 2 years / past year / past half year / past three months 2 | DROP TABLE IF EXISTS whole_f7_2_step1; 3 | 4 | DROP TABLE IF EXISTS whole_f7_2_step2; 5 | 6 | CREATE TABLE whole_f7_2_step1 7 | AS 8 | -- SELECT province_id, city_id, class_id, sale_date, real_sale 9 | -- , c_real_som_type_2, c_real_som_type_12, c_real_ssm_type_2_3, 
c_real_ssm_type_2_7, c_real_ssm_type_2_13 10 | -- , c_real_ssm_type_2_25 -- median/max/min/avg over the last X months 11 | select * 12 | -- median 13 | , ordinal(2, c_real_som_type_2, c_real_som_type_3, c_real_som_type_4) AS median_2_4_type 14 | , (ordinal(3, c_real_som_type_2, c_real_som_type_3, c_real_som_type_4, c_real_som_type_5, c_real_som_type_6, c_real_som_type_7) + ordinal(4, c_real_som_type_2, c_real_som_type_3, c_real_som_type_4, c_real_som_type_5, c_real_som_type_6, c_real_som_type_7)) / 2 AS median_2_7_type 15 | , (ordinal(6, c_real_som_type_2, c_real_som_type_3, c_real_som_type_4, c_real_som_type_5, c_real_som_type_6, c_real_som_type_7, c_real_som_type_8, c_real_som_type_9, c_real_som_type_10, c_real_som_type_11, c_real_som_type_12, c_real_som_type_13) + ordinal(7, c_real_som_type_2, c_real_som_type_3, c_real_som_type_4, c_real_som_type_5, c_real_som_type_6, c_real_som_type_7, c_real_som_type_8, c_real_som_type_9, c_real_som_type_10, c_real_som_type_11, c_real_som_type_12, c_real_som_type_13)) / 2 AS median_2_13_type 16 | , (ordinal(12, c_real_som_type_2, c_real_som_type_3, c_real_som_type_4, c_real_som_type_5, c_real_som_type_6, c_real_som_type_7, c_real_som_type_8, c_real_som_type_9, c_real_som_type_10, c_real_som_type_11, c_real_som_type_12, c_real_som_type_13, c_real_som_type_14, c_real_som_type_15, c_real_som_type_16, c_real_som_type_17, c_real_som_type_18, c_real_som_type_19, c_real_som_type_20, c_real_som_type_21, c_real_som_type_22, c_real_som_type_23, c_real_som_type_24, c_real_som_type_25) + ordinal(13, c_real_som_type_2, c_real_som_type_3, c_real_som_type_4, c_real_som_type_5, c_real_som_type_6, c_real_som_type_7, c_real_som_type_8, c_real_som_type_9, c_real_som_type_10, c_real_som_type_11, c_real_som_type_12, c_real_som_type_13, c_real_som_type_14, c_real_som_type_15, c_real_som_type_16, c_real_som_type_17, c_real_som_type_18, c_real_som_type_19, c_real_som_type_20, c_real_som_type_21, c_real_som_type_22, c_real_som_type_23, c_real_som_type_24, 
c_real_som_type_25)) / 2 AS median_2_25_type 17 | -- max 18 | , GREATEST(c_real_som_type_2, c_real_som_type_3) AS max_2_3_type, GREATEST(c_real_som_type_2, c_real_som_type_3, c_real_som_type_4) AS max_2_4_type 19 | , GREATEST(c_real_som_type_2, c_real_som_type_3, c_real_som_type_4, c_real_som_type_5, c_real_som_type_6, c_real_som_type_7) AS max_2_7_type 20 | , GREATEST(c_real_som_type_2, c_real_som_type_3, c_real_som_type_4, c_real_som_type_5, c_real_som_type_6, c_real_som_type_7, c_real_som_type_8, c_real_som_type_9, c_real_som_type_10, c_real_som_type_11, c_real_som_type_12, c_real_som_type_13) AS max_2_13_type 21 | , greatest(c_real_som_type_2, c_real_som_type_3, c_real_som_type_4, c_real_som_type_5, c_real_som_type_6, c_real_som_type_7, c_real_som_type_8, c_real_som_type_9, c_real_som_type_10, c_real_som_type_11, c_real_som_type_12, c_real_som_type_13, c_real_som_type_14, c_real_som_type_15, c_real_som_type_16, c_real_som_type_17, c_real_som_type_18, c_real_som_type_19, c_real_som_type_20, c_real_som_type_21, c_real_som_type_22, c_real_som_type_23, c_real_som_type_24, c_real_som_type_25) as max_2_25_type 22 | -- min 23 | , least(c_real_som_type_2, c_real_som_type_3) AS min_2_3_type, least(c_real_som_type_2, c_real_som_type_3, c_real_som_type_4) AS min_2_4_type 24 | , least(c_real_som_type_2, c_real_som_type_3, c_real_som_type_4, c_real_som_type_5, c_real_som_type_6, c_real_som_type_7) AS min_2_7_type 25 | , least(c_real_som_type_2, c_real_som_type_3, c_real_som_type_4, c_real_som_type_5, c_real_som_type_6, c_real_som_type_7, c_real_som_type_8, c_real_som_type_9, c_real_som_type_10, c_real_som_type_11, c_real_som_type_12, c_real_som_type_13) AS min_2_13_type 26 | , least(c_real_som_type_2, c_real_som_type_3, c_real_som_type_4, c_real_som_type_5, c_real_som_type_6, c_real_som_type_7, c_real_som_type_8, c_real_som_type_9, c_real_som_type_10, c_real_som_type_11, c_real_som_type_12, c_real_som_type_13, c_real_som_type_14, c_real_som_type_15, c_real_som_type_16, 
c_real_som_type_17, c_real_som_type_18, c_real_som_type_19, c_real_som_type_20, c_real_som_type_21, c_real_som_type_22, c_real_som_type_23, c_real_som_type_24, c_real_som_type_25) as min_2_25_type 27 | -- avg 28 | , (c_real_som_type_2 + c_real_som_type_3)/2 AS avg_2_3_type, (c_real_som_type_2 + c_real_som_type_3 + c_real_som_type_4)/3 AS avg_2_4_type 29 | , (c_real_som_type_2 + c_real_som_type_3 + c_real_som_type_4 + c_real_som_type_5 + c_real_som_type_6 + c_real_som_type_7)/6 AS avg_2_7_type 30 | , (c_real_som_type_2 + c_real_som_type_3 + c_real_som_type_4 + c_real_som_type_5 + c_real_som_type_6 + c_real_som_type_7 + c_real_som_type_8 + c_real_som_type_9 + c_real_som_type_10 + c_real_som_type_11 + c_real_som_type_12 + c_real_som_type_13)/12 AS avg_2_13_type 31 | , (c_real_som_type_2 + c_real_som_type_3 + c_real_som_type_4 + c_real_som_type_5 + c_real_som_type_6 + c_real_som_type_7 + c_real_som_type_8 + c_real_som_type_9 + c_real_som_type_10 + c_real_som_type_11 + c_real_som_type_12 + c_real_som_type_13 + c_real_som_type_14 + c_real_som_type_15 + c_real_som_type_16 + c_real_som_type_17 + c_real_som_type_18 + c_real_som_type_19 + c_real_som_type_20 + c_real_som_type_21 + c_real_som_type_22 + c_real_som_type_23 + c_real_som_type_24 + c_real_som_type_25)/24 as avg_2_25_type 32 | 33 | -- previous year 34 | -- median 35 | , ordinal(2, c_real_som_type_14, c_real_som_type_15, c_real_som_type_16) AS median_14_16_type 36 | , (ordinal(3, c_real_som_type_14, c_real_som_type_15, c_real_som_type_16, c_real_som_type_17, c_real_som_type_18, c_real_som_type_19) + ordinal(4, c_real_som_type_14, c_real_som_type_15, c_real_som_type_16, c_real_som_type_17, c_real_som_type_18, c_real_som_type_19)) / 2 AS median_14_19_type 37 | , (ordinal(6, c_real_som_type_14, c_real_som_type_15, c_real_som_type_16, c_real_som_type_17, c_real_som_type_18, c_real_som_type_19, c_real_som_type_20, c_real_som_type_21, c_real_som_type_22, c_real_som_type_23, c_real_som_type_24, c_real_som_type_25) + ordinal(7,
c_real_som_type_14, c_real_som_type_15, c_real_som_type_16, c_real_som_type_17, c_real_som_type_18, c_real_som_type_19, c_real_som_type_20, c_real_som_type_21, c_real_som_type_22, c_real_som_type_23, c_real_som_type_24, c_real_som_type_25)) / 2 AS median_14_25_type 38 | , (ordinal(12, c_real_som_type_14, c_real_som_type_15, c_real_som_type_16, c_real_som_type_17, c_real_som_type_18, c_real_som_type_19, c_real_som_type_20, c_real_som_type_21, c_real_som_type_22, c_real_som_type_23, c_real_som_type_24, c_real_som_type_25, c_real_som_type_26, c_real_som_type_27, c_real_som_type_28, c_real_som_type_29, c_real_som_type_30, c_real_som_type_31, c_real_som_type_32, c_real_som_type_33, c_real_som_type_34, c_real_som_type_35, c_real_som_type_36, c_real_som_type_37) + ordinal(13, c_real_som_type_14, c_real_som_type_15, c_real_som_type_16, c_real_som_type_17, c_real_som_type_18, c_real_som_type_19, c_real_som_type_20, c_real_som_type_21, c_real_som_type_22, c_real_som_type_23, c_real_som_type_24, c_real_som_type_25, c_real_som_type_26, c_real_som_type_27, c_real_som_type_28, c_real_som_type_29, c_real_som_type_30, c_real_som_type_31, c_real_som_type_32, c_real_som_type_33, c_real_som_type_34, c_real_som_type_35, c_real_som_type_36, c_real_som_type_37)) / 2 AS median_14_37_type 39 | -- max 40 | , GREATEST(c_real_som_type_14, c_real_som_type_15) AS max_14_15_type, GREATEST(c_real_som_type_14, c_real_som_type_15, c_real_som_type_16) AS max_14_16_type 41 | , GREATEST(c_real_som_type_14, c_real_som_type_15, c_real_som_type_16, c_real_som_type_17, c_real_som_type_18, c_real_som_type_19) AS max_14_19_type 42 | , GREATEST(c_real_som_type_14, c_real_som_type_15, c_real_som_type_16, c_real_som_type_17, c_real_som_type_18, c_real_som_type_19, c_real_som_type_20, c_real_som_type_21, c_real_som_type_22, c_real_som_type_23, c_real_som_type_24, c_real_som_type_25) AS max_14_25_type 43 | , greatest(c_real_som_type_14, c_real_som_type_15, c_real_som_type_16, c_real_som_type_17, 
c_real_som_type_18, c_real_som_type_19, c_real_som_type_20, c_real_som_type_21, c_real_som_type_22, c_real_som_type_23, c_real_som_type_24, c_real_som_type_25, c_real_som_type_26, c_real_som_type_27, c_real_som_type_28, c_real_som_type_29, c_real_som_type_30, c_real_som_type_31, c_real_som_type_32, c_real_som_type_33, c_real_som_type_34, c_real_som_type_35, c_real_som_type_36, c_real_som_type_37) as max_14_37_type 44 | -- min 45 | , least(c_real_som_type_14, c_real_som_type_15) AS min_14_15_type, least(c_real_som_type_14, c_real_som_type_15, c_real_som_type_16) AS min_14_16_type 46 | , least(c_real_som_type_14, c_real_som_type_15, c_real_som_type_16, c_real_som_type_17, c_real_som_type_18, c_real_som_type_19) AS min_14_19_type 47 | , least(c_real_som_type_14, c_real_som_type_15, c_real_som_type_16, c_real_som_type_17, c_real_som_type_18, c_real_som_type_19, c_real_som_type_20, c_real_som_type_21, c_real_som_type_22, c_real_som_type_23, c_real_som_type_24, c_real_som_type_25) AS min_14_25_type 48 | , least(c_real_som_type_14, c_real_som_type_15, c_real_som_type_16, c_real_som_type_17, c_real_som_type_18, c_real_som_type_19, c_real_som_type_20, c_real_som_type_21, c_real_som_type_22, c_real_som_type_23, c_real_som_type_24, c_real_som_type_25, c_real_som_type_26, c_real_som_type_27, c_real_som_type_28, c_real_som_type_29, c_real_som_type_30, c_real_som_type_31, c_real_som_type_32, c_real_som_type_33, c_real_som_type_34, c_real_som_type_35, c_real_som_type_36, c_real_som_type_37) as min_14_37_type 49 | -- avg 50 | , (c_real_som_type_14 + c_real_som_type_15)/2 AS avg_14_15_type, (c_real_som_type_14 + c_real_som_type_15 + c_real_som_type_16)/3 AS avg_14_16_type 51 | , (c_real_som_type_14 + c_real_som_type_15 + c_real_som_type_16 + c_real_som_type_17 + c_real_som_type_18 + c_real_som_type_19)/6 AS avg_14_19_type 52 | , (c_real_som_type_14 + c_real_som_type_15 + c_real_som_type_16 + c_real_som_type_17 + c_real_som_type_18 + c_real_som_type_19 + c_real_som_type_20 + 
c_real_som_type_21 + c_real_som_type_22 + c_real_som_type_23 + c_real_som_type_24 + c_real_som_type_25)/12 AS avg_14_25_type 53 | , (c_real_som_type_14 + c_real_som_type_15 + c_real_som_type_16 + c_real_som_type_17 + c_real_som_type_18 + c_real_som_type_19 + c_real_som_type_20 + c_real_som_type_21 + c_real_som_type_22 + c_real_som_type_23 + c_real_som_type_24 + c_real_som_type_25 + c_real_som_type_26 + c_real_som_type_27 + c_real_som_type_28 + c_real_som_type_29 + c_real_som_type_30 + c_real_som_type_31 + c_real_som_type_32 + c_real_som_type_33 + c_real_som_type_34 + c_real_som_type_35 + c_real_som_type_36 + c_real_som_type_37)/24 as avg_14_37_type 54 | 55 | FROM dr_c_real_ssm_type_2_36; 56 | 57 | -- Notes: 58 | -- the same features for the same month last year 59 | -- last month's sales relative to the mean of the last three/four months / last half year / last year / last 2 years 60 | -- the same with the mean replaced by median/max/min 61 | -- max minus min for each window 62 | -- ------------- 63 | -- last year's same-month sales relative to the mean\min\max\median over 12~14/~15/~17/~23 (the code below actually uses windows 14~16/~19/~25/~37) 64 | CREATE TABLE whole_f7_2_step2 65 | AS 66 | SELECT * 67 | -- differences between this year's statistics 68 | -- median/max/min/avg over X months minus the same over Y months 69 | --median - median 70 | , median_2_4_type - median_2_7_type as median_2_4__sub__median_2_7_type 71 | , median_2_4_type - median_2_13_type as median_2_4__sub__median_2_13_type 72 | , median_2_4_type - median_2_25_type as median_2_4__sub__median_2_25_type 73 | 74 | , median_2_7_type - median_2_13_type as median_2_7__sub__median_2_13_type 75 | , median_2_7_type - median_2_25_type as median_2_7__sub__median_2_25_type 76 | 77 | , median_2_13_type - median_2_25_type as median_2_13__sub__median_2_25_type 78 | 79 | --median - min 80 | , median_2_4_type - min_2_7_type as median_2_4__sub__min_2_7_type 81 | , median_2_4_type - min_2_13_type as median_2_4__sub__min_2_13_type 82 | , median_2_4_type - min_2_25_type as median_2_4__sub__min_2_25_type 83 | 84 | , median_2_7_type - min_2_13_type as median_2_7__sub__min_2_13_type 85 | , median_2_7_type - min_2_25_type as median_2_7__sub__min_2_25_type 86 | 87 | , median_2_13_type - min_2_25_type as median_2_13__sub__min_2_25_type 88 | 89 |
--median - avg 90 | , median_2_4_type - avg_2_7_type as median_2_4__sub__avg_2_7_type 91 | , median_2_4_type - avg_2_13_type as median_2_4__sub__avg_2_13_type 92 | , median_2_4_type - avg_2_25_type as median_2_4__sub__avg_2_25_type 93 | 94 | , median_2_7_type - avg_2_13_type as median_2_7__sub__avg_2_13_type 95 | , median_2_7_type - avg_2_25_type as median_2_7__sub__avg_2_25_type 96 | 97 | , median_2_13_type - avg_2_25_type as median_2_13__sub__avg_2_25_type 98 | 99 | --max - max 100 | , max_2_4_type - max_2_7_type as max_2_4__sub__max_2_7_type 101 | , max_2_4_type - max_2_13_type as max_2_4__sub__max_2_13_type 102 | , max_2_4_type - max_2_25_type as max_2_4__sub__max_2_25_type 103 | 104 | , max_2_7_type - max_2_13_type as max_2_7__sub__max_2_13_type 105 | , max_2_7_type - max_2_25_type as max_2_7__sub__max_2_25_type 106 | 107 | , max_2_13_type - max_2_25_type as max_2_13__sub__max_2_25_type 108 | --max - avg 109 | , max_2_4_type - avg_2_7_type as max_2_4__sub__avg_2_7_type 110 | , max_2_4_type - avg_2_13_type as max_2_4__sub__avg_2_13_type 111 | , max_2_4_type - avg_2_25_type as max_2_4__sub__avg_2_25_type 112 | 113 | , max_2_7_type - avg_2_13_type as max_2_7__sub__avg_2_13_type 114 | , max_2_7_type - avg_2_25_type as max_2_7__sub__avg_2_25_type 115 | 116 | , max_2_13_type - avg_2_25_type as max_2_13__sub__avg_2_25_type 117 | --max - median 118 | , max_2_4_type - median_2_7_type as max_2_4__sub__median_2_7_type 119 | , max_2_4_type - median_2_13_type as max_2_4__sub__median_2_13_type 120 | , max_2_4_type - median_2_25_type as max_2_4__sub__median_2_25_type 121 | 122 | , max_2_7_type - median_2_13_type as max_2_7__sub__median_2_13_type 123 | , max_2_7_type - median_2_25_type as max_2_7__sub__median_2_25_type 124 | 125 | , max_2_13_type - median_2_25_type as max_2_13__sub__median_2_25_type 126 | --max - min 127 | , max_2_4_type - min_2_7_type as max_2_4__sub__min_2_7_type 128 | , max_2_4_type - min_2_13_type as max_2_4__sub__min_2_13_type 129 | , max_2_4_type - 
min_2_25_type as max_2_4__sub__min_2_25_type 130 | 131 | , max_2_7_type - min_2_13_type as max_2_7__sub__min_2_13_type 132 | , max_2_7_type - min_2_25_type as max_2_7__sub__min_2_25_type 133 | 134 | , max_2_13_type - min_2_25_type as max_2_13__sub__min_2_25_type 135 | 136 | --avg - min 137 | , avg_2_4_type - min_2_7_type as avg_2_4__sub__min_2_7_type 138 | , avg_2_4_type - min_2_13_type as avg_2_4__sub__min_2_13_type 139 | , avg_2_4_type - min_2_25_type as avg_2_4__sub__min_2_25_type 140 | 141 | , avg_2_7_type - min_2_13_type as avg_2_7__sub__min_2_13_type 142 | , avg_2_7_type - min_2_25_type as avg_2_7__sub__min_2_25_type 143 | 144 | , avg_2_13_type - min_2_25_type as avg_2_13__sub__min_2_25_type 145 | --avg - avg 146 | , avg_2_4_type - avg_2_7_type as avg_2_4__sub__avg_2_7_type 147 | , avg_2_4_type - avg_2_13_type as avg_2_4__sub__avg_2_13_type 148 | , avg_2_4_type - avg_2_25_type as avg_2_4__sub__avg_2_25_type 149 | 150 | , avg_2_7_type - avg_2_13_type as avg_2_7__sub__avg_2_13_type 151 | , avg_2_7_type - avg_2_25_type as avg_2_7__sub__avg_2_25_type 152 | 153 | , avg_2_13_type - avg_2_25_type as avg_2_13__sub__avg_2_25_type 154 | 155 | --min - min 156 | , min_2_4_type - min_2_7_type as min_2_4__sub__min_2_7_type 157 | , min_2_4_type - min_2_13_type as min_2_4__sub__min_2_13_type 158 | , min_2_4_type - min_2_25_type as min_2_4__sub__min_2_25_type 159 | 160 | , min_2_7_type - min_2_13_type as min_2_7__sub__min_2_13_type 161 | , min_2_7_type - min_2_25_type as min_2_7__sub__min_2_25_type 162 | 163 | , min_2_13_type - min_2_25_type as min_2_13__sub__min_2_25_type 164 | 165 | -- last year's same-month sales versus the previous-year means, medians, etc. 166 | , c_real_som_type_12 - median_14_16_type as c_real_som_type_12__sub__median_14_16_type 167 | , c_real_som_type_12 - median_14_19_type as c_real_som_type_12__sub__median_14_19_type 168 | , c_real_som_type_12 - median_14_25_type as c_real_som_type_12__sub__median_14_25_type 169 | , c_real_som_type_12 - median_14_37_type as c_real_som_type_12__sub__median_14_37_type
170 | 171 | , c_real_som_type_12 - max_14_16_type as c_real_som_type_12__sub__max_14_16_type 172 | , c_real_som_type_12 - max_14_19_type as c_real_som_type_12__sub__max_14_19_type 173 | , c_real_som_type_12 - max_14_25_type as c_real_som_type_12__sub__max_14_25_type 174 | , c_real_som_type_12 - max_14_37_type as c_real_som_type_12__sub__max_14_37_type 175 | 176 | , c_real_som_type_12 - min_14_16_type as c_real_som_type_12__sub__min_14_16_type 177 | , c_real_som_type_12 - min_14_19_type as c_real_som_type_12__sub__min_14_19_type 178 | , c_real_som_type_12 - min_14_25_type as c_real_som_type_12__sub__min_14_25_type 179 | , c_real_som_type_12 - min_14_37_type as c_real_som_type_12__sub__min_14_37_type 180 | 181 | , c_real_som_type_12 - avg_14_16_type as c_real_som_type_12__sub__avg_14_16_type 182 | , c_real_som_type_12 - avg_14_19_type as c_real_som_type_12__sub__avg_14_19_type 183 | , c_real_som_type_12 - avg_14_25_type as c_real_som_type_12__sub__avg_14_25_type 184 | , c_real_som_type_12 - avg_14_37_type as c_real_som_type_12__sub__avg_14_37_type 185 | 186 | -- [smoothed division] 187 | --median +1)/(1+ median 188 | , (median_2_4_type +1)/(1+ median_2_7_type) as median_2_4__smoothdivby__median_2_7_type 189 | , (median_2_4_type +1)/(1+ median_2_13_type) as median_2_4__smoothdivby__median_2_13_type 190 | , (median_2_4_type +1)/(1+ median_2_25_type) as median_2_4__smoothdivby__median_2_25_type 191 | 192 | , (median_2_7_type +1)/(1+ median_2_13_type) as median_2_7__smoothdivby__median_2_13_type 193 | , (median_2_7_type +1)/(1+ median_2_25_type) as median_2_7__smoothdivby__median_2_25_type 194 | 195 | , (median_2_13_type +1)/(1+ median_2_25_type) as median_2_13__smoothdivby__median_2_25_type 196 | 197 | --median +1)/(1+ min 198 | , (median_2_4_type +1)/(1+ min_2_7_type) as median_2_4__smoothdivby__min_2_7_type 199 | , (median_2_4_type +1)/(1+ min_2_13_type) as median_2_4__smoothdivby__min_2_13_type 200 | , (median_2_4_type +1)/(1+ min_2_25_type) as
median_2_4__smoothdivby__min_2_25_type 201 | 202 | , (median_2_7_type +1)/(1+ min_2_13_type) as median_2_7__smoothdivby__min_2_13_type 203 | , (median_2_7_type +1)/(1+ min_2_25_type) as median_2_7__smoothdivby__min_2_25_type 204 | 205 | , (median_2_13_type +1)/(1+ min_2_25_type) as median_2_13__smoothdivby__min_2_25_type 206 | 207 | --median +1)/(1+ avg 208 | , (median_2_4_type +1)/(1+ avg_2_7_type) as median_2_4__smoothdivby__avg_2_7_type 209 | , (median_2_4_type +1)/(1+ avg_2_13_type) as median_2_4__smoothdivby__avg_2_13_type 210 | , (median_2_4_type +1)/(1+ avg_2_25_type) as median_2_4__smoothdivby__avg_2_25_type 211 | 212 | , (median_2_7_type +1)/(1+ avg_2_13_type) as median_2_7__smoothdivby__avg_2_13_type 213 | , (median_2_7_type +1)/(1+ avg_2_25_type) as median_2_7__smoothdivby__avg_2_25_type 214 | 215 | , (median_2_13_type +1)/(1+ avg_2_25_type) as median_2_13__smoothdivby__avg_2_25_type 216 | 217 | --max +1)/(1+ max 218 | , (max_2_4_type +1)/(1+ max_2_7_type) as max_2_4__smoothdivby__max_2_7_type 219 | , (max_2_4_type +1)/(1+ max_2_13_type) as max_2_4__smoothdivby__max_2_13_type 220 | , (max_2_4_type +1)/(1+ max_2_25_type) as max_2_4__smoothdivby__max_2_25_type 221 | 222 | , (max_2_7_type +1)/(1+ max_2_13_type) as max_2_7__smoothdivby__max_2_13_type 223 | , (max_2_7_type +1)/(1+ max_2_25_type) as max_2_7__smoothdivby__max_2_25_type 224 | 225 | , (max_2_13_type +1)/(1+ max_2_25_type) as max_2_13__smoothdivby__max_2_25_type 226 | --max +1)/(1+ avg 227 | , (max_2_4_type +1)/(1+ avg_2_7_type) as max_2_4__smoothdivby__avg_2_7_type 228 | , (max_2_4_type +1)/(1+ avg_2_13_type) as max_2_4__smoothdivby__avg_2_13_type 229 | , (max_2_4_type +1)/(1+ avg_2_25_type) as max_2_4__smoothdivby__avg_2_25_type 230 | 231 | , (max_2_7_type +1)/(1+ avg_2_13_type) as max_2_7__smoothdivby__avg_2_13_type 232 | , (max_2_7_type +1)/(1+ avg_2_25_type) as max_2_7__smoothdivby__avg_2_25_type 233 | 234 | , (max_2_13_type +1)/(1+ avg_2_25_type) as max_2_13__smoothdivby__avg_2_25_type 235 | 
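-- Reading aid for the "smoothdivby" columns in this block: each is (a + 1) / (1 + b) rather than a / b,
-- presumably so the ratio stays defined when the denominator statistic b is 0 (an assumption; the source does not state the reason).
-- Worked values under that formula:
--   a = 0, b = 0  ->  (0 + 1) / (1 + 0) = 1
--   a = 9, b = 4  ->  (9 + 1) / (1 + 4) = 2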
--max +1)/(1+ median 236 | , (max_2_4_type +1)/(1+ median_2_7_type) as max_2_4__smoothdivby__median_2_7_type 237 | , (max_2_4_type +1)/(1+ median_2_13_type) as max_2_4__smoothdivby__median_2_13_type 238 | , (max_2_4_type +1)/(1+ median_2_25_type) as max_2_4__smoothdivby__median_2_25_type 239 | 240 | , (max_2_7_type +1)/(1+ median_2_13_type) as max_2_7__smoothdivby__median_2_13_type 241 | , (max_2_7_type +1)/(1+ median_2_25_type) as max_2_7__smoothdivby__median_2_25_type 242 | 243 | , (max_2_13_type +1)/(1+ median_2_25_type) as max_2_13__smoothdivby__median_2_25_type 244 | --max +1)/(1+ min 245 | , (max_2_4_type +1)/(1+ min_2_7_type) as max_2_4__smoothdivby__min_2_7_type 246 | , (max_2_4_type +1)/(1+ min_2_13_type) as max_2_4__smoothdivby__min_2_13_type 247 | , (max_2_4_type +1)/(1+ min_2_25_type) as max_2_4__smoothdivby__min_2_25_type 248 | 249 | , (max_2_7_type +1)/(1+ min_2_13_type) as max_2_7__smoothdivby__min_2_13_type 250 | , (max_2_7_type +1)/(1+ min_2_25_type) as max_2_7__smoothdivby__min_2_25_type 251 | 252 | , (max_2_13_type +1)/(1+ min_2_25_type) as max_2_13__smoothdivby__min_2_25_type 253 | 254 | --avg +1)/(1+ min 255 | , (avg_2_4_type +1)/(1+ min_2_7_type) as avg_2_4__smoothdivby__min_2_7_type 256 | , (avg_2_4_type +1)/(1+ min_2_13_type) as avg_2_4__smoothdivby__min_2_13_type 257 | , (avg_2_4_type +1)/(1+ min_2_25_type) as avg_2_4__smoothdivby__min_2_25_type 258 | 259 | , (avg_2_7_type +1)/(1+ min_2_13_type) as avg_2_7__smoothdivby__min_2_13_type 260 | , (avg_2_7_type +1)/(1+ min_2_25_type) as avg_2_7__smoothdivby__min_2_25_type 261 | 262 | , (avg_2_13_type +1)/(1+ min_2_25_type) as avg_2_13__smoothdivby__min_2_25_type 263 | --avg +1)/(1+ avg 264 | , (avg_2_4_type +1)/(1+ avg_2_7_type) as avg_2_4__smoothdivby__avg_2_7_type 265 | , (avg_2_4_type +1)/(1+ avg_2_13_type) as avg_2_4__smoothdivby__avg_2_13_type 266 | , (avg_2_4_type +1)/(1+ avg_2_25_type) as avg_2_4__smoothdivby__avg_2_25_type 267 | 268 | , (avg_2_7_type +1)/(1+ avg_2_13_type) as 
avg_2_7__smoothdivby__avg_2_13_type 269 | , (avg_2_7_type +1)/(1+ avg_2_25_type) as avg_2_7__smoothdivby__avg_2_25_type 270 | 271 | , (avg_2_13_type +1)/(1+ avg_2_25_type) as avg_2_13__smoothdivby__avg_2_25_type 272 | 273 | --min +1)/(1+ min 274 | , (min_2_4_type +1)/(1+ min_2_7_type) as min_2_4__smoothdivby__min_2_7_type 275 | , (min_2_4_type +1)/(1+ min_2_13_type) as min_2_4__smoothdivby__min_2_13_type 276 | , (min_2_4_type +1)/(1+ min_2_25_type) as min_2_4__smoothdivby__min_2_25_type 277 | 278 | , (min_2_7_type +1)/(1+ min_2_13_type) as min_2_7__smoothdivby__min_2_13_type 279 | , (min_2_7_type +1)/(1+ min_2_25_type) as min_2_7__smoothdivby__min_2_25_type 280 | 281 | , (min_2_13_type +1)/(1+ min_2_25_type) as min_2_13__smoothdivby__min_2_25_type 282 | 283 | -- last year's same-month sales versus the previous-year means, medians, etc. 284 | , (c_real_som_type_12 +1)/(1+ median_14_16_type) as c_real_som_type_12__smoothdivby__median_14_16_type 285 | , (c_real_som_type_12 +1)/(1+ median_14_19_type) as c_real_som_type_12__smoothdivby__median_14_19_type 286 | , (c_real_som_type_12 +1)/(1+ median_14_25_type) as c_real_som_type_12__smoothdivby__median_14_25_type 287 | , (c_real_som_type_12 +1)/(1+ median_14_37_type) as c_real_som_type_12__smoothdivby__median_14_37_type 288 | 289 | , (c_real_som_type_12 +1)/(1+ max_14_16_type) as c_real_som_type_12__smoothdivby__max_14_16_type 290 | , (c_real_som_type_12 +1)/(1+ max_14_19_type) as c_real_som_type_12__smoothdivby__max_14_19_type 291 | , (c_real_som_type_12 +1)/(1+ max_14_25_type) as c_real_som_type_12__smoothdivby__max_14_25_type 292 | , (c_real_som_type_12 +1)/(1+ max_14_37_type) as c_real_som_type_12__smoothdivby__max_14_37_type 293 | 294 | , (c_real_som_type_12 +1)/(1+ min_14_16_type) as c_real_som_type_12__smoothdivby__min_14_16_type 295 | , (c_real_som_type_12 +1)/(1+ min_14_19_type) as c_real_som_type_12__smoothdivby__min_14_19_type 296 | , (c_real_som_type_12 +1)/(1+ min_14_25_type) as c_real_som_type_12__smoothdivby__min_14_25_type 297 | , (c_real_som_type_12
+1)/(1+ min_14_37_type) as c_real_som_type_12__smoothdivby__min_14_37_type 298 | 299 | , (c_real_som_type_12 +1)/(1+ avg_14_16_type) as c_real_som_type_12__smoothdivby__avg_14_16_type 300 | , (c_real_som_type_12 +1)/(1+ avg_14_19_type) as c_real_som_type_12__smoothdivby__avg_14_19_type 301 | , (c_real_som_type_12 +1)/(1+ avg_14_25_type) as c_real_som_type_12__smoothdivby__avg_14_25_type 302 | , (c_real_som_type_12 +1)/(1+ avg_14_37_type) as c_real_som_type_12__smoothdivby__avg_14_37_type 303 | 304 | FROM whole_f7_2_step1; -------------------------------------------------------------------------------- /代码/特征工程/7_4_省市新能源交叉.txt: -------------------------------------------------------------------------------- 1 | -- Province/city x new-energy type: mean/min/max/median of sales over the past 2 years / 1 year / half year / 3 months 2 | DROP TABLE IF EXISTS whole_f7_4_step1; 3 | 4 | DROP TABLE IF EXISTS whole_f7_4_step2; 5 | 6 | CREATE TABLE whole_f7_4_step1 7 | AS 8 | -- SELECT province_id, city_id, class_id, sale_date, real_sale 9 | -- , c_real_som_ne_type_2, c_real_som_ne_type_12, c_real_ssm_ne_type_2_3, c_real_ssm_ne_type_2_7, c_real_ssm_ne_type_2_13 10 | -- , c_real_ssm_ne_type_2_25 -- median/max/min/avg over the last X months 11 | select * 12 | -- median 13 | , ordinal(2, c_real_som_ne_type_2, c_real_som_ne_type_3, c_real_som_ne_type_4) AS median_2_4_ne_type 14 | , (ordinal(3, c_real_som_ne_type_2, c_real_som_ne_type_3, c_real_som_ne_type_4, c_real_som_ne_type_5, c_real_som_ne_type_6, c_real_som_ne_type_7) + ordinal(4, c_real_som_ne_type_2, c_real_som_ne_type_3, c_real_som_ne_type_4, c_real_som_ne_type_5, c_real_som_ne_type_6, c_real_som_ne_type_7)) / 2 AS median_2_7_ne_type 15 | , (ordinal(6, c_real_som_ne_type_2, c_real_som_ne_type_3, c_real_som_ne_type_4, c_real_som_ne_type_5, c_real_som_ne_type_6, c_real_som_ne_type_7, c_real_som_ne_type_8, c_real_som_ne_type_9, c_real_som_ne_type_10, c_real_som_ne_type_11, c_real_som_ne_type_12, c_real_som_ne_type_13) + ordinal(7, c_real_som_ne_type_2, c_real_som_ne_type_3, c_real_som_ne_type_4,
c_real_som_ne_type_5, c_real_som_ne_type_6, c_real_som_ne_type_7, c_real_som_ne_type_8, c_real_som_ne_type_9, c_real_som_ne_type_10, c_real_som_ne_type_11, c_real_som_ne_type_12, c_real_som_ne_type_13)) / 2 AS median_2_13_ne_type 16 | , (ordinal(12, c_real_som_ne_type_2, c_real_som_ne_type_3, c_real_som_ne_type_4, c_real_som_ne_type_5, c_real_som_ne_type_6, c_real_som_ne_type_7, c_real_som_ne_type_8, c_real_som_ne_type_9, c_real_som_ne_type_10, c_real_som_ne_type_11, c_real_som_ne_type_12, c_real_som_ne_type_13, c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16, c_real_som_ne_type_17, c_real_som_ne_type_18, c_real_som_ne_type_19, c_real_som_ne_type_20, c_real_som_ne_type_21, c_real_som_ne_type_22, c_real_som_ne_type_23, c_real_som_ne_type_24, c_real_som_ne_type_25) + ordinal(13, c_real_som_ne_type_2, c_real_som_ne_type_3, c_real_som_ne_type_4, c_real_som_ne_type_5, c_real_som_ne_type_6, c_real_som_ne_type_7, c_real_som_ne_type_8, c_real_som_ne_type_9, c_real_som_ne_type_10, c_real_som_ne_type_11, c_real_som_ne_type_12, c_real_som_ne_type_13, c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16, c_real_som_ne_type_17, c_real_som_ne_type_18, c_real_som_ne_type_19, c_real_som_ne_type_20, c_real_som_ne_type_21, c_real_som_ne_type_22, c_real_som_ne_type_23, c_real_som_ne_type_24, c_real_som_ne_type_25)) / 2 AS median_2_25_ne_type 17 | -- max 18 | , GREATEST(c_real_som_ne_type_2, c_real_som_ne_type_3) AS max_2_3_ne_type, GREATEST(c_real_som_ne_type_2, c_real_som_ne_type_3, c_real_som_ne_type_4) AS max_2_4_ne_type 19 | , GREATEST(c_real_som_ne_type_2, c_real_som_ne_type_3, c_real_som_ne_type_4, c_real_som_ne_type_5, c_real_som_ne_type_6, c_real_som_ne_type_7) AS max_2_7_ne_type 20 | , GREATEST(c_real_som_ne_type_2, c_real_som_ne_type_3, c_real_som_ne_type_4, c_real_som_ne_type_5, c_real_som_ne_type_6, c_real_som_ne_type_7, c_real_som_ne_type_8, c_real_som_ne_type_9, c_real_som_ne_type_10, c_real_som_ne_type_11, c_real_som_ne_type_12, 
c_real_som_ne_type_13) AS max_2_13_ne_type 21 | , greatest(c_real_som_ne_type_2, c_real_som_ne_type_3, c_real_som_ne_type_4, c_real_som_ne_type_5, c_real_som_ne_type_6, c_real_som_ne_type_7, c_real_som_ne_type_8, c_real_som_ne_type_9, c_real_som_ne_type_10, c_real_som_ne_type_11, c_real_som_ne_type_12, c_real_som_ne_type_13, c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16, c_real_som_ne_type_17, c_real_som_ne_type_18, c_real_som_ne_type_19, c_real_som_ne_type_20, c_real_som_ne_type_21, c_real_som_ne_type_22, c_real_som_ne_type_23, c_real_som_ne_type_24, c_real_som_ne_type_25) as max_2_25_ne_type 22 | -- min 23 | , least(c_real_som_ne_type_2, c_real_som_ne_type_3) AS min_2_3_ne_type, least(c_real_som_ne_type_2, c_real_som_ne_type_3, c_real_som_ne_type_4) AS min_2_4_ne_type 24 | , least(c_real_som_ne_type_2, c_real_som_ne_type_3, c_real_som_ne_type_4, c_real_som_ne_type_5, c_real_som_ne_type_6, c_real_som_ne_type_7) AS min_2_7_ne_type 25 | , least(c_real_som_ne_type_2, c_real_som_ne_type_3, c_real_som_ne_type_4, c_real_som_ne_type_5, c_real_som_ne_type_6, c_real_som_ne_type_7, c_real_som_ne_type_8, c_real_som_ne_type_9, c_real_som_ne_type_10, c_real_som_ne_type_11, c_real_som_ne_type_12, c_real_som_ne_type_13) AS min_2_13_ne_type 26 | , least(c_real_som_ne_type_2, c_real_som_ne_type_3, c_real_som_ne_type_4, c_real_som_ne_type_5, c_real_som_ne_type_6, c_real_som_ne_type_7, c_real_som_ne_type_8, c_real_som_ne_type_9, c_real_som_ne_type_10, c_real_som_ne_type_11, c_real_som_ne_type_12, c_real_som_ne_type_13, c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16, c_real_som_ne_type_17, c_real_som_ne_type_18, c_real_som_ne_type_19, c_real_som_ne_type_20, c_real_som_ne_type_21, c_real_som_ne_type_22, c_real_som_ne_type_23, c_real_som_ne_type_24, c_real_som_ne_type_25) as min_2_25_ne_type 27 | -- avg 28 | , (c_real_som_ne_type_2 + c_real_som_ne_type_3)/2 AS avg_2_3_ne_type, (c_real_som_ne_type_2 + c_real_som_ne_type_3 + 
c_real_som_ne_type_4)/3 AS avg_2_4_ne_type 29 | , (c_real_som_ne_type_2 + c_real_som_ne_type_3 + c_real_som_ne_type_4 + c_real_som_ne_type_5 + c_real_som_ne_type_6 + c_real_som_ne_type_7)/6 AS avg_2_7_ne_type 30 | , (c_real_som_ne_type_2 + c_real_som_ne_type_3 + c_real_som_ne_type_4 + c_real_som_ne_type_5 + c_real_som_ne_type_6 + c_real_som_ne_type_7 + c_real_som_ne_type_8 + c_real_som_ne_type_9 + c_real_som_ne_type_10 + c_real_som_ne_type_11 + c_real_som_ne_type_12 + c_real_som_ne_type_13)/12 AS avg_2_13_ne_type 31 | , (c_real_som_ne_type_2 + c_real_som_ne_type_3 + c_real_som_ne_type_4 + c_real_som_ne_type_5 + c_real_som_ne_type_6 + c_real_som_ne_type_7 + c_real_som_ne_type_8 + c_real_som_ne_type_9 + c_real_som_ne_type_10 + c_real_som_ne_type_11 + c_real_som_ne_type_12 + c_real_som_ne_type_13 + c_real_som_ne_type_14 + c_real_som_ne_type_15 + c_real_som_ne_type_16 + c_real_som_ne_type_17 + c_real_som_ne_type_18 + c_real_som_ne_type_19 + c_real_som_ne_type_20 + c_real_som_ne_type_21 + c_real_som_ne_type_22 + c_real_som_ne_type_23 + c_real_som_ne_type_24 + c_real_som_ne_type_25)/24 as avg_2_25_ne_type 32 | 33 | -- previous year 34 | -- median 35 | , ordinal(2, c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16) AS median_14_16_ne_type 36 | , (ordinal(3, c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16, c_real_som_ne_type_17, c_real_som_ne_type_18, c_real_som_ne_type_19) + ordinal(4, c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16, c_real_som_ne_type_17, c_real_som_ne_type_18, c_real_som_ne_type_19)) / 2 AS median_14_19_ne_type 37 | , (ordinal(6, c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16, c_real_som_ne_type_17, c_real_som_ne_type_18, c_real_som_ne_type_19, c_real_som_ne_type_20, c_real_som_ne_type_21, c_real_som_ne_type_22, c_real_som_ne_type_23, c_real_som_ne_type_24, c_real_som_ne_type_25) + ordinal(7, c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16, c_real_som_ne_type_17,
c_real_som_ne_type_18, c_real_som_ne_type_19, c_real_som_ne_type_20, c_real_som_ne_type_21, c_real_som_ne_type_22, c_real_som_ne_type_23, c_real_som_ne_type_24, c_real_som_ne_type_25)) / 2 AS median_14_25_ne_type 38 | , (ordinal(12, c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16, c_real_som_ne_type_17, c_real_som_ne_type_18, c_real_som_ne_type_19, c_real_som_ne_type_20, c_real_som_ne_type_21, c_real_som_ne_type_22, c_real_som_ne_type_23, c_real_som_ne_type_24, c_real_som_ne_type_25, c_real_som_ne_type_26, c_real_som_ne_type_27, c_real_som_ne_type_28, c_real_som_ne_type_29, c_real_som_ne_type_30, c_real_som_ne_type_31, c_real_som_ne_type_32, c_real_som_ne_type_33, c_real_som_ne_type_34, c_real_som_ne_type_35, c_real_som_ne_type_36, c_real_som_ne_type_37) + ordinal(13, c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16, c_real_som_ne_type_17, c_real_som_ne_type_18, c_real_som_ne_type_19, c_real_som_ne_type_20, c_real_som_ne_type_21, c_real_som_ne_type_22, c_real_som_ne_type_23, c_real_som_ne_type_24, c_real_som_ne_type_25, c_real_som_ne_type_26, c_real_som_ne_type_27, c_real_som_ne_type_28, c_real_som_ne_type_29, c_real_som_ne_type_30, c_real_som_ne_type_31, c_real_som_ne_type_32, c_real_som_ne_type_33, c_real_som_ne_type_34, c_real_som_ne_type_35, c_real_som_ne_type_36, c_real_som_ne_type_37)) / 2 AS median_14_37_ne_type 39 | -- max 40 | , GREATEST(c_real_som_ne_type_14, c_real_som_ne_type_15) AS max_14_15_ne_type, GREATEST(c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16) AS max_14_16_ne_type 41 | , GREATEST(c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16, c_real_som_ne_type_17, c_real_som_ne_type_18, c_real_som_ne_type_19) AS max_14_19_ne_type 42 | , GREATEST(c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16, c_real_som_ne_type_17, c_real_som_ne_type_18, c_real_som_ne_type_19, c_real_som_ne_type_20, c_real_som_ne_type_21, c_real_som_ne_type_22, c_real_som_ne_type_23, 
c_real_som_ne_type_24, c_real_som_ne_type_25) AS max_14_25_ne_type 43 | , greatest(c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16, c_real_som_ne_type_17, c_real_som_ne_type_18, c_real_som_ne_type_19, c_real_som_ne_type_20, c_real_som_ne_type_21, c_real_som_ne_type_22, c_real_som_ne_type_23, c_real_som_ne_type_24, c_real_som_ne_type_25, c_real_som_ne_type_26, c_real_som_ne_type_27, c_real_som_ne_type_28, c_real_som_ne_type_29, c_real_som_ne_type_30, c_real_som_ne_type_31, c_real_som_ne_type_32, c_real_som_ne_type_33, c_real_som_ne_type_34, c_real_som_ne_type_35, c_real_som_ne_type_36, c_real_som_ne_type_37) as max_14_37_ne_type 44 | -- min 45 | , least(c_real_som_ne_type_14, c_real_som_ne_type_15) AS min_14_15_ne_type, least(c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16) AS min_14_16_ne_type 46 | , least(c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16, c_real_som_ne_type_17, c_real_som_ne_type_18, c_real_som_ne_type_19) AS min_14_19_ne_type 47 | , least(c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16, c_real_som_ne_type_17, c_real_som_ne_type_18, c_real_som_ne_type_19, c_real_som_ne_type_20, c_real_som_ne_type_21, c_real_som_ne_type_22, c_real_som_ne_type_23, c_real_som_ne_type_24, c_real_som_ne_type_25) AS min_14_25_ne_type 48 | , least(c_real_som_ne_type_14, c_real_som_ne_type_15, c_real_som_ne_type_16, c_real_som_ne_type_17, c_real_som_ne_type_18, c_real_som_ne_type_19, c_real_som_ne_type_20, c_real_som_ne_type_21, c_real_som_ne_type_22, c_real_som_ne_type_23, c_real_som_ne_type_24, c_real_som_ne_type_25, c_real_som_ne_type_26, c_real_som_ne_type_27, c_real_som_ne_type_28, c_real_som_ne_type_29, c_real_som_ne_type_30, c_real_som_ne_type_31, c_real_som_ne_type_32, c_real_som_ne_type_33, c_real_som_ne_type_34, c_real_som_ne_type_35, c_real_som_ne_type_36, c_real_som_ne_type_37) as min_14_37_ne_type 49 | -- avg 50 | , (c_real_som_ne_type_14 + c_real_som_ne_type_15)/2 AS 
avg_14_15_ne_type, (c_real_som_ne_type_14 + c_real_som_ne_type_15 + c_real_som_ne_type_16)/3 AS avg_14_16_ne_type 51 | , (c_real_som_ne_type_14 + c_real_som_ne_type_15 + c_real_som_ne_type_16 + c_real_som_ne_type_17 + c_real_som_ne_type_18 + c_real_som_ne_type_19)/6 AS avg_14_19_ne_type 52 | , (c_real_som_ne_type_14 + c_real_som_ne_type_15 + c_real_som_ne_type_16 + c_real_som_ne_type_17 + c_real_som_ne_type_18 + c_real_som_ne_type_19 + c_real_som_ne_type_20 + c_real_som_ne_type_21 + c_real_som_ne_type_22 + c_real_som_ne_type_23 + c_real_som_ne_type_24 + c_real_som_ne_type_25)/12 AS avg_14_25_ne_type 53 | , (c_real_som_ne_type_14 + c_real_som_ne_type_15 + c_real_som_ne_type_16 + c_real_som_ne_type_17 + c_real_som_ne_type_18 + c_real_som_ne_type_19 + c_real_som_ne_type_20 + c_real_som_ne_type_21 + c_real_som_ne_type_22 + c_real_som_ne_type_23 + c_real_som_ne_type_24 + c_real_som_ne_type_25 + c_real_som_ne_type_26 + c_real_som_ne_type_27 + c_real_som_ne_type_28 + c_real_som_ne_type_29 + c_real_som_ne_type_30 + c_real_som_ne_type_31 + c_real_som_ne_type_32 + c_real_som_ne_type_33 + c_real_som_ne_type_34 + c_real_som_ne_type_35 + c_real_som_ne_type_36 + c_real_som_ne_type_37)/24 as avg_14_37_ne_type 54 | 55 | FROM dr_c_real_ssm_ne_type_2_36; 56 | 57 | -- Notes: 58 | -- the same features for the same month last year 59 | -- last month's sales relative to the mean of the last three/four months / last half year / last year / last 2 years 60 | -- the same with the mean replaced by median/max/min 61 | -- max minus min for each window 62 | -- ------------- 63 | -- last year's same-month sales relative to the mean\min\max\median over 12~14/~15/~17/~23 (the code below actually uses windows 14~16/~19/~25/~37) 64 | CREATE TABLE whole_f7_4_step2 65 | AS 66 | SELECT * 67 | -- differences between this year's statistics 68 | -- median/max/min/avg over X months minus the same over Y months 69 | --median - median 70 | , median_2_4_ne_type - median_2_7_ne_type as median_2_4__sub__median_2_7_ne_type 71 | , median_2_4_ne_type - median_2_13_ne_type as median_2_4__sub__median_2_13_ne_type 72 | , median_2_4_ne_type - median_2_25_ne_type as median_2_4__sub__median_2_25_ne_type 73 | 74 | , median_2_7_ne_type - median_2_13_ne_type as median_2_7__sub__median_2_13_ne_type 75 | , median_2_7_ne_type - median_2_25_ne_type as
median_2_7__sub__median_2_25_ne_type 76 | 77 | , median_2_13_ne_type - median_2_25_ne_type as median_2_13__sub__median_2_25_ne_type 78 | 79 | --median - min 80 | , median_2_4_ne_type - min_2_7_ne_type as median_2_4__sub__min_2_7_ne_type 81 | , median_2_4_ne_type - min_2_13_ne_type as median_2_4__sub__min_2_13_ne_type 82 | , median_2_4_ne_type - min_2_25_ne_type as median_2_4__sub__min_2_25_ne_type 83 | 84 | , median_2_7_ne_type - min_2_13_ne_type as median_2_7__sub__min_2_13_ne_type 85 | , median_2_7_ne_type - min_2_25_ne_type as median_2_7__sub__min_2_25_ne_type 86 | 87 | , median_2_13_ne_type - min_2_25_ne_type as median_2_13__sub__min_2_25_ne_type 88 | 89 | --median - avg 90 | , median_2_4_ne_type - avg_2_7_ne_type as median_2_4__sub__avg_2_7_ne_type 91 | , median_2_4_ne_type - avg_2_13_ne_type as median_2_4__sub__avg_2_13_ne_type 92 | , median_2_4_ne_type - avg_2_25_ne_type as median_2_4__sub__avg_2_25_ne_type 93 | 94 | , median_2_7_ne_type - avg_2_13_ne_type as median_2_7__sub__avg_2_13_ne_type 95 | , median_2_7_ne_type - avg_2_25_ne_type as median_2_7__sub__avg_2_25_ne_type 96 | 97 | , median_2_13_ne_type - avg_2_25_ne_type as median_2_13__sub__avg_2_25_ne_type 98 | 99 | --max - max 100 | , max_2_4_ne_type - max_2_7_ne_type as max_2_4__sub__max_2_7_ne_type 101 | , max_2_4_ne_type - max_2_13_ne_type as max_2_4__sub__max_2_13_ne_type 102 | , max_2_4_ne_type - max_2_25_ne_type as max_2_4__sub__max_2_25_ne_type 103 | 104 | , max_2_7_ne_type - max_2_13_ne_type as max_2_7__sub__max_2_13_ne_type 105 | , max_2_7_ne_type - max_2_25_ne_type as max_2_7__sub__max_2_25_ne_type 106 | 107 | , max_2_13_ne_type - max_2_25_ne_type as max_2_13__sub__max_2_25_ne_type 108 | --max - avg 109 | , max_2_4_ne_type - avg_2_7_ne_type as max_2_4__sub__avg_2_7_ne_type 110 | , max_2_4_ne_type - avg_2_13_ne_type as max_2_4__sub__avg_2_13_ne_type 111 | , max_2_4_ne_type - avg_2_25_ne_type as max_2_4__sub__avg_2_25_ne_type 112 | 113 | , max_2_7_ne_type - avg_2_13_ne_type as 
max_2_7__sub__avg_2_13_ne_type 114 | , max_2_7_ne_type - avg_2_25_ne_type as max_2_7__sub__avg_2_25_ne_type 115 | 116 | , max_2_13_ne_type - avg_2_25_ne_type as max_2_13__sub__avg_2_25_ne_type 117 | --max - median 118 | , max_2_4_ne_type - median_2_7_ne_type as max_2_4__sub__median_2_7_ne_type 119 | , max_2_4_ne_type - median_2_13_ne_type as max_2_4__sub__median_2_13_ne_type 120 | , max_2_4_ne_type - median_2_25_ne_type as max_2_4__sub__median_2_25_ne_type 121 | 122 | , max_2_7_ne_type - median_2_13_ne_type as max_2_7__sub__median_2_13_ne_type 123 | , max_2_7_ne_type - median_2_25_ne_type as max_2_7__sub__median_2_25_ne_type 124 | 125 | , max_2_13_ne_type - median_2_25_ne_type as max_2_13__sub__median_2_25_ne_type 126 | --max - min 127 | , max_2_4_ne_type - min_2_7_ne_type as max_2_4__sub__min_2_7_ne_type 128 | , max_2_4_ne_type - min_2_13_ne_type as max_2_4__sub__min_2_13_ne_type 129 | , max_2_4_ne_type - min_2_25_ne_type as max_2_4__sub__min_2_25_ne_type 130 | 131 | , max_2_7_ne_type - min_2_13_ne_type as max_2_7__sub__min_2_13_ne_type 132 | , max_2_7_ne_type - min_2_25_ne_type as max_2_7__sub__min_2_25_ne_type 133 | 134 | , max_2_13_ne_type - min_2_25_ne_type as max_2_13__sub__min_2_25_ne_type 135 | 136 | --avg - min 137 | , avg_2_4_ne_type - min_2_7_ne_type as avg_2_4__sub__min_2_7_ne_type 138 | , avg_2_4_ne_type - min_2_13_ne_type as avg_2_4__sub__min_2_13_ne_type 139 | , avg_2_4_ne_type - min_2_25_ne_type as avg_2_4__sub__min_2_25_ne_type 140 | 141 | , avg_2_7_ne_type - min_2_13_ne_type as avg_2_7__sub__min_2_13_ne_type 142 | , avg_2_7_ne_type - min_2_25_ne_type as avg_2_7__sub__min_2_25_ne_type 143 | 144 | , avg_2_13_ne_type - min_2_25_ne_type as avg_2_13__sub__min_2_25_ne_type 145 | --avg - avg 146 | , avg_2_4_ne_type - avg_2_7_ne_type as avg_2_4__sub__avg_2_7_ne_type 147 | , avg_2_4_ne_type - avg_2_13_ne_type as avg_2_4__sub__avg_2_13_ne_type 148 | , avg_2_4_ne_type - avg_2_25_ne_type as avg_2_4__sub__avg_2_25_ne_type 149 | 150 | , avg_2_7_ne_type - 
avg_2_13_ne_type as avg_2_7__sub__avg_2_13_ne_type 151 | , avg_2_7_ne_type - avg_2_25_ne_type as avg_2_7__sub__avg_2_25_ne_type 152 | 153 | , avg_2_13_ne_type - avg_2_25_ne_type as avg_2_13__sub__avg_2_25_ne_type 154 | 155 | --min - min 156 | , min_2_4_ne_type - min_2_7_ne_type as min_2_4__sub__min_2_7_ne_type 157 | , min_2_4_ne_type - min_2_13_ne_type as min_2_4__sub__min_2_13_ne_type 158 | , min_2_4_ne_type - min_2_25_ne_type as min_2_4__sub__min_2_25_ne_type 159 | 160 | , min_2_7_ne_type - min_2_13_ne_type as min_2_7__sub__min_2_13_ne_type 161 | , min_2_7_ne_type - min_2_25_ne_type as min_2_7__sub__min_2_25_ne_type 162 | 163 | , min_2_13_ne_type - min_2_25_ne_type as min_2_13__sub__min_2_25_ne_type 164 | 165 | --去年的本月销量,与其他这些均值中位数之类的关系 166 | , c_real_som_ne_type_12 - median_14_16_ne_type as c_real_som_ne_type_12__sub__median_14_16_ne_type 167 | , c_real_som_ne_type_12 - median_14_19_ne_type as c_real_som_ne_type_12__sub__median_14_19_ne_type 168 | , c_real_som_ne_type_12 - median_14_25_ne_type as c_real_som_ne_type_12__sub__median_14_25_ne_type 169 | , c_real_som_ne_type_12 - median_14_37_ne_type as c_real_som_ne_type_12__sub__median_14_37_ne_type 170 | 171 | , c_real_som_ne_type_12 - max_14_16_ne_type as c_real_som_ne_type_12__sub__max_14_16_ne_type 172 | , c_real_som_ne_type_12 - max_14_19_ne_type as c_real_som_ne_type_12__sub__max_14_19_ne_type 173 | , c_real_som_ne_type_12 - max_14_25_ne_type as c_real_som_ne_type_12__sub__max_14_25_ne_type 174 | , c_real_som_ne_type_12 - max_14_37_ne_type as c_real_som_ne_type_12__sub__max_14_37_ne_type 175 | 176 | , c_real_som_ne_type_12 - min_14_16_ne_type as c_real_som_ne_type_12__sub__min_14_16_ne_type 177 | , c_real_som_ne_type_12 - min_14_19_ne_type as c_real_som_ne_type_12__sub__min_14_19_ne_type 178 | , c_real_som_ne_type_12 - min_14_25_ne_type as c_real_som_ne_type_12__sub__min_14_25_ne_type 179 | , c_real_som_ne_type_12 - min_14_37_ne_type as c_real_som_ne_type_12__sub__min_14_37_ne_type 180 | 181 | , 
c_real_som_ne_type_12 - avg_14_16_ne_type as c_real_som_ne_type_12__sub__avg_14_16_ne_type 182 | , c_real_som_ne_type_12 - avg_14_19_ne_type as c_real_som_ne_type_12__sub__avg_14_19_ne_type 183 | , c_real_som_ne_type_12 - avg_14_25_ne_type as c_real_som_ne_type_12__sub__avg_14_25_ne_type 184 | , c_real_som_ne_type_12 - avg_14_37_ne_type as c_real_som_ne_type_12__sub__avg_14_37_ne_type 185 | 186 | --【平滑除法】 187 | --median +1)/(1+ median 188 | , (median_2_4_ne_type +1)/(1+ median_2_7_ne_type) as median_2_4__smoothdivby__median_2_7_ne_type 189 | , (median_2_4_ne_type +1)/(1+ median_2_13_ne_type) as median_2_4__smoothdivby__median_2_13_ne_type 190 | , (median_2_4_ne_type +1)/(1+ median_2_25_ne_type) as median_2_4__smoothdivby__median_2_25_ne_type 191 | 192 | , (median_2_7_ne_type +1)/(1+ median_2_13_ne_type) as median_2_7__smoothdivby__median_2_13_ne_type 193 | , (median_2_7_ne_type +1)/(1+ median_2_25_ne_type) as median_2_7__smoothdivby__median_2_25_ne_type 194 | 195 | , (median_2_13_ne_type +1)/(1+ median_2_25_ne_type) as median_2_13__smoothdivby__median_2_25_ne_type 196 | 197 | --median +1)/(1+ min 198 | , (median_2_4_ne_type +1)/(1+ min_2_7_ne_type) as median_2_4__smoothdivby__min_2_7_ne_type 199 | , (median_2_4_ne_type +1)/(1+ min_2_13_ne_type) as median_2_4__smoothdivby__min_2_13_ne_type 200 | , (median_2_4_ne_type +1)/(1+ min_2_25_ne_type) as median_2_4__smoothdivby__min_2_25_ne_type 201 | 202 | , (median_2_7_ne_type +1)/(1+ min_2_13_ne_type) as median_2_7__smoothdivby__min_2_13_ne_type 203 | , (median_2_7_ne_type +1)/(1+ min_2_25_ne_type) as median_2_7__smoothdivby__min_2_25_ne_type 204 | 205 | , (median_2_13_ne_type +1)/(1+ min_2_25_ne_type) as median_2_13__smoothdivby__min_2_25_ne_type 206 | 207 | --median +1)/(1+ avg 208 | , (median_2_4_ne_type +1)/(1+ avg_2_7_ne_type) as median_2_4__smoothdivby__avg_2_7_ne_type 209 | , (median_2_4_ne_type +1)/(1+ avg_2_13_ne_type) as median_2_4__smoothdivby__avg_2_13_ne_type 210 | , (median_2_4_ne_type +1)/(1+ 
avg_2_25_ne_type) as median_2_4__smoothdivby__avg_2_25_ne_type 211 | 212 | , (median_2_7_ne_type +1)/(1+ avg_2_13_ne_type) as median_2_7__smoothdivby__avg_2_13_ne_type 213 | , (median_2_7_ne_type +1)/(1+ avg_2_25_ne_type) as median_2_7__smoothdivby__avg_2_25_ne_type 214 | 215 | , (median_2_13_ne_type +1)/(1+ avg_2_25_ne_type) as median_2_13__smoothdivby__avg_2_25_ne_type 216 | 217 | --max +1)/(1+ max 218 | , (max_2_4_ne_type +1)/(1+ max_2_7_ne_type) as max_2_4__smoothdivby__max_2_7_ne_type 219 | , (max_2_4_ne_type +1)/(1+ max_2_13_ne_type) as max_2_4__smoothdivby__max_2_13_ne_type 220 | , (max_2_4_ne_type +1)/(1+ max_2_25_ne_type) as max_2_4__smoothdivby__max_2_25_ne_type 221 | 222 | , (max_2_7_ne_type +1)/(1+ max_2_13_ne_type) as max_2_7__smoothdivby__max_2_13_ne_type 223 | , (max_2_7_ne_type +1)/(1+ max_2_25_ne_type) as max_2_7__smoothdivby__max_2_25_ne_type 224 | 225 | , (max_2_13_ne_type +1)/(1+ max_2_25_ne_type) as max_2_13__smoothdivby__max_2_25_ne_type 226 | --max +1)/(1+ avg 227 | , (max_2_4_ne_type +1)/(1+ avg_2_7_ne_type) as max_2_4__smoothdivby__avg_2_7_ne_type 228 | , (max_2_4_ne_type +1)/(1+ avg_2_13_ne_type) as max_2_4__smoothdivby__avg_2_13_ne_type 229 | , (max_2_4_ne_type +1)/(1+ avg_2_25_ne_type) as max_2_4__smoothdivby__avg_2_25_ne_type 230 | 231 | , (max_2_7_ne_type +1)/(1+ avg_2_13_ne_type) as max_2_7__smoothdivby__avg_2_13_ne_type 232 | , (max_2_7_ne_type +1)/(1+ avg_2_25_ne_type) as max_2_7__smoothdivby__avg_2_25_ne_type 233 | 234 | , (max_2_13_ne_type +1)/(1+ avg_2_25_ne_type) as max_2_13__smoothdivby__avg_2_25_ne_type 235 | --max +1)/(1+ median 236 | , (max_2_4_ne_type +1)/(1+ median_2_7_ne_type) as max_2_4__smoothdivby__median_2_7_ne_type 237 | , (max_2_4_ne_type +1)/(1+ median_2_13_ne_type) as max_2_4__smoothdivby__median_2_13_ne_type 238 | , (max_2_4_ne_type +1)/(1+ median_2_25_ne_type) as max_2_4__smoothdivby__median_2_25_ne_type 239 | 240 | , (max_2_7_ne_type +1)/(1+ median_2_13_ne_type) as max_2_7__smoothdivby__median_2_13_ne_type 241 
| , (max_2_7_ne_type +1)/(1+ median_2_25_ne_type) as max_2_7__smoothdivby__median_2_25_ne_type 242 | 243 | , (max_2_13_ne_type +1)/(1+ median_2_25_ne_type) as max_2_13__smoothdivby__median_2_25_ne_type 244 | --max +1)/(1+ min 245 | , (max_2_4_ne_type +1)/(1+ min_2_7_ne_type) as max_2_4__smoothdivby__min_2_7_ne_type 246 | , (max_2_4_ne_type +1)/(1+ min_2_13_ne_type) as max_2_4__smoothdivby__min_2_13_ne_type 247 | , (max_2_4_ne_type +1)/(1+ min_2_25_ne_type) as max_2_4__smoothdivby__min_2_25_ne_type 248 | 249 | , (max_2_7_ne_type +1)/(1+ min_2_13_ne_type) as max_2_7__smoothdivby__min_2_13_ne_type 250 | , (max_2_7_ne_type +1)/(1+ min_2_25_ne_type) as max_2_7__smoothdivby__min_2_25_ne_type 251 | 252 | , (max_2_13_ne_type +1)/(1+ min_2_25_ne_type) as max_2_13__smoothdivby__min_2_25_ne_type 253 | 254 | --avg +1)/(1+ min 255 | , (avg_2_4_ne_type +1)/(1+ min_2_7_ne_type) as avg_2_4__smoothdivby__min_2_7_ne_type 256 | , (avg_2_4_ne_type +1)/(1+ min_2_13_ne_type) as avg_2_4__smoothdivby__min_2_13_ne_type 257 | , (avg_2_4_ne_type +1)/(1+ min_2_25_ne_type) as avg_2_4__smoothdivby__min_2_25_ne_type 258 | 259 | , (avg_2_7_ne_type +1)/(1+ min_2_13_ne_type) as avg_2_7__smoothdivby__min_2_13_ne_type 260 | , (avg_2_7_ne_type +1)/(1+ min_2_25_ne_type) as avg_2_7__smoothdivby__min_2_25_ne_type 261 | 262 | , (avg_2_13_ne_type +1)/(1+ min_2_25_ne_type) as avg_2_13__smoothdivby__min_2_25_ne_type 263 | --avg +1)/(1+ avg 264 | , (avg_2_4_ne_type +1)/(1+ avg_2_7_ne_type) as avg_2_4__smoothdivby__avg_2_7_ne_type 265 | , (avg_2_4_ne_type +1)/(1+ avg_2_13_ne_type) as avg_2_4__smoothdivby__avg_2_13_ne_type 266 | , (avg_2_4_ne_type +1)/(1+ avg_2_25_ne_type) as avg_2_4__smoothdivby__avg_2_25_ne_type 267 | 268 | , (avg_2_7_ne_type +1)/(1+ avg_2_13_ne_type) as avg_2_7__smoothdivby__avg_2_13_ne_type 269 | , (avg_2_7_ne_type +1)/(1+ avg_2_25_ne_type) as avg_2_7__smoothdivby__avg_2_25_ne_type 270 | 271 | , (avg_2_13_ne_type +1)/(1+ avg_2_25_ne_type) as avg_2_13__smoothdivby__avg_2_25_ne_type 272 | 273 
| --min +1)/(1+ min 274 | , (min_2_4_ne_type +1)/(1+ min_2_7_ne_type) as min_2_4__smoothdivby__min_2_7_ne_type 275 | , (min_2_4_ne_type +1)/(1+ min_2_13_ne_type) as min_2_4__smoothdivby__min_2_13_ne_type 276 | , (min_2_4_ne_type +1)/(1+ min_2_25_ne_type) as min_2_4__smoothdivby__min_2_25_ne_type 277 | 278 | , (min_2_7_ne_type +1)/(1+ min_2_13_ne_type) as min_2_7__smoothdivby__min_2_13_ne_type 279 | , (min_2_7_ne_type +1)/(1+ min_2_25_ne_type) as min_2_7__smoothdivby__min_2_25_ne_type 280 | 281 | , (min_2_13_ne_type +1)/(1+ min_2_25_ne_type) as min_2_13__smoothdivby__min_2_25_ne_type 282 | 283 | --去年的本月销量,与其他这些均值中位数之类的关系 284 | , (c_real_som_ne_type_12 +1)/(1+ median_14_16_ne_type) as c_real_som_ne_type_12__smoothdivby__median_14_16_ne_type 285 | , (c_real_som_ne_type_12 +1)/(1+ median_14_19_ne_type) as c_real_som_ne_type_12__smoothdivby__median_14_19_ne_type 286 | , (c_real_som_ne_type_12 +1)/(1+ median_14_25_ne_type) as c_real_som_ne_type_12__smoothdivby__median_14_25_ne_type 287 | , (c_real_som_ne_type_12 +1)/(1+ median_14_37_ne_type) as c_real_som_ne_type_12__smoothdivby__median_14_37_ne_type 288 | 289 | , (c_real_som_ne_type_12 +1)/(1+ max_14_16_ne_type) as c_real_som_ne_type_12__smoothdivby__max_14_16_ne_type 290 | , (c_real_som_ne_type_12 +1)/(1+ max_14_19_ne_type) as c_real_som_ne_type_12__smoothdivby__max_14_19_ne_type 291 | , (c_real_som_ne_type_12 +1)/(1+ max_14_25_ne_type) as c_real_som_ne_type_12__smoothdivby__max_14_25_ne_type 292 | , (c_real_som_ne_type_12 +1)/(1+ max_14_37_ne_type) as c_real_som_ne_type_12__smoothdivby__max_14_37_ne_type 293 | 294 | , (c_real_som_ne_type_12 +1)/(1+ min_14_16_ne_type) as c_real_som_ne_type_12__smoothdivby__min_14_16_ne_type 295 | , (c_real_som_ne_type_12 +1)/(1+ min_14_19_ne_type) as c_real_som_ne_type_12__smoothdivby__min_14_19_ne_type 296 | , (c_real_som_ne_type_12 +1)/(1+ min_14_25_ne_type) as c_real_som_ne_type_12__smoothdivby__min_14_25_ne_type 297 | , (c_real_som_ne_type_12 +1)/(1+ min_14_37_ne_type) as 
c_real_som_ne_type_12__smoothdivby__min_14_37_ne_type

, (c_real_som_ne_type_12 +1)/(1+ avg_14_16_ne_type) as c_real_som_ne_type_12__smoothdivby__avg_14_16_ne_type
, (c_real_som_ne_type_12 +1)/(1+ avg_14_19_ne_type) as c_real_som_ne_type_12__smoothdivby__avg_14_19_ne_type
, (c_real_som_ne_type_12 +1)/(1+ avg_14_25_ne_type) as c_real_som_ne_type_12__smoothdivby__avg_14_25_ne_type
, (c_real_som_ne_type_12 +1)/(1+ avg_14_37_ne_type) as c_real_som_ne_type_12__smoothdivby__avg_14_37_ne_type

FROM whole_f7_4_step1;
--------------------------------------------------------------------------------
/代码/特征工程/8_0_时间之年月.txt:
--------------------------------------------------------------------------------
-- Finally, add these two feature columns (year and month)
drop table if exists whole_f8;
create table whole_f8 as
select *,
    datepart(to_date(sale_date,'yyyymm') , 'yyyy') as year, datepart(to_date(sale_date,'yyyymm') , 'mm') as month
from whole_before_f8;
--------------------------------------------------------------------------------
/代码/特征工程/选出二月的作为训练集.txt:
--------------------------------------------------------------------------------
drop table if exists dataset0_train_feb;
create table dataset0_train_feb as
select * from dataset0_train_f8
where month = 2;
--------------------------------------------------------------------------------
/代码/预处理/分车型看乘的倍数.txt:
--------------------------------------------------------------------------------


-- Window by car model (class_id); within each window, dense-rank rows by sales quantity in ascending order
create table tmp_dense_rank_sale_in_class as
select class_id, sale_quantity, dense_rank() over (partition by class_id order by sale_quantity asc) as d_rank
from whole_before_split_v2;

drop table if exists rank_step2;
create table rank_step2 as
select distinct class_id, sale_quantity, d_rank from
tmp_dense_rank_sale_in_class
where d_rank = 1 or d_rank = 2 or d_rank = 3
;

drop table if exists rank_step2_5;
create
table rank_step2_5 as
select a.*, b.sale_quantity as rank1_in_window from (select distinct class_id from yc_passenger_car_sales ) a
left outer join ( select * from rank_step2 where d_rank=1) b
on a.class_id = b.class_id;

drop table if exists rank_step3;
create table rank_step3 as
select a.*, b.sale_quantity as rank2_in_window from rank_step2_5 a
left outer join ( select * from rank_step2 where d_rank=2) b
on a.class_id = b.class_id;

drop table if exists rank_step4;
create table rank_step4 as
select a.*, b.sale_quantity as rank3_in_window from rank_step3 a
left outer join ( select * from rank_step2 where d_rank=3) b
on a.class_id = b.class_id;


-- select * from rank_step4;

drop table if exists times;
create table times as
select *,
    case when rank3_in_window - rank2_in_window = rank2_in_window and rank2_in_window - rank1_in_window = rank1_in_window then rank2_in_window - rank1_in_window
    else least(rank3_in_window - rank2_in_window, rank2_in_window - rank1_in_window) end as times
from rank_step4;

select * from times;
select * from times where times is null;
--------------------------------------------------------------------------------
/代码说明.md:
--------------------------------------------------------------------------------
# How to Run the Code

The `*.txt` files under the code directory contain the SQL statements (and PAI commands) that were run on the Shujia (数加) platform.
The sections below are listed in the order in which the code is run.
For the overall approach, see `解题思路.md`.

## 1. Preprocessing
#### 1.1
In the code directory: `预处理 -> 分车型看乘的倍数.txt`

## 2. Feature Engineering
Part of what `解题思路.md` covers under "preprocessing" is filed under "feature engineering" in this document.
#### 2.1
In the code directory: `特征工程 -> 1_先还原成原始销量.txt`

#### 2.2
In the code directory: `特征工程 -> 2_分省市车型按时间统计销量.txt`

#### 2.3.0
In the code directory: `特征工程 -> 3_0_生成oc特征.txt`

#### 2.3.1
In the code directory: `特征工程 -> 3_1_拼上oc特征.txt`

#### 2.4.1
In the code directory: `特征工程 -> 4_1_省市车型时间.txt`

#### 2.4.2
In the code directory: `特征工程 -> 4_2_ssm_车型.txt`

#### 2.4.3
In the code directory: `特征工程 -> 4_3_fdfrfir_车型.txt`

#### 2.5.0
In the code directory: `特征工程 -> 5_0_分出轿车suvmpv等类别.txt`

#### 2.5.1
In the code directory: `特征工程 -> 5_1_省市车类型时间.txt`

#### 2.5.1.2
In the code directory: `特征工程 -> 5_1_2_ssm_车类型.txt`

#### 2.5.1.3
In the code directory: `特征工程 -> 5_1_3_fdfrfir_车类型.txt`

#### 2.5.2.0
In the code directory: `特征工程 -> 5_2_0_按排量分.txt`

#### 2.5.2.1
In the code directory: `特征工程 -> 5_2_1_省市排量时间.txt`

#### 2.5.2.2
In the code directory: `特征工程 -> 5_2_2_ssm_排量.txt`

#### 2.5.2.3
In the code directory: `特征工程 -> 5_2_3_fdfrfir_排量.txt`

#### 2.5.3.0
In the code directory: `特征工程 -> 5_3_0_按新能源类别分.txt`

#### 2.5.3.1
In the code directory: `特征工程 -> 5_3_1_省市新能源时间.txt`

#### 2.5.3.2
In the code directory: `特征工程 -> 5_3_2_ssm_新能源.txt`

#### 2.5.3.3
In the code directory: `特征工程 -> 5_3_3_fdfrfir_新能源.txt`


#### 2.6.0.1
In the code directory: `特征工程 -> 6_0_1_省市时间开窗之车型.txt`

#### 2.6.0.2
In the code directory: `特征工程 -> 6_0_2_省市时间开窗之车类型.txt`

#### 2.6.0.3
In the code directory: `特征工程 -> 6_0_3_省市时间开窗之排量.txt`

#### 2.6.0.4
In the code directory: `特征工程 -> 6_0_4_省市时间开窗之新能源.txt`


#### 2.7.1
In the code directory: `特征工程 -> 7_1_省市车型历史交叉.txt`

#### 2.7.2
In the code directory: `特征工程 -> 7_2_省市车类型交叉.txt`

#### 2.7.3
In the code directory: `特征工程 -> 7_3_省市排量交叉.txt`

#### 2.7.4
In the code directory: `特征工程 -> 7_4_省市新能源交叉.txt`



#### 2.8.0
Join all the features: done on the PAI platform.
Experiment directory and experiment name: `for_Feb_prediction` -> `JOIN_before_f8`
Screenshot location: `截图` -> `JOIN_before_f8.png`
![](https://note.youdao.com/yws/public/resource/b1fac6368ce6469bfed546a0b1e310d3/xmlnote/95180E114C5248B2A26C6B2E62A02549/5534)



#### 2.8.0.1
In the code directory: `特征工程 -> 8_0_时间之年月.txt`


## 3. Dataset Splitting and Training-Set Construction
#### 3.1 Splitting
Done on the PAI platform.
Experiment directory and experiment name: `for_Feb_prediction` -> `5份数据集拆分_都是跳一个月`
Screenshot location: `截图` -> `5份数据集拆分.png`
![](https://note.youdao.com/yws/public/resource/b1fac6368ce6469bfed546a0b1e310d3/xmlnote/139EB8BDA7AD44428CC5F4D5A6551062/5541)

In practice, only the leftmost block (the one that produces dataset0_train_f8 and dataset0_test_f8) needs to be run.

#### 3.2 Training-set construction
One of the two final XGB models is trained only on records from the last three years whose sale_date falls in February, so those February records have to be selected from dataset0_train_f8.

In the code directory: `特征工程 -> 选出二月的作为训练集.txt`

## 4. Model Training
**Notes**:
1. [Known bug] The platform never saves the full PAI command below, so when running this step **you have to copy the code and parameters over by hand from the corresponding file in the code directory**.
2. Two XGB models were trained in the end: one on the full last three years of data, the other only on the February records from those three years. Both use the same code with different parameters, as detailed below.

### Run order
1. Run the first model below (this covers both training and prediction);
2. Run the PAI experiment `for_Feb_prediction` -> `XGB结果处理`,
setting the output table name of the write-table node in the lower-right corner to `final_result_xgb_skip_jan`;

Screenshot location: `截图` -> `XGB结果处理.png`
![](https://note.youdao.com/yws/public/resource/b1fac6368ce6469bfed546a0b1e310d3/xmlnote/6D723DD0104A48D697EE4A0F6A0EB0BC/5543)

3. Run the second model;
4. Run the PAI experiment `for_Feb_prediction` -> `XGB结果处理`,
setting the output table name of the write-table node in the lower-right corner to `xgb_trained_with_febs`;


### Model code:
In the code directory: `模型训练 -> XGB.txt`

### Model 1 parameters:
eta=0.006
round=145
train_X=dataset0_train_f8
test_X=dataset0_test_f8
features: see `模型训练 -> 输入特征.txt` in the code directory

### Model 2 parameters:
train_X=dataset0_train_feb
round=140
All other parameters are the same as for model 1.


## 5. Result Blending
Run the PAI experiment `for_Feb_prediction` -> `融合`.
Screenshot location: `截图` -> `融合.png`
![](https://note.youdao.com/yws/public/resource/b1fac6368ce6469bfed546a0b1e310d3/xmlnote/947FB6A3EB1C402C9E4C2FBCCE86436C/5545)



The final result table is `yc_result_submit_b`.
--------------------------------------------------------------------------------
/初赛相关/src/feature_extraction/xiao_utils.py:
--------------------------------------------------------------------------------
from pandas.tseries.offsets import *

# Helper functions
def months_among(cell, start, end):
    # True when cell lies in the half-open interval [start, end)
    return cell >= start and cell < end

def f(group, deltaToStartYearMonth, deltaToEndYearMonth, col_name='sale_quantity'):
    """
    Args:
        group: the group this function is applied to (the original 20157-row dataset grouped by class_id)
        deltaToStartYearMonth:
offset, in months, from the month of the current date to the start month of the target range
        deltaToEndYearMonth: offset, in months, from the month of the current date to the end month of the target range
    Return:
        for each record, the group's total `col_name` (sales quantity by default) within the target time range
    """
    return group['sale_date'].apply(f1, args=(group, deltaToStartYearMonth, deltaToEndYearMonth, col_name))

def f1(date, group, deltaToStartYearMonth, deltaToEndYearMonth, col_name='sale_quantity'):
    """
    Args:
        date: the item this function is applied to, i.e. the sale_date of the current record
        group: the group the current record belongs to
        deltaToStartYearMonth: offset, in months, from the month of the current date to the start month of the target range
        deltaToEndYearMonth: offset, in months, from the month of the current date to the end month of the target range
    Return:
        the group's total `col_name` (sales quantity by default) within the target time range
    """
    start = date + DateOffset(months=deltaToStartYearMonth)
    end = date + DateOffset(months=deltaToEndYearMonth)
    return group[group['sale_date'].apply(months_among, args=(start, end))][col_name].sum()


def g(group, deltaToStartYearMonth, deltaToEndYearMonth):
    """
    Args:
        group: the group this function is applied to (the original 20157-row dataset grouped by class_id)
        deltaToStartYearMonth: offset, in months, from the month of the current date to the start month of the target range
        deltaToEndYearMonth: offset, in months, from the month of the current date to the end month of the target range
    Return:
        for each record, the group's total of the `C_rcm_0` column within the target time range
    """
    # Delegate to g1 (which sums C_rcm_0); the original called f1 here, which sums sale_quantity instead.
    return group['sale_date'].apply(g1, args=(group, deltaToStartYearMonth, deltaToEndYearMonth))

def g1(date, group, deltaToStartYearMonth, deltaToEndYearMonth):
    """
    Args:
        date: the item this function is applied to, i.e. the sale_date of the current record
        group: the group the current record belongs to
        deltaToStartYearMonth: offset, in months, from the month of the current date to the start month of the target range
        deltaToEndYearMonth: offset, in months, from the month of the current date to the end month of the target range
    Return:
        the group's total of the `C_rcm_0` column within the target time range
    """
    start = date + DateOffset(months=deltaToStartYearMonth)
    end = date + DateOffset(months=deltaToEndYearMonth)
    return group[group['sale_date'].apply(months_among, args=(start, end))]['C_rcm_0'].sum()



--------------------------------------------------------------------------------
/初赛相关/src/feature_extraction/时间特征.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type":
"code", 5 | "execution_count": 11, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import pandas as pd\n", 10 | "import numpy as np\n", 11 | "from pandas.tseries.offsets import *\n", 12 | "# from xiao_utils import months_among, f, f1\n", 13 | "from xiao_utils import f" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 12, 19 | "metadata": {}, 20 | "outputs": [ 21 | { 22 | "name": "stdout", 23 | "output_type": "stream", 24 | "text": [ 25 | "\n", 26 | "Int64Index: 20297 entries, 0 to 139\n", 27 | "Data columns (total 32 columns):\n", 28 | "TR 20157 non-null object\n", 29 | "brand_id 20297 non-null int64\n", 30 | "car_height 20157 non-null float64\n", 31 | "car_length 20157 non-null float64\n", 32 | "car_width 20157 non-null float64\n", 33 | "class_id 20297 non-null int64\n", 34 | "compartment 20157 non-null float64\n", 35 | "cylinder_number 20157 non-null float64\n", 36 | "department_id 20157 non-null float64\n", 37 | "displacement 20157 non-null float64\n", 38 | "driven_type_id 20157 non-null float64\n", 39 | "emission_standards_id 20157 non-null float64\n", 40 | "engine_torque 20138 non-null float64\n", 41 | "equipment_quality 20157 non-null float64\n", 42 | "front_track 20157 non-null float64\n", 43 | "fuel_type_id 20154 non-null float64\n", 44 | "gearbox_type 20157 non-null object\n", 45 | "if_MPV_id 20157 non-null float64\n", 46 | "if_charging 20157 non-null object\n", 47 | "if_luxurious_id 20157 non-null float64\n", 48 | "level_id 19859 non-null float64\n", 49 | "newenergy_type_id 20157 non-null float64\n", 50 | "power 20157 non-null float64\n", 51 | "price 11377 non-null float64\n", 52 | "price_level 20157 non-null float64\n", 53 | "rated_passenger 20157 non-null object\n", 54 | "rear_track 20157 non-null float64\n", 55 | "sale_date 20297 non-null datetime64[ns]\n", 56 | "sale_quantity 20157 non-null float64\n", 57 | "total_quality 20157 non-null float64\n", 58 | "type_id 20157 non-null float64\n", 59 | "wheelbase 20157 
non-null float64\n",
      "dtypes: datetime64[ns](1), float64(25), int64(2), object(4)\n",
      "memory usage: 5.1+ MB\n"
     ]
    }
   ],
   "source": [
    "# Replace '-' in the level_id field with np.nan\n",
    "df = pd.read_csv('../../data/origin/[new] yancheng_train_20171226.csv', dtype={'sale_date':str}, na_values=['-'], low_memory=False)\n",
    "df['sale_date']= pd.to_datetime(df['sale_date'], format='%Y%m')\n",
    "\n",
    "# Convert price_level to an ordered categorical and store its integer codes in the column.\n",
    "df['price_level'] = pd.Categorical(df['price_level'], categories=['5WL','5-8W','8-10W','10-15W','15-20W','20-25W','25-35W','35-50W','50-75W'], ordered=True)\n",
    "df['price_level'] = df['price_level'].cat.codes\n",
    "\n",
    "# Alternative considered: duplicate each row whose power / engine_torque contains '/', zero the sales in the new row, and give the original and new rows the values before and after the slash respectively.\n",
    "# Current approach: simply drop the slash and everything after it, so that statistics based on record counts are not affected.\n",
    "def process_power_and_torque(s):\n",
    "    return s.split('/')[0]\n",
    "df['power'] = df['power'].astype(str).apply(process_power_and_torque).astype(float) #[18600]\n",
    "df['engine_torque'] = df['engine_torque'].astype(str).apply(process_power_and_torque).astype(float)\n",
    "\n",
    "# -------------------------------------------------------------\n",
    "# Append the November 2017 rows so their features are filled in together with the rest; they are used to produce the final submission.\n",
    "empty_Nov = pd.read_csv('../../data/origin/yancheng_testA_20171225.csv', dtype={'predict_date':str}, na_values=['-'], low_memory=False)\n",
    "empty_Nov['predict_date']= pd.to_datetime(empty_Nov['predict_date'], format='%Y%m')\n",
    "empty_Nov.rename(columns = {'predict_date': 'sale_date', 'predict_quantity':'sale_quantity'}, inplace = True)\n",
    "\n",
    "\n",
    "# Done reading; before concatenating, join in the class_id -> brand_id mapping\n",
    "class_to_brand = df[['class_id','brand_id']].groupby(['class_id']).mean().reset_index()\n",
    "# brand_id is matched on class_id by the merge (assumes the test file has no brand_id column of its own)\n",
    "empty_Nov = pd.merge(left=empty_Nov, right=class_to_brand, on='class_id', how='left')\n",
    "# empty_Nov\n",
    "# class_to_brand\n",
    "\n",
    "# class_to_brand\n",
    "\n",
    "# Done reading; concatenate\n",
    "df = pd.concat([df, empty_Nov])\n",
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "%qtconsole"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "9. Over the past few years, the share of the current month's total sales in that year's total sales\n",
    "1. Over the past few years, total monthly sales across all car models"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Column names of this feature type all start with T_, meaning Time\n",
    "#### Pure time information\n",
    "1. Year\n",
    "2. Month\n",
    "3. One-hot encoding of year and month\n",
    "4. Month-related features [not yet added]\n",
    "    4. Number of Saturdays in the month\n",
    "    5. Number of Sundays in the month\n",
    "    6. Number of days in the month\n",
    "    7. Total number of Saturdays and Sundays in the month\n",
    "    8. Share of weekend days in the month (the previous item divided by the one before it)\n",
    "    10. Number of public-holiday days in the month\n",
    "9. Which season the month falls in?? [not yet added]\n",
    "1. Whether the month is in the first or second half of the year?? [not yet added]\n",
    "\n",
    "#### Historical sales information (not split by car model/brand) ~~[not yet added]~~\n",
    "1. Total monthly sales across all car models over the past 5 years [T_som_i, Sale of One Month]\n",
    "3. Looking back from last year: the share of the corresponding month's sales in that year's total sales (e.g. if the current row has sale_date=201710, this feature is the share of 201610's total sales in the full-year 2016 sales) [not yet added]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['TR', 'brand_id', 'car_height', 'car_length', 'car_width', 'class_id',\n",
       "       'compartment', 'cylinder_number', 'department_id', 'displacement',\n",
       "       'driven_type_id', 'emission_standards_id', 'engine_torque',\n",
       "       'equipment_quality', 'front_track', 'fuel_type_id', 'gearbox_type',\n",
       "       'if_MPV_id', 'if_charging', 'if_luxurious_id', 'level_id',\n",
       "       'newenergy_type_id', 'power', 'price', 'price_level', 'rated_passenger',\n",
       "       'rear_track', 'sale_date', 'sale_quantity', 'total_quality', 'type_id',\n",
       "       'wheelbase'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "DatetimeIndex: 71 entries, 2012-01-01 to 2017-11-01\n",
      "Data columns (total 1 columns):\n",
      "T_som_0 70 non-null float64\n",
      "dtypes: float64(1)\n",
      "memory usage: 1.1 KB\n"
     ]
    }
   ],
   "source": [
    "g_date = df[['sale_date','sale_quantity']].groupby('sale_date').sum()\n",
    "g_date = g_date.rename(columns={'sale_quantity':'T_som_0'})\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
[HTML rendering of g_date.head(): index sale_date, column T_som_0; the same five rows appear in the text/plain output]
" 251 | ], 252 | "text/plain": [ 253 | " T_som_0\n", 254 | "sale_date \n", 255 | "2012-01-01 22927.0\n", 256 | "2012-02-01 14433.0\n", 257 | "2012-03-01 15587.0\n", 258 | "2012-04-01 13438.0\n", 259 | "2012-05-01 18195.0" 260 | ] 261 | }, 262 | "execution_count": 10, 263 | "metadata": {}, 264 | "output_type": "execute_result" 265 | } 266 | ], 267 | "source": [ 268 | "g_date.head()" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": 5, 274 | "metadata": {}, 275 | "outputs": [ 276 | { 277 | "name": "stdout", 278 | "output_type": "stream", 279 | "text": [ 280 | "\n", 281 | "Int64Index: 20297 entries, 0 to 139\n", 282 | "Data columns (total 53 columns):\n", 283 | "TR 20157 non-null object\n", 284 | "brand_id 20297 non-null int64\n", 285 | "car_height 20157 non-null float64\n", 286 | "car_length 20157 non-null float64\n", 287 | "car_width 20157 non-null float64\n", 288 | "class_id 20297 non-null int64\n", 289 | "compartment 20157 non-null float64\n", 290 | "cylinder_number 20157 non-null float64\n", 291 | "department_id 20157 non-null float64\n", 292 | "displacement 20157 non-null float64\n", 293 | "driven_type_id 20157 non-null float64\n", 294 | "emission_standards_id 20157 non-null float64\n", 295 | "engine_torque 20138 non-null float64\n", 296 | "equipment_quality 20157 non-null float64\n", 297 | "front_track 20157 non-null float64\n", 298 | "fuel_type_id 20154 non-null float64\n", 299 | "gearbox_type 20157 non-null object\n", 300 | "if_MPV_id 20157 non-null float64\n", 301 | "if_charging 20157 non-null object\n", 302 | "if_luxurious_id 20157 non-null float64\n", 303 | "level_id 19859 non-null float64\n", 304 | "newenergy_type_id 20157 non-null float64\n", 305 | "power 20157 non-null float64\n", 306 | "price 11377 non-null float64\n", 307 | "price_level 20157 non-null float64\n", 308 | "rated_passenger 20157 non-null object\n", 309 | "rear_track 20157 non-null float64\n", 310 | "sale_date 20297 non-null datetime64[ns]\n", 311 | 
"sale_quantity 20157 non-null float64\n", 312 | "total_quality 20157 non-null float64\n", 313 | "type_id 20157 non-null float64\n", 314 | "wheelbase 20157 non-null float64\n", 315 | "year 20297 non-null int64\n", 316 | "month 20297 non-null int64\n", 317 | "is_leap_year 20297 non-null bool\n", 318 | "year_oh_2012 20297 non-null uint8\n", 319 | "year_oh_2013 20297 non-null uint8\n", 320 | "year_oh_2014 20297 non-null uint8\n", 321 | "year_oh_2015 20297 non-null uint8\n", 322 | "year_oh_2016 20297 non-null uint8\n", 323 | "year_oh_2017 20297 non-null uint8\n", 324 | "month_oh_1 20297 non-null uint8\n", 325 | "month_oh_2 20297 non-null uint8\n", 326 | "month_oh_3 20297 non-null uint8\n", 327 | "month_oh_4 20297 non-null uint8\n", 328 | "month_oh_5 20297 non-null uint8\n", 329 | "month_oh_6 20297 non-null uint8\n", 330 | "month_oh_7 20297 non-null uint8\n", 331 | "month_oh_8 20297 non-null uint8\n", 332 | "month_oh_9 20297 non-null uint8\n", 333 | "month_oh_10 20297 non-null uint8\n", 334 | "month_oh_11 20297 non-null uint8\n", 335 | "month_oh_12 20297 non-null uint8\n", 336 | "dtypes: bool(1), datetime64[ns](1), float64(25), int64(4), object(4), uint8(18)\n", 337 | "memory usage: 5.8+ MB\n" 338 | ] 339 | } 340 | ], 341 | "source": [ 342 | "# 单纯时间信息\n", 343 | "df['year'] = df['sale_date'].apply(lambda x:x.year)\n", 344 | "df['month'] = df['sale_date'].apply(lambda x:x.month)\n", 345 | "df['is_leap_year'] = df['sale_date'].apply(lambda x:x.is_leap_year)\n", 346 | "df['year_oh'] = df['year']\n", 347 | "df['month_oh'] = df['month']\n", 348 | "df = pd.get_dummies(df, columns=['year_oh','month_oh'])\n", 349 | "df.info()" 350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": 7, 355 | "metadata": {}, 356 | "outputs": [], 357 | "source": [ 358 | "T_features = pd.merge(df, g_date, how='left', left_on='sale_date', right_index=True)" 359 | ] 360 | }, 361 | { 362 | "cell_type": "code", 363 | "execution_count": 8, 364 | "metadata": {}, 365 | "outputs": [], 
366 | "source": [ 367 | "T_features.to_csv(\"../../data/features/T_features.csv\",index=False)" 368 | ] 369 | }, 370 | { 371 | "cell_type": "code", 372 | "execution_count": null, 373 | "metadata": {}, 374 | "outputs": [], 375 | "source": [ 376 | "# 历史销量信息\n", 377 | "# 主要逻辑\n", 378 | "def calc_sale_features_on_time(df):\n", 379 | " \"\"\"\n", 380 | " Args:\n", 381 | " df: 完整的数据集\n", 382 | " Return:\n", 383 | " tmp:基于历史销量(不分品牌和车型)信息,构造出的特征们\n", 384 | " \"\"\"\n", 385 | "# g_date = df.groupby(['class_id','sale_date']).sum().reset_index()[['class_id','sale_date','sale_quantity']]\n", 386 | "# gg = g_date.groupby('class_id')\n", 387 | " \n", 388 | " g_date = df[['sale_date','sale_quantity']].groupby('sale_date').sum()\n", 389 | " \n", 390 | "\n", 391 | " # 过去几年内的每个月销量\n", 392 | " tmp = g_cls_date\n", 393 | " for i in range(61):\n", 394 | " tmp['sale_of_month_' + str(i+1) + '_ago'] = gg.apply(f, -(i+1), -i).reset_index()['sale_date']\n", 395 | "\n", 396 | "\n", 397 | " # 该车型过去2~60个月分别的销量和\n", 398 | " tmp['sum_sale_of_last_1_month'] = tmp['sale_of_month_1_ago']\n", 399 | " for i in range(60):\n", 400 | " # tmp['sum_sale_of_last_' + str(i+1) + '_month'] = gg.apply(f, -(i+1), 0).reset_index()['sale_date']\n", 401 | " tmp['sum_sale_of_last_' + str(i+2) + '_month'] = tmp['sum_sale_of_last_' + str(i+1) + '_month'] + \\\n", 402 | " tmp['sale_of_month_' + str(i+2) + '_ago']\n", 403 | " tmp = tmp.drop('sum_sale_of_last_1_month', axis=1) # 再把这一列删掉,因为和前面的 sale_of_month_1_ago 列是重复的\n", 404 | " \n", 405 | " # 一阶差分\n", 406 | " for i in range(60):\n", 407 | " thismonth = tmp['sale_of_month_' + str((i+1)*12) + '_ago']\n", 408 | " lastmonth = tmp['sale_of_month_' + str((i+1)*12+1) + '_ago'] # gg.apply(f, -(i+1)*12-1, -(i+1)*12).reset_index()['sale_date']\n", 409 | " tmp['rate_of_this_month_divby_last_month_' + str(i+1) + '_year_ago'] = thismonth / lastmonth\n", 410 | " tmp['diff_of_this_month_sub_last_month_' + str(i+1) + '_year_ago'] = thismonth - lastmonth\n", 411 | "\n", 412 | " # 
该车型往年这个月比上个月的销量比值\n", 413 | " # 该车型往年这个月减去上个月的销量差值\n", 414 | " for i in range(3): # 只看过去三年的\n", 415 | " thismonth = tmp['sale_of_month_' + str((i+1)*12) + '_ago']\n", 416 | " lastmonth = tmp['sale_of_month_' + str((i+1)*12+1) + '_ago'] # gg.apply(f, -(i+1)*12-1, -(i+1)*12).reset_index()['sale_date']\n", 417 | " tmp['rate_of_this_month_divby_last_month_' + str(i+1) + '_year_ago'] = thismonth / lastmonth\n", 418 | " tmp['diff_of_this_month_sub_last_month_' + str(i+1) + '_year_ago'] = thismonth - lastmonth\n", 419 | "\n", 420 | " # 该车型上个月比上上个月的比值\n", 421 | " thisyear_lastmonth = tmp['sale_of_month_1_ago']\n", 422 | " thisyear_lastlastmonth = tmp['sale_of_month_2_ago']\n", 423 | " tmp['rate_of_last_divby_lastlast'] = thisyear_lastmonth / thisyear_lastlastmonth\n", 424 | " # 该车型上个月减去上上个月的差值\n", 425 | " tmp['diff_of_last_sub_lastlast'] = thisyear_lastmonth - thisyear_lastlastmonth\n", 426 | "\n", 427 | " # 注意要把np.inf替换为空值,在上面算月销量比例时,引入了inf,其实应该作为空值。\n", 428 | " return tmp.replace([np.inf, -np.inf], np.nan)" 429 | ] 430 | } 431 | ], 432 | "metadata": { 433 | "kernelspec": { 434 | "display_name": "Python 3", 435 | "language": "python", 436 | "name": "python3" 437 | }, 438 | "language_info": { 439 | "codemirror_mode": { 440 | "name": "ipython", 441 | "version": 3 442 | }, 443 | "file_extension": ".py", 444 | "mimetype": "text/x-python", 445 | "name": "python", 446 | "nbconvert_exporter": "python", 447 | "pygments_lexer": "ipython3", 448 | "version": "3.6.1" 449 | } 450 | }, 451 | "nbformat": 4, 452 | "nbformat_minor": 2 453 | } 454 | -------------------------------------------------------------------------------- /截图/5份数据集拆分.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IceSuger/yancheng-sales/82e36cfa46733a3dc8e69ea7f25a82851e9d2a29/截图/5份数据集拆分.png -------------------------------------------------------------------------------- /截图/JOIN_before_f8.png: 
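--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IceSuger/yancheng-sales/82e36cfa46733a3dc8e69ea7f25a82851e9d2a29/截图/JOIN_before_f8.png
--------------------------------------------------------------------------------
/截图/XGB结果处理.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IceSuger/yancheng-sales/82e36cfa46733a3dc8e69ea7f25a82851e9d2a29/截图/XGB结果处理.png
--------------------------------------------------------------------------------
/截图/融合.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IceSuger/yancheng-sales/82e36cfa46733a3dc8e69ea7f25a82851e9d2a29/截图/融合.png
--------------------------------------------------------------------------------
/解题思路.md:
--------------------------------------------------------------------------------
1 | # Solution Approach
2 |
3 | Final online rank: 14.
4 |
5 | - For leaderboard A of the final round, I used only feature engineering plus a single XGBoost model.
6 |
7 | - For leaderboard B of the final round, the final prediction blended two approaches: a **machine learning model** and a **hand-written rule**.
8 | The former boils down to **feature engineering + XGBoost**. The latter needs no summarizing because it is already simple: take **last month's sales as predicted on leaderboard A and multiply them by a coefficient** to predict this month's sales.
9 |
10 | The feature engineering for the model goes roughly like this: first restore the sales figures to the original sales (details below), then build a series of historical-sales statistics grouped separately by car model (i.e. each `class_id`), car type (whether a model is a sedan, an SUV, or an MPV), displacement band (a few bands chosen from experience), and whether the car is a new-energy vehicle, and finally supplement these with some attributes of the model itself.
11 |
12 | Below I go through the approach in several parts. The exploratory data analysis (EDA) and its findings are not given a section of their own; they are woven into the parts where they matter.
13 |
14 | ## 0. A look at the evaluation metric
15 | Unlike the preliminary round, which scored RMSE directly, the final round is scored by RMSLE, shown below; I will not go over what each symbol means.
16 | ![Evaluation metric](https://note.youdao.com/yws/public/resource/b1fac6368ce6469bfed546a0b1e310d3/xmlnote/2488EF57D50F40E38370AC1D19F7E7FB/5521)
17 | Because of the log in this metric, the terms that deserve attention are no longer those with a large absolute error but those with a large **relative** error.
18 | For example, take a term whose true value is very small, say 0, with a prediction of 8. The absolute gap looks minor, but the error is computed as: ![](https://note.youdao.com/yws/public/resource/b1fac6368ce6469bfed546a0b1e310d3/xmlnote/831E2978C40C4206847D3F1754D4AFB8/5524)
19 | By contrast, if the true value is 1000 and the prediction is 1500, the absolute gap is 500, but the error is computed as: ![](https://note.youdao.com/yws/public/resource/b1fac6368ce6469bfed546a0b1e310d3/xmlnote/B88DC09DDA4845FBA933AE6CC2B7FC75/5526)
20 | The difference is plain to see!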
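The two cases above can be checked numerically. This is a minimal sketch that assumes the standard RMSLE definition, the root of the mean of `(log(predicted + 1) - log(actual + 1))^2`; the function names are illustrative and do not come from the competition code.

```python
import math


def squared_log_error(actual: float, predicted: float) -> float:
    """Squared log error of one sample, the term that RMSLE averages."""
    return (math.log1p(predicted) - math.log1p(actual)) ** 2


def rmsle(actuals, predictions) -> float:
    """Root mean squared logarithmic error over paired samples."""
    terms = [squared_log_error(a, p) for a, p in zip(actuals, predictions)]
    return math.sqrt(sum(terms) / len(terms))


# The two cases from the text: a small true value and a large one.
small_case = squared_log_error(actual=0, predicted=8)        # absolute gap of 8
large_case = squared_log_error(actual=1000, predicted=1500)  # absolute gap of 500

print(f"actual=0,    predicted=8:    {small_case:.4f}")
print(f"actual=1000, predicted=1500: {large_case:.4f}")
print(f"RMSLE over both samples:     {rmsle([0, 1000], [8, 1500]):.4f}")
```

The first term comes out at roughly 4.83 and the second at roughly 0.16, so the record that is off by only 8 units dominates the score.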
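21 |
22 | In fact, using RMSLE as the metric hints at two things: 1. **values with a small absolute magnitude** deserve attention; 2. if you know that your prediction error for a data point lies within some range, you should lean toward the larger prediction, because for errors of about the same size in either direction, under-predicting is penalized more heavily than over-predicting (the shape of the log curve makes this clear).
23 |
24 | This makes the distribution of the data worth checking. Statistics on the training set show that "small values" make up a large share of it, as the figure below shows:
25 | ![](https://note.youdao.com/yws/public/resource/b1fac6368ce6469bfed546a0b1e310d3/xmlnote/63309BCD0F024B6F8E8A5CFF5DA823E6/5528)
26 |
27 | Furthermore, the objective a regression model optimizes (such as reg:linear for XGB regression) is usually mean squared error or similar, which amounts to reducing the **absolute** error. So it is natural to apply a `log1p` transform to the label column before training and to invert it after prediction. **However**, on the offline validation sets this did not work well (compared with training directly without the transform, the result got slightly worse).
28 |
29 | **Still**, the metric suggests that "small values" deserve more attention, and that guides the later steps: restoring the original sales, building the training set, and selecting features.
30 |
31 | ## 1. Preprocessing
32 | The preprocessing here does not include outlier cleaning and the like; what I call preprocessing can equally be seen as part of feature engineering.
33 |
34 | 1. First merge the sales records that share the same province and city, the same model, and the same year and month, so that the 4-tuple `(province_id, city_id, class_id, sale_date)` uniquely identifies one record in the training set;
35 | 2. Since I planned to avoid whole-table statistics in feature engineering, the test set and training set can be concatenated before extracting further features. This makes it easy to extract features in one pass without worrying about "data leakage from the future" on the offline validation sets;
36 | 3. Process the `String` columns:
37 |     1. `sale_date`: convert to both `bigint` and `datetime`, to ease the later dataset splitting and feature construction;
38 |     2. `displacement`: extract the numeric part and drop the trailing L and T;
39 |     3. `if_charging`: convert to `bigint`;
40 |     4. `rated_passenger`: take the largest value, e.g. "5-7" becomes 7;
41 |     5. `power`: take the smallest value, e.g. "70/144" becomes 70;
42 | 4. Restore the true sales values:
43 | ###### A slightly longer explanation here.
44 | To begin with, the competition rules state:
45 | > Participants may use any publicly available external data to assist prediction.
46 |
47 | I looked up [news coverage of monthly passenger-car sales across 2017](http://auto.sohu.com/s2018/sar132/index.shtml) and totalled the monthly sales in the competition dataset, and found that the dataset's figures are much larger than the reported ones. In monthly totals, the competition data is 2 to 3 times the reported data, and the ratio is not a single constant. Combined with the earlier observation that small values dominate the dataset, my first guess was that the original sales had been transformed by `e^log_2(x)`, shown as the blue line in the figure below (the orange line is the competition data and the grey line is the reported figures); the figure shows that this guess falls below the reported values overall. I guessed at other transformations too, but the few I tried were all clearly wrong.
48 |
49 | Only later, when I separately tallied the **historical sales of specific models in specific cities** for about ten randomly chosen cases (the aim was just to look at trends), did I stumble on a pattern: for some models every sales figure ends in 0 or 5, and for others the figures are plainly all multiples of 2, 3, or 4. So I had a bold idea (this kind of transformation may well be common, but as a newcomer I found it surprising): perhaps the organizers multiplied the original sales of each model by a per-model factor such as 1, 2, 3, 4, 5, or 6. Statistics over the whole dataset largely confirmed this guess. After working out each model's factor I looked at a histogram of the factors: the counts for the five factors 1 to 5 are close to each other. At the time I wondered whether some leakage was hiding behind this, but after further statistics I concluded that the factors were most likely assigned at random.
50 |
51 | The factor for each model is computed as follows: using window functions, find the lowest, second-lowest, and third-lowest distinct sales values of that model, and take the smallest difference between them as the model's factor. This can of course go wrong, but it should be broadly right, and it is a fairly convenient way to do it.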
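The gap-based estimate described above can be sketched in plain Python. The write-up computed it with window functions on the competition platform; the version below swaps in an in-memory grouping, and the toy rows and the helper name `estimate_multiplier` are assumptions made for illustration. The names `class_id`, `sale_quantity`, `times`, and `real_sale` follow the write-up.

```python
from collections import defaultdict


def estimate_multiplier(sales):
    """Smallest gap among the three lowest distinct sales values.

    Falls back to 1 (an assumption) when fewer than two distinct values exist.
    """
    lowest = sorted(set(sales))[:3]
    gaps = [b - a for a, b in zip(lowest, lowest[1:])]
    return min(gaps) if gaps else 1


# Toy (class_id, sale_quantity) rows; the numbers are invented.
rows = [
    (101, 10), (101, 15), (101, 25), (101, 40), (101, 15),
    (102, 3), (102, 9), (102, 6), (102, 12),
    (103, 7), (103, 8), (103, 12),
]

# Collect the sales of each model.
sales_by_class = defaultdict(list)
for class_id, sale_quantity in rows:
    sales_by_class[class_id].append(sale_quantity)

# One estimated multiplier per model, then the restored sales per row.
times = {cid: estimate_multiplier(vals) for cid, vals in sales_by_class.items()}
real_sale = [(cid, qty / times[cid]) for cid, qty in rows]

print(times)
```

As the text concedes, the estimate can be wrong: if the three lowest true sales values of a model happen to differ by more than one unit, the smallest gap is a multiple of the real factor.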
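52 |
53 | ![Guessing the original sales](https://note.youdao.com/yws/public/resource/b1fac6368ce6469bfed546a0b1e310d3/xmlnote/4FCEC44EC68F4CBAB4351792D879BA6B/5530)
54 |
55 | **One clarification: because the competition rules stress that downloading data is not allowed, I did not download any data while analysing it during the competition. The figure above was drawn in Excel after the competition, from statistics copied off the platform, purely to make this write-up easier to visualize. (Copy-pasting a few statistics afterwards should not count as breaking the rules, right?)**
56 | A side complaint: the platform's visualization features do not seem very powerful (or perhaps they exist and I just could not use them; at least I found no documentation for them).
57 |
58 | 5. Building the offline validation sets:
59 | I initially built four offline training/validation sets for each leaderboard (datasets 1 to 4 below; dataset0 is the one actually submitted). The splits for leaderboard A are:
60 |
61 | Name | Training set (3 years) | Test/validation set (1 month) | Notes
62 | ---|---|---|---
63 | dataset0 | 2015.01~2017.12 | 2018.01 | Actually submitted
64 | dataset1 | 2014.12~2017.11 | 2017.12 | Datasets 1~4 are all for offline validation
65 | dataset2 | 2014.11~2017.10 | 2017.11 |
66 | dataset3 | 2014.01~2016.12 | 2017.01 |
67 | dataset4 | 2013.12~2016.11 | 2016.12 |
68 |
69 | The splits for leaderboard B are:
70 |
71 | Name | Training set (3 years) | Test/validation set (1 month) | Notes
72 | ---|---|---|---
73 | dataset0 | 2014.12~2017.12 | 2018.02 | Actually submitted
74 | dataset1 | 2013.11~2016.11 | 2017.01 | Datasets 1~4 are all for offline validation
75 | dataset2 | 2013.12~2016.12 | 2017.02 |
76 | dataset3 | 2012.12~2015.12 | 2016.02 |
77 | dataset4 | 2014.10~2017.10 | 2017.12 |
78 |
79 |
80 | Later, to keep offline validation as consistent as possible with the online setting, and because platform resources were limited, I used only dataset3 for leaderboard A and only dataset2 and dataset3 for leaderboard B.
81 |
82 |
83 | ## 2. Feature engineering
84 | ### i. Most features are built around historical sales and sales trends, aggregated under various groupings.
85 | ### ii. Some findings from the exploratory data analysis
86 | a. Surges and sharp drops
87 | Based on the differences and ratios of sales between adjacent months, I roughly defined sales surges and sharp drops, and counted how often each occurs in the dataset. The key finding: surges occur about equally often in every month, whereas sharp drops occur clearly more often in February than in any other month. This finding later helped in choosing the training set.
88 |
89 | b. 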
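Car type (sedan, SUV, etc.)
90 | Whether you go by the statistics from the preliminary and final rounds, by everyday experience, or by news reports on passenger-car production and sales, sedans and SUVs show different sales levels and trends over time, so telling them apart should help the sales prediction.
91 | However, the competition data does not say whether a model is a sedan or an SUV (or maybe I just missed it; I only recently got past my probationary driving period and know little about cars); it only flags whether a model is an MPV. So I had to split them roughly, from experience and statistics. After computing statistics on the body dimensions of the models, I found that the distribution of car height `car_height` appears to separate sedans from SUVs in a way that matches my everyday experience, as the figure below shows: ![](https://note.youdao.com/yws/public/resource/b1fac6368ce6469bfed546a0b1e310d3/xmlnote/AA3CD27AD0494C6D805796BD583E9DE8/5532)
92 | So I set a simple rule: a car with 2 compartments and a height above 1600 is an SUV.
93 |
94 |
95 | ### iii. Feature overview
96 | Below is a brief list of the features I built and used, including some of the preprocessing described earlier. The feature names mentioned in the feature-engineering part of the code instructions (or in the source code) correspond to the descriptions here.
97 | **Note**: `_c_` in feature column names carries no meaning; `_d_`, `_dr_`, and `_draft_` all mean scratch work, i.e. the corresponding tables and columns are intermediate results.
98 |
99 | 1. **Divide `yc_passenger_car_sales` by the multiplier [sales in the original competition data keep the name `sale_quantity`; the restored sales are `real_sale`]**
100 |
101 | 2. **Sum sales by (province, city, model, time), i.e. step 1 of the preprocessing described above;**
102 |
103 | 3. **Features from the model's own attributes**
104 | 3.1 Join the `oc` features (`oc` stands for `only class`: features derived purely from the model's own attributes)
105 | 3.2 Bucket displacement and similar attributes
106 | a. Use the compartment count `compartment` and the car height `car_height` together to decide whether a model is a sedan or an SUV; since the data already flags MPVs, models fall into four classes: sedan, SUV, MPV, other;
107 | b. Displacement bands: below 1, 1 ~ 1.3, 1.3 ~ 1.6, 1.6 ~ 2.4, above 2.4;
108 | c. The new-energy class uses the values in the original data;
109 |
110 | 4. **Compute the `som, ssm, fd, fr, fir` features [leaderboard B: `som_1` is dropped in the end, and `ssm_` starts from 2: `ssm_2_2, ssm_2_3`]**
111 | 4.1 province, city, model, time
112 | 4.2 province, city, time
113 | 4.3 province, model, time
114 | 4.4 (nationwide) model, time
115 |
116 | 4.1 ~ 4.4 are the respective group by keys.
117 | In practice, leaderboard A used the features from 4.1 ~ 4.4, but on leaderboard B, limited by time and resources, I extracted only the 4.1 features.
118 | The names in this batch mean: `som`: sales of a single month; `ssm`: cumulative sales counting back from the current month; `fd`: difference between the single-month sales of two months; `fr`: ratio of the single-month sales of two months; `fir`: growth rate between the sales of two months (negative when sales fall).
119 |
120 | The features finally used are: single-month sales for each month of the past year; cumulative sales over the past 2 months up to the past year; and the difference/ratio/fir between the current month and the previous month, between the previous month and the month before it, between this month this year and the same month last year, and between the same month last year and the same month the year before;
121 | 5. **Group by the model's own attributes: sedan/SUV/MPV, new-energy class, displacement band, (compartment count), brand. Any of these can replace the "model" variable in 4.1 ~ 4.4 above.**
122 | 5.1 Sedan/SUV/MPV, with province, city, time
123 | 5.2 Displacement, split into several bands
124 | 5.3 New-energy class, as given in the competition data
125 | 5.4 Brand: for lack of time and energy I did not add these features; another reason is that they did not work well in the preliminary round.
126 |
127 | 6. **Window functions such as rank, avg, max, min, median**
128 | 6.0.1 ~ 6.0.4 Over province, city, time: sales by model, car type, displacement, new-energy class
129 | 6.1.1 ~ 6.1.4 Over province, time: sales by model, car type, displacement, new-energy class; implemented for leaderboard A but not for leaderboard B
130 |
131 | 7. 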
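**Historical-sales cross features**
132 | 7.1 province, city, model
133 | 7.2 province, city, car type
134 | 7.3 province, city, displacement
135 | 7.4 province, city, new-energy class
136 |
137 | These are again computed separately for each of the groupings above. The features are mainly: the mean/median/max/min over one time span against the mean/median/max/min over another time span, as a difference and as a ratio (the ratio adds 1 as smoothing to sidestep division by zero and similar problems)
138 |
139 | 8. **Time**: year and month extracted as two separate columns
140 |
141 |
142 | ## 3. Model training
143 | 1. Building the training set
144 | First, after skimming the data and drawing on experience from the preliminary round, I chose not to train on all six years of data; instead, after feature extraction, each of the five dataset splits uses the most recent three years as its training set.
145 | Second, per the EDA finding above, February is special: sharp drops are more likely in February. Many of my features reflect the sales trend of recent months, so the model easily learns that a recent rise means a rise next month too (or, for leaderboard B, strictly speaking the month after next). I therefore tried keeping only the records whose target month is February from the last three years of training data. Offline validation (on dataset2 and dataset3, the two validation sets whose target month is February) showed that this selection effectively reduces the prediction error for February.
146 |
147 | 2. Model
148 | The model is XGBoost; there is not much to say. At first I also tried the platform's GBDT, but I was unfamiliar with tuning it and its results on the validation sets were poor, so I dropped it.
149 | As for the XGB parameters, resources were limited, so I barely tuned them on leaderboard B and tried only two or three parameter sets.
150 |
151 | The final model predictions must be multiplied back by each record's multiplier `times`.
152 | ## 4. Hand-written rule and blending
153 | In the preliminary round, based on statistics of each model's historical sales trend and the overall sales trend, I set coefficients by hand and multiplied last month's sales by them to predict the target month, which worked reasonably well. So I tried the same trick early in the final round, but the results were disappointing.
154 | Even so, for the last leaderboard B submission I held on to a sliver of hope: I multiplied my leaderboard A prediction for January by 0.6 as the prediction for February, and put that result into the final blend as well.
155 |
156 | One more remark.
157 | For this rule I also considered something else. Since the rules allow any publicly available external data, I tried using the month-over-month and year-over-year changes that sales reports give separately for sedans, SUVs, and so on to set the coefficient here (*strictly speaking this is already using future data, but this competition did not prohibit such data*). On the offline validation sets, though, it performed very badly. My guesses at the reasons include, but are not limited to: 1. the method itself is not very reliable; 2. my sedan/SUV split in the earlier feature engineering was not good.
158 |
159 | The final blend consists of:
160 | 1. XGB trained on all data from the past three years, with round=150 and seed=0, predicting February directly (skipping January);
161 | 2. XGB trained on the February records from the past three years, with round=145 and seed=1, predicting February;
162 | 3. My second-best leaderboard A result for January, v20180310 (by mistake the best result was not saved), multiplied by 0.6.
163 |
164 | Weights of the three:
165 | 0.262
166 | 0.61
167 | 0.128
168 |
169 |
170 | (In fact I regretted the blend right after making it, because on leaderboard A I had also tried blending in this simple hand-written rule and it did not help. Also, the first XGB's output is bound to be too high. It would probably have been better to vary the second XGB's parameters, run two or three different results and blend those, or just submit the single model's result.)
171 |
172 | Final score 0.9780, ranked 14th on leaderboard B.
173 |
--------------------------------------------------------------------------------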
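The weighted blend from section 4 of the write-up can be sketched as follows. It assumes the three weights are listed in the same order as the three components (full-data XGB, February-only XGB, rule); the prediction values are invented, and the function name `blend` is illustrative rather than taken from the competition code, which ran on the platform.

```python
# Weights as listed in the write-up; the ordering is an assumption.
WEIGHTS = (0.262, 0.61, 0.128)
RULE_COEFFICIENT = 0.6  # January prediction scaled down to estimate February


def blend(xgb_all, xgb_feb, january_pred, weights=WEIGHTS):
    """Weighted average of two XGB outputs and the rule-based estimate."""
    w_all, w_feb, w_rule = weights
    blended = []
    for p_all, p_feb, p_jan in zip(xgb_all, xgb_feb, january_pred):
        rule_pred = RULE_COEFFICIENT * p_jan
        blended.append(w_all * p_all + w_feb * p_feb + w_rule * rule_pred)
    return blended


# Invented predictions for two (province, city, model) records.
xgb_all = [100.0, 10.0]        # XGB trained on all three years
xgb_feb = [80.0, 8.0]          # XGB trained on February records only
january_pred = [150.0, 20.0]   # leaderboard A prediction for January

february_pred = blend(xgb_all, xgb_feb, january_pred)
print(february_pred)
```

The three weights sum to 1, so the blend stays on the same scale as its inputs; the predictions would still need to be multiplied back by `times` if they are in restored-sales units.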