├── README.md ├── adverStat ├── adverStat.iml ├── pom.xml ├── src │ └── main │ │ ├── java │ │ └── scala │ │ │ ├── JdbcHelper.scala │ │ │ └── advertStat.scala │ │ └── resources │ │ └── ad.sql └── target │ └── classes │ ├── META-INF │ └── adverStat.kotlin_module │ └── scala │ ├── AdBlacklistDAO$$anon$1.class │ ├── AdBlacklistDAO$.class │ ├── AdBlacklistDAO.class │ ├── AdClickTrendDAO$$anon$5.class │ ├── AdClickTrendDAO$.class │ ├── AdClickTrendDAO.class │ ├── AdProvinceTop3DAO$.class │ ├── AdProvinceTop3DAO.class │ ├── AdStatDAO$$anon$4.class │ ├── AdStatDAO$.class │ ├── AdStatDAO.class │ ├── AdUserClickCountDAO$$anon$2.class │ ├── AdUserClickCountDAO$$anon$3.class │ ├── AdUserClickCountDAO$.class │ ├── AdUserClickCountDAO.class │ ├── advertStat$$typecreator5$1.class │ ├── advertStat$.class │ ├── advertStat.class │ ├── test$.class │ └── test.class ├── commons ├── commons.iml ├── pom.xml ├── src │ └── main │ │ ├── java │ │ └── commons │ │ │ ├── conf │ │ │ └── ConfigurationManager.scala │ │ │ ├── constant │ │ │ └── Constants.scala │ │ │ ├── model │ │ │ └── DataModel.scala │ │ │ ├── pool │ │ │ └── PooledMySqlClientFactory.scala │ │ │ └── utils │ │ │ └── Utils.scala │ │ └── resources │ │ ├── commerce.properties │ │ └── log4j.properties └── target │ └── classes │ ├── commerce.properties │ ├── commons │ ├── conf │ │ ├── ConfigurationManager$.class │ │ └── ConfigurationManager.class │ ├── constant │ │ ├── Constants$.class │ │ └── Constants.class │ ├── model │ │ ├── AdBlacklist$.class │ │ ├── AdBlacklist.class │ │ ├── AdClickTrend$.class │ │ ├── AdClickTrend.class │ │ ├── AdProvinceTop3$.class │ │ ├── AdProvinceTop3.class │ │ ├── AdStat$.class │ │ ├── AdStat.class │ │ ├── AdUserClickCount$.class │ │ ├── AdUserClickCount.class │ │ ├── ProductInfo$.class │ │ ├── ProductInfo.class │ │ ├── SessionAggrStat$.class │ │ ├── SessionAggrStat.class │ │ ├── SessionDetail$.class │ │ ├── SessionDetail.class │ │ ├── SessionRandomExtract$.class │ │ ├── SessionRandomExtract.class │ │ ├── Top10Category$.class │ │ ├── Top10Category.class │ │ ├── Top10Session$.class │ │ ├── Top10Session.class │ │ ├── UserInfo$.class │ │ ├── UserInfo.class │ │ ├── UserVisitAction$.class │ │ └── UserVisitAction.class │ ├── pool │ │ ├── CreateMySqlPool$.class │ │ ├── CreateMySqlPool.class │ │ ├── MySqlProxy$.class │ │ ├── MySqlProxy.class │ │ ├── PooledMySqlClientFactory$.class │ │ ├── PooledMySqlClientFactory.class │ │ └── QueryCallback.class │ └── utils │ │ ├── DateUtils$.class │ │ ├── DateUtils.class │ │ ├── NumberUtils$.class │ │ ├── NumberUtils.class │ │ ├── ParamUtils$.class │ │ ├── ParamUtils.class │ │ ├── StringUtil$.class │ │ ├── StringUtil.class │ │ ├── ValidUtils$.class │ │ └── ValidUtils.class │ ├── log4j.properties │ └── test │ ├── DataModel.scala │ ├── JdbcHelper.scala │ ├── PageConvertStat.scala │ ├── PageStat.scala │ └── ad.sql ├── mock ├── mock.iml ├── pom.xml ├── src │ └── main │ │ └── java │ │ └── scala │ │ ├── MockDataGenerate.scala │ │ └── MockRealTimeData.scala └── target │ └── classes │ └── scala │ ├── MockDataGenerate$$typecreator13$1.class │ ├── MockDataGenerate$$typecreator21$1.class │ ├── MockDataGenerate$$typecreator5$1.class │ ├── MockDataGenerate$.class │ ├── MockDataGenerate.class │ ├── MockRealTimeData$.class │ └── MockRealTimeData.class ├── pom.xml ├── readme.md └── session ├── pom.xml ├── session.iml ├── src └── main │ └── java │ ├── scala │ ├── sessionAccumulator.scala │ └── sessionStat.scala │ └── server │ ├── SortKey.scala │ ├── serverFive.scala │ ├── serverFour.scala │ ├── serverOne.scala │ ├── 
serverThree.scala │ └── serverTwo.scala └── target └── classes ├── META-INF └── session.kotlin_module ├── scala ├── sessionAccumulator.class ├── sessionStat$.class └── sessionStat.class └── server ├── SortKey$.class ├── SortKey.class ├── serverFive.class ├── serverFour.class ├── serverOne$$typecreator4$1.class ├── serverOne$$typecreator4$2.class ├── serverOne$$typecreator5$1.class ├── serverOne$$typecreator5$2.class ├── serverOne.class ├── serverThree.class └── serverTwo.class /README.md:
--------------------------------------------------------------------------------
1 | >E-commerce Analytics Platform
2 | 
3 | This project is my set of notes from the Atguigu (尚硅谷) big-data e-commerce analytics platform course. It is split into roughly ten requirements, and each requirement is explained in its own article.
4 | 
5 | 
6 | ## Article index:
7 | [Project setup, commons module walkthrough, offline and real-time data preparation](https://blog.csdn.net/zisuu/article/details/106361630)
8 | 
9 | [Requirements overview](https://blog.csdn.net/zisuu/article/details/106302167)
10 | 
11 | 
12 | 
13 | [Requirement 1: session step-length and visit-duration ratio statistics per range](https://blog.csdn.net/zisuu/article/details/106329092)
14 | 
15 | 
16 | [Requirement 2: random session extraction in proportion](https://blog.csdn.net/zisuu/article/details/106333719)
17 | 
18 | [Requirement 3: top 10 popular products](https://blog.csdn.net/zisuu/article/details/106335694)
19 | 
20 | [Requirement 4: top 10 active sessions for the top 10 popular categories](https://blog.csdn.net/zisuu/article/details/106338047)
21 | 
22 | [Requirement 5: single-jump page conversion rate for a given page-visit flow](https://blog.csdn.net/zisuu/article/details/106341485)
23 | 
24 | [Requirement 6: real-time statistics — blacklist mechanism](https://blog.csdn.net/zisuu/article/details/106354769)
25 | 
26 | 
27 | [Prerequisites for requirements 7 and 9](https://blog.csdn.net/zisuu/article/details/106358260)
28 | 
29 | [Requirement 7: real-time ad click counts per province and city](https://blog.csdn.net/zisuu/article/details/106356262)
30 | 
31 | [Requirement 8: real-time top 3 ads per province](https://blog.csdn.net/zisuu/article/details/106357644)
32 | 
33 | 
34 | [Requirement 9: real-time ad click counts for the last hour](https://blog.csdn.net/zisuu/article/details/106359362)
35 | 
36 | [Requirement 10: summary](https://blog.csdn.net/zisuu/article/details/106359657)
37 | 
38 | 
39 | ## Project overview
40 | **Course introduction**
41 | >This course builds a complete, enterprise-grade e-commerce big-data analytics system on top of the Spark ecosystem, which is currently one of the most popular stacks. It includes both an offline analytics system and a real-time analytics system, covering Spark Core, Spark SQL, Spark Streaming and Spark performance tuning, with Spark internals and interview points woven in, so that you master the core frameworks of the Spark ecosystem through hands-on practice.
42 | All requirements come from real enterprise scenarios. For every requirement the instructor combines text and diagrams, implements the code from scratch and explains it line by line, so you understand not only what the code does but why. After this course your understanding of the Spark stack should reach a new level.
43 | 
44 | **How to study?**
45 | 
46 | 
47 | - Download the source code from GitHub (and give it a star!)
48 | Repository: [spark-shopAnalyze](https://github.com/zisuu870/spark-shopAnalyze)
49 | 
50 | - Read the first article in the index to understand what the commons and mock modules do, and create a Maven project by following it — this step really matters.
51 | - Read the second article to get a rough idea of the requirements.
52 | - Work through the articles in order, understand each requirement, and make sure you type the code out yourself.
53 | - After finishing each requirement, summarize what you learned from it.
54 | - When you hit an operator you don't know, look it up yourself.
55 | 
56 | 
57 | 
58 | **Technology stack**
59 | - Spark (Spark SQL, Spark Streaming)
60 | - Hive
61 | - Kafka
62 | - MySQL
63 | - Hadoop HDFS
64 | 
65 | **Required environment**
66 | 
67 | - Hadoop
68 | >I built a fully distributed Hadoop cluster on VirtualBox. If you do not have a Hadoop environment yet, the following two articles may help:
69 | 
70 | [【超详细】最新Windows下安装Virtual Box后安装CentOS7并设置双网卡实现宿主机与虚拟机互相访问](https://blog.csdn.net/adamlinsfz/article/details/84108536)
71 | [【超详细】最新VirtualBox+CentOS7+Hadoop2.8.5手把手搭建完全分布式Hadoop集群(从小白逐步进阶)](https://blog.csdn.net/adamlinsfz/article/details/84333389)
72 | 
73 | - IDEA with a Scala/Spark development setup
74 | 
75 | [如何用Idea运行我们的Spark项目](https://www.cnblogs.com/tjp40922/p/12177913.html)
76 | 
77 | - Spark Streaming and Kafka integration
78 | 
79 | [Spark Streaming整合Kafka](https://www.jianshu.com/p/ec3bf53dcf3f)
80 | 
81 | 
82 | 
83 | 
84 | 
85 | **Main features**
86 | The project is split into an offline statistics part and a real-time statistics part, ten requirements in total, each explained in detail in its own article.
87 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200526170258516.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3ppc3V1,size_16,color_FFFFFF,t_70)
88 | 
89 | **What will you learn?**
90 | 
91 | - How to combine the common big-data frameworks — Hadoop HDFS, Kafka, Spark, Spark SQL, Spark Streaming, Hive — which helps consolidate what you have already learned
92 | - A deeper understanding of Spark operators, Spark SQL and Spark Streaming; Spark is the core framework of this tutorial
93 | - Common big-data computation patterns, and how to analyze a computation requirement, work backwards from it, and apply the patterns flexibly
94 | 
95 | 
96 | ## Module analysis
97 | 
98 | **Project layout:**
99 | 
100 | 
101 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200526171831646.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3ppc3V1,size_16,color_FFFFFF,t_70)
102 | **commons module**
103 | 
104 | >commons mainly provides configuration reading, access to the pooled connection objects, shared code conventions, and so on
105 | 
106 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200526171935541.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3ppc3V1,size_16,color_FFFFFF,t_70)
107 | 
108 | **mock module**
109 | - The mock module generates simulated data.
110 | - MockDataGenerate produces the offline data; you can save it to Hadoop or to Hive. If you have not learned Hive yet, save it to Hadoop.
111 | - MockRealTimeData produces the real-time data and sends it through Kafka to Spark Streaming so the real-time statistics can be computed.
112 | 
113 | 
114 | 
115 | 
116 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200526172158317.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3ppc3V1,size_16,color_FFFFFF,t_70)
117 | **session module**
118 | 
119 | - The session module holds the offline statistics.
120 | - sessionStat contains the main function.
121 | - Each server object under the server directory implements one requirement and is called from the main function in sessionStat.
122 | - sessionAccumulator is a custom accumulator.
123 | - SortKey is a custom sort key.
124 | 
125 | 
126 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200526172431181.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3ppc3V1,size_16,color_FFFFFF,t_70)
127 | 
128 | **adverStat module**
129 | 
130 | - advertStat contains the main function.
131 | - Because the real-time requirements build on one another, they are all invoked from the same main function.
132 | - JdbcHelper can be seen as the DAO layer, as in a Java project.
133 | 
134 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200526172729749.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3ppc3V1,size_16,color_FFFFFF,t_70)
135 | 
136 | 
-------------------------------------------------------------------------------- /adverStat/adverStat.iml:
--------------------------------------------------------------------------------
1 | 2 | 
-------------------------------------------------------------------------------- /adverStat/pom.xml:
-------------------------------------------------------------------------------- 1 | 2 | 5 | 6 | shopAnalyze 7 | org.example 8 | 1.0-SNAPSHOT 9 | 10 | 4.0.0 11 | 12 | adverStat 13 | 14 | 15 | 16 | org.apache.hadoop 17 | hadoop-client 18 | 2.8.5 19 | 20 | 21 | com.fasterxml.jackson.core 22 | jackson-core 23 | 2.10.0 24 | 25 | 26 | com.fasterxml.jackson.core 27 | jackson-annotations 28 | 2.10.0 29 | 30 | 31 | 32 | com.fasterxml.jackson.core 33 | jackson-databind 34 | 2.10.0 35 | 36 | 37 | 38 | 39 | com.alibaba 40 | fastjson 41 | 1.2.36 42 | 43 | 44 | 45 | org.apache.hadoop 46 | hadoop-common 47 | 2.8.5 48 | 49 | 50 | org.apache.hadoop 51 | hadoop-hdfs 52 | 2.8.5 53 | 54 | 55 | commons-beanutils 56 | commons-beanutils 57 | 1.9.3 58 | 59 | 60 | org.apache.hadoop 61 | hadoop-yarn-common 62 | 2.8.5 63 | 64 | 65 | org.codehaus.janino 66 | janino 67 | 3.0.8 68 | 69 | 70 | org.apache.spark 71 | spark-sql_2.12 72 | 2.4.5 73 | 74 | 75 | mysql 76 | mysql-connector-java 77 | 8.0.20 78 | 79 | 80 | org.example 81 | commons 82 | ${project.version} 83 | 84 | 85 | 86 | org.apache.spark 87 | spark-core_2.12 88 | 2.4.5 89 | 90 | 91 | 92 | org.apache.spark 93 | spark-hive_2.12 94 | 2.4.5 95 | 96 | 97 | org.apache.spark 98 | spark-streaming_2.12 99 | 2.4.5 100 | 101 | 102 | org.apache.spark 103 | spark-streaming-kafka-0-10_2.12 104 | 2.4.3 105 | 106 | 107 | slf4j-log4j12 108 | org.slf4j 109 | 110 | 111 | 112 | 113 | org.apache.spark 114 | spark-sql_2.12 115 | 2.4.5 116 | 117 | 118 | -------------------------------------------------------------------------------- /adverStat/src/main/java/scala/JdbcHelper.scala: -------------------------------------------------------------------------------- 1 | package scala 2 | 3 | /* 4 | * Copyright (c) 2017. Atguigu Inc. All Rights Reserved. 5 | * Date: 11/1/17 3:40 PM. 6 | * Author: wuyufei. 
7 | */ 8 | 9 | import java.sql.{DriverManager, ResultSet} 10 | 11 | import commons.model.{AdBlacklist, AdClickTrend, AdProvinceTop3, AdStat, AdUserClickCount} 12 | import commons.pool.{CreateMySqlPool, QueryCallback} 13 | 14 | import scala.collection.mutable.ArrayBuffer 15 | 16 | /** 17 | * 用户黑名单DAO类 18 | */ 19 | object AdBlacklistDAO { 20 | 21 | /** 22 | * 批量插入广告黑名单用户 23 | * 24 | * @param adBlacklists 25 | */ 26 | def insertBatch(adBlacklists: Array[AdBlacklist]) { 27 | // 批量插入 28 | val sql = "INSERT INTO ad_blacklist VALUES(?)" 29 | 30 | val paramsList = new ArrayBuffer[Array[Any]]() 31 | 32 | // 向paramsList添加userId 33 | for (adBlacklist <- adBlacklists) { 34 | val params: Array[Any] = Array(adBlacklist.userid) 35 | paramsList += params 36 | } 37 | // 获取对象池单例对象 38 | val mySqlPool = CreateMySqlPool() 39 | // 从对象池中提取对象 40 | val client = mySqlPool.borrowObject() 41 | 42 | // 执行批量插入操作 43 | client.executeBatch(sql, paramsList.toArray) 44 | // 使用完成后将对象返回给对象池 45 | mySqlPool.returnObject(client) 46 | } 47 | 48 | /** 49 | * 查询所有广告黑名单用户 50 | * 51 | * @return 52 | */ 53 | def findAll(): Array[AdBlacklist] = { 54 | // 将黑名单中的所有数据查询出来 55 | val sql = "SELECT * FROM ad_blacklist" 56 | 57 | val adBlacklists = new ArrayBuffer[AdBlacklist]() 58 | 59 | // 获取对象池单例对象 60 | val mySqlPool = CreateMySqlPool() 61 | // 从对象池中提取对象 62 | val client = mySqlPool.borrowObject() 63 | 64 | // 执行sql查询并且通过处理函数将所有的userid加入array中 65 | client.executeQuery(sql, null, new QueryCallback { 66 | override def process(rs: ResultSet): Unit = { 67 | while (rs.next()) { 68 | val userid = rs.getInt(1).toLong 69 | adBlacklists += AdBlacklist(userid) 70 | } 71 | } 72 | }) 73 | 74 | // 使用完成后将对象返回给对象池 75 | mySqlPool.returnObject(client) 76 | adBlacklists.toArray 77 | } 78 | } 79 | 80 | 81 | /** 82 | * 用户广告点击量DAO实现类 83 | * 84 | */ 85 | object AdUserClickCountDAO { 86 | def updateBatch1(adUserClickCounts: Array[AdUserClickCount]): Unit ={ 87 | val mySqlPool=CreateMySqlPool(); 88 | val client=mySqlPool.borrowObject(); 89 | val buffer=new StringBuilder(); 90 | var sql = "INSERT INTO ad_user_click_count VALUES" 91 | buffer.append(sql); 92 | var index=0; 93 | for (index<-0 to adUserClickCounts.size-1){ 94 | val action=adUserClickCounts(index); 95 | var sql1="("+action.date+","+action.userid+","+action.adid+","+action.clickCount+")"; 96 | buffer.append(sql1); 97 | if (index 0) { 124 | updateAdUserClickCounts += adUserClickCount 125 | } else { 126 | insertAdUserClickCounts += adUserClickCount 127 | } 128 | } 129 | }) 130 | } 131 | 132 | // 执行批量插入 133 | val insertSQL = "INSERT INTO ad_user_click_count VALUES(?,?,?,?)" 134 | val insertParamsList: ArrayBuffer[Array[Any]] = ArrayBuffer[Array[Any]]() 135 | 136 | // 将待插入项全部加入到参数列表中 137 | for (adUserClickCount <- insertAdUserClickCounts) { 138 | insertParamsList += Array[Any](adUserClickCount.date, adUserClickCount.userid, adUserClickCount.adid, adUserClickCount.clickCount) 139 | } 140 | 141 | // 执行批量插入 142 | client.executeBatch(insertSQL, insertParamsList.toArray) 143 | 144 | // 执行批量更新 145 | // clickCount=clickCount + :此处的UPDATE是进行累加 146 | val updateSQL = "UPDATE ad_user_click_count SET clickCount=clickCount + ? WHERE date=? AND userid=? AND adid=?" 
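    // Added commentary (not part of the original course code): updateBatch splits the incoming
    // rows into ones that must be INSERTed and ones whose clickCount must be accumulated with the
    // UPDATE defined just above. If ad_user_click_count had a UNIQUE KEY on (date, userid, adid) —
    // ad.sql does not define one — the same upsert could be done in a single statement per batch,
    // e.g. (hypothetical alternative):
    //   INSERT INTO ad_user_click_count VALUES(?,?,?,?)
    //   ON DUPLICATE KEY UPDATE clickCount = clickCount + VALUES(clickCount)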
147 | val updateParamsList: ArrayBuffer[Array[Any]] = ArrayBuffer[Array[Any]]() 148 | 149 | // 将待更新项全部加入到参数列表中 150 | for (adUserClickCount <- updateAdUserClickCounts) { 151 | updateParamsList += Array[Any](adUserClickCount.clickCount, adUserClickCount.date, adUserClickCount.userid, adUserClickCount.adid) 152 | } 153 | 154 | // 执行批量更新 155 | client.executeBatch(updateSQL, updateParamsList.toArray) 156 | 157 | // 使用完成后将对象返回给对象池 158 | mySqlPool.returnObject(client) 159 | } 160 | 161 | /** 162 | * 根据多个key查询用户广告点击量 163 | * 164 | * @param date 日期 165 | * @param userid 用户id 166 | * @param adid 广告id 167 | * @return 168 | */ 169 | def findClickCountByMultiKey(date: String, userid: Long, adid: Long): Int = { 170 | // 获取对象池单例对象 171 | val mySqlPool = CreateMySqlPool() 172 | // 从对象池中提取对象 173 | val client = mySqlPool.borrowObject() 174 | 175 | val sql = "SELECT clickCount FROM ad_user_click_count " + 176 | "WHERE date=? " + 177 | "AND userid=? " + 178 | "AND adid=?" 179 | 180 | var clickCount = 0 181 | val params = Array[Any](date, userid, adid) 182 | 183 | // 根据多个条件查询指定用户的点击量,将查询结果累加到clickCount中 184 | client.executeQuery(sql, params, new QueryCallback { 185 | override def process(rs: ResultSet): Unit = { 186 | if (rs.next()) { 187 | clickCount = rs.getInt(1) 188 | } 189 | } 190 | }) 191 | // 使用完成后将对象返回给对象池 192 | mySqlPool.returnObject(client) 193 | clickCount 194 | } 195 | } 196 | 197 | 198 | /** 199 | * 广告实时统计DAO实现类 200 | * 201 | * @author Administrator 202 | * 203 | */ 204 | object AdStatDAO { 205 | 206 | def updateBatch(adStats: Array[AdStat]) { 207 | // 获取对象池单例对象 208 | val mySqlPool = CreateMySqlPool() 209 | // 从对象池中提取对象 210 | val client = mySqlPool.borrowObject() 211 | 212 | 213 | // 区分开来哪些是要插入的,哪些是要更新的 214 | val insertAdStats = ArrayBuffer[AdStat]() 215 | val updateAdStats = ArrayBuffer[AdStat]() 216 | 217 | val selectSQL = "SELECT count(*) " + 218 | "FROM ad_stat " + 219 | "WHERE date=? " + 220 | "AND province=? " + 221 | "AND city=? " + 222 | "AND adid=?" 223 | 224 | for (adStat <- adStats) { 225 | 226 | val params = Array[Any](adStat.date, adStat.province, adStat.city, adStat.adid) 227 | // 通过查询结果判断当前项时待插入还是待更新 228 | client.executeQuery(selectSQL, params, new QueryCallback { 229 | override def process(rs: ResultSet): Unit = { 230 | if (rs.next() && rs.getInt(1) > 0) { 231 | updateAdStats += adStat 232 | } else { 233 | insertAdStats += adStat 234 | } 235 | } 236 | }) 237 | } 238 | 239 | // 对于需要插入的数据,执行批量插入操作 240 | val insertSQL = "INSERT INTO ad_stat VALUES(?,?,?,?,?)" 241 | 242 | val insertParamsList: ArrayBuffer[Array[Any]] = ArrayBuffer[Array[Any]]() 243 | 244 | for (adStat <- insertAdStats) { 245 | insertParamsList += Array[Any](adStat.date, adStat.province, adStat.city, adStat.adid, adStat.clickCount) 246 | } 247 | 248 | client.executeBatch(insertSQL, insertParamsList.toArray) 249 | 250 | // 对于需要更新的数据,执行批量更新操作 251 | // 此处的UPDATE是进行覆盖 252 | val updateSQL = "UPDATE ad_stat SET clickCount=? " + 253 | "WHERE date=? " + 254 | "AND province=? " + 255 | "AND city=? " + 256 | "AND adid=?" 
257 | 258 | val updateParamsList: ArrayBuffer[Array[Any]] = ArrayBuffer[Array[Any]]() 259 | 260 | for (adStat <- updateAdStats) { 261 | updateParamsList += Array[Any](adStat.clickCount, adStat.date, adStat.province, adStat.city, adStat.adid) 262 | } 263 | 264 | client.executeBatch(updateSQL, updateParamsList.toArray) 265 | 266 | // 使用完成后将对象返回给对象池 267 | mySqlPool.returnObject(client) 268 | } 269 | 270 | } 271 | 272 | 273 | /** 274 | * 各省份top3热门广告DAO实现类 275 | * 276 | * @author Administrator 277 | * 278 | */ 279 | object AdProvinceTop3DAO { 280 | 281 | def updateBatch(adProvinceTop3s: Array[AdProvinceTop3]) { 282 | // 获取对象池单例对象 283 | val mySqlPool = CreateMySqlPool() 284 | // 从对象池中提取对象 285 | val client = mySqlPool.borrowObject() 286 | 287 | // dateProvinces可以实现一次去重 288 | // AdProvinceTop3:date province adid clickCount,由于每条数据由date province adid组成 289 | // 当只取date province时,一定会有重复的情况 290 | val dateProvinces = ArrayBuffer[String]() 291 | 292 | for (adProvinceTop3 <- adProvinceTop3s) { 293 | // 组合新key 294 | val key = adProvinceTop3.date + "_" + adProvinceTop3.province 295 | 296 | // dateProvinces中不包含当前key才添加 297 | // 借此去重 298 | if (!dateProvinces.contains(key)) { 299 | dateProvinces += key 300 | } 301 | } 302 | 303 | // 根据去重后的date和province,进行批量删除操作 304 | // 先将原来的数据全部删除 305 | val deleteSQL = "DELETE FROM ad_province_top3 WHERE date=? AND province=?" 306 | 307 | val deleteParamsList: ArrayBuffer[Array[Any]] = ArrayBuffer[Array[Any]]() 308 | 309 | for (dateProvince <- dateProvinces) { 310 | 311 | val dateProvinceSplited = dateProvince.split("_") 312 | val date = dateProvinceSplited(0) 313 | val province = dateProvinceSplited(1) 314 | 315 | val params = Array[Any](date, province) 316 | deleteParamsList += params 317 | } 318 | 319 | client.executeBatch(deleteSQL, deleteParamsList.toArray) 320 | 321 | // 批量插入传入进来的所有数据 322 | val insertSQL = "INSERT INTO ad_province_top3 VALUES(?,?,?,?)" 323 | 324 | val insertParamsList: ArrayBuffer[Array[Any]] = ArrayBuffer[Array[Any]]() 325 | 326 | // 将传入的数据转化为参数列表 327 | for (adProvinceTop3 <- adProvinceTop3s) { 328 | insertParamsList += Array[Any](adProvinceTop3.date, adProvinceTop3.province, adProvinceTop3.adid, adProvinceTop3.clickCount) 329 | } 330 | 331 | client.executeBatch(insertSQL, insertParamsList.toArray) 332 | 333 | // 使用完成后将对象返回给对象池 334 | mySqlPool.returnObject(client) 335 | } 336 | 337 | } 338 | 339 | 340 | /** 341 | * 广告点击趋势DAO实现类 342 | * 343 | * @author Administrator 344 | * 345 | */ 346 | object AdClickTrendDAO extends Serializable { 347 | 348 | def updateBatch(adClickTrends: Array[AdClickTrend]) { 349 | // 获取对象池单例对象 350 | val mySqlPool = CreateMySqlPool() 351 | // 从对象池中提取对象 352 | val client = mySqlPool.borrowObject() 353 | 354 | // 区分开来哪些是要插入的,哪些是要更新的 355 | val updateAdClickTrends = ArrayBuffer[AdClickTrend]() 356 | val insertAdClickTrends = ArrayBuffer[AdClickTrend]() 357 | 358 | val selectSQL = "SELECT count(*) " + 359 | "FROM ad_click_trend " + 360 | "WHERE date=? " + 361 | "AND hour=? " + 362 | "AND minute=? " + 363 | "AND adid=?" 
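    // Added note (not part of the original course code): the (date, hour, minute, adid) counts passed
    // in here come from reduceByKeyAndWindow in advertStat.getRecentHourClickCount, so each batch
    // already carries the full count for its window. That is why the UPDATE below overwrites
    // clickCount instead of accumulating it the way AdUserClickCountDAO.updateBatch does.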
364 | 365 | for (adClickTrend <- adClickTrends) { 366 | // 通过查询结果判断当前项时待插入还是待更新 367 | val params = Array[Any](adClickTrend.date, adClickTrend.hour, adClickTrend.minute, adClickTrend.adid) 368 | client.executeQuery(selectSQL, params, new QueryCallback { 369 | override def process(rs: ResultSet): Unit = { 370 | if (rs.next() && rs.getInt(1) > 0) { 371 | updateAdClickTrends += adClickTrend 372 | } else { 373 | insertAdClickTrends += adClickTrend 374 | } 375 | } 376 | }) 377 | 378 | } 379 | 380 | // 执行批量更新操作 381 | // 此处的UPDATE是覆盖 382 | val updateSQL = "UPDATE ad_click_trend SET clickCount=? " + 383 | "WHERE date=? " + 384 | "AND hour=? " + 385 | "AND minute=? " + 386 | "AND adid=?" 387 | 388 | val updateParamsList: ArrayBuffer[Array[Any]] = ArrayBuffer[Array[Any]]() 389 | 390 | for (adClickTrend <- updateAdClickTrends) { 391 | updateParamsList += Array[Any](adClickTrend.clickCount, adClickTrend.date, adClickTrend.hour, adClickTrend.minute, adClickTrend.adid) 392 | } 393 | 394 | client.executeBatch(updateSQL, updateParamsList.toArray) 395 | 396 | // 执行批量更新操作 397 | val insertSQL = "INSERT INTO ad_click_trend VALUES(?,?,?,?,?)" 398 | 399 | val insertParamsList: ArrayBuffer[Array[Any]] = ArrayBuffer[Array[Any]]() 400 | 401 | for (adClickTrend <- insertAdClickTrends) { 402 | insertParamsList += Array[Any](adClickTrend.date, adClickTrend.hour, adClickTrend.minute, adClickTrend.adid, adClickTrend.clickCount) 403 | } 404 | 405 | client.executeBatch(insertSQL, insertParamsList.toArray) 406 | 407 | // 使用完成后将对象返回给对象池 408 | mySqlPool.returnObject(client) 409 | } 410 | 411 | } 412 | 413 | -------------------------------------------------------------------------------- /adverStat/src/main/java/scala/advertStat.scala: -------------------------------------------------------------------------------- 1 | package scala 2 | 3 | import java.util.Date 4 | 5 | import commons.conf.ConfigurationManager 6 | import commons.constant.Constants 7 | import commons.model.{AdBlacklist, AdClickTrend, AdProvinceTop3, AdStat, AdUserClickCount} 8 | import commons.utils.DateUtils 9 | import org.apache.kafka.common.serialization.StringDeserializer 10 | import org.apache.spark.SparkConf 11 | import org.apache.spark.sql.SparkSession 12 | import org.apache.spark.sql.catalyst.expressions.{Hour, Minute} 13 | import org.apache.spark.streaming.dstream.DStream 14 | import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies} 15 | import org.apache.spark.streaming.{Duration, Minutes, Seconds, StreamingContext} 16 | 17 | import scala.collection.mutable.{ArrayBuffer, ListBuffer} 18 | 19 | object advertStat { 20 | 21 | 22 | def main(args: Array[String]): Unit = { 23 | val sparkConf = new SparkConf().setAppName("adver").setMaster("local[*]").set("spark.serializer","org.apache.spark.serializer.KryoSerializer"); 24 | val sparkSession = SparkSession.builder().config(sparkConf).getOrCreate(); 25 | sparkSession.sparkContext.setLogLevel("ERROR"); 26 | 27 | // val streamingContext = StreamingContext.getActiveOrCreate(checkpointDir, func) 28 | val ssc = new StreamingContext(sparkSession.sparkContext, Seconds(5)) 29 | 30 | val kafka_brokers = ConfigurationManager.config.getString("kafka.broker.list") 31 | val kafka_topics = ConfigurationManager.config.getString(Constants.KAFKA_TOPICS) 32 | 33 | val kafkaParam = Map( 34 | "bootstrap.servers" -> kafka_brokers, 35 | "key.deserializer" -> classOf[StringDeserializer], 36 | "value.deserializer" -> classOf[StringDeserializer], 37 | "group.id" -> "0", 38 | // auto.offset.reset 
39 | // latest: 先去Zookeeper获取offset,如果有,直接使用,如果没有,从最新的数据开始消费; 40 | // earlist: 先去Zookeeper获取offset,如果有,直接使用,如果没有,从最开始的数据开始消费 41 | // none: 先去Zookeeper获取offset,如果有,直接使用,如果没有,直接报错 42 | "auto.offset.reset" -> "latest", 43 | "enable.auto.commit" -> (false:java.lang.Boolean) 44 | ) 45 | 46 | // adRealTimeDStream: DStream[RDD RDD RDD ...] RDD[message] message: key value 47 | val adRealTimeDStream = KafkaUtils.createDirectStream[String, String](ssc, 48 | LocationStrategies.PreferConsistent, 49 | ConsumerStrategies.Subscribe[String, String](Array(kafka_topics), kafkaParam) 50 | ) 51 | val adReadTimeValueDStream=adRealTimeDStream.map(item=>item.value); 52 | val adRealTimeFilterDstream=adReadTimeValueDStream.transform{ 53 | RDDS=>{ 54 | val blackList=AdBlacklistDAO.findAll(); 55 | val black=blackList.map(item=>item.userid); 56 | RDDS.filter{ 57 | log=>{ 58 | val userId=log.split(" ")(3).toLong; 59 | !black.contains(userId); 60 | } 61 | } 62 | } 63 | 64 | } 65 | 66 | ssc.checkpoint("hdfs://hadoop1:9000/sparkStreaming") 67 | adRealTimeFilterDstream.checkpoint(Duration(10000)) 68 | 69 | 70 | 71 | /* 72 | 需求一------实时维护黑名单 73 | */ 74 | 75 | //generateBlackList(adRealTimeFilterDstream); 76 | 77 | /* 78 | 需求二------实时统计各省各区域的广告点击量 79 | */ 80 | //val key2ProvinceCityCountDStream=provinceCityClickStat(adRealTimeFilterDstream) 81 | 82 | /* 83 | 需求三_-------------top3广告 84 | */ 85 | // proveinceTope3Adver(sparkSession,key2ProvinceCityCountDStream) 86 | 87 | 88 | /* 89 | 需求四-------------实时统计近一个小时的广告点击量 90 | */ 91 | getRecentHourClickCount(adRealTimeFilterDstream) 92 | 93 | ssc.start(); 94 | ssc.awaitTermination(); 95 | 96 | } 97 | def getRecentHourClickCount(adRealTimeFilterDstream: DStream[String]) = { 98 | //1.转化key为dateTime_adid 99 | val key2TimeMinute=adRealTimeFilterDstream.map{ 100 | case(log)=>{ 101 | val logSplit = log.split(" ") 102 | val timeStamp = logSplit(0).toLong 103 | // yyyyMMddHHmm 104 | val timeMinute = DateUtils.formatTimeMinute(new Date(timeStamp)) 105 | val adid = logSplit(4).toLong 106 | 107 | val key = timeMinute + "_" + adid 108 | 109 | (key, 1L) 110 | } 111 | } 112 | //2.window operation 统计 113 | val windowKey2=key2TimeMinute.reduceByKeyAndWindow((a:Long, b:Long)=>(a+b), Seconds(10), Seconds(5)); 114 | //3.封装入库 115 | windowKey2.foreachRDD{ 116 | rdd => rdd.foreachPartition{ 117 | // (key, count) 118 | items=> 119 | val trendArray = new ArrayBuffer[AdClickTrend]() 120 | for((key, count) <- items){ 121 | val keySplit = key.split("_") 122 | // yyyyMMddHHmm 123 | val timeMinute = keySplit(0) 124 | val date = timeMinute.substring(0, 8) 125 | val hour = timeMinute.substring(8,10) 126 | val minute = timeMinute.substring(10) 127 | val adid = keySplit(1).toLong 128 | 129 | trendArray += AdClickTrend(date, hour, minute, adid, count) 130 | } 131 | trendArray.foreach(println); 132 | //AdClickTrendDAO.updateBatch(trendArray.toArray) 133 | } 134 | } 135 | } 136 | def proveinceTope3Adver(sparkSession: SparkSession, 137 | key2ProvinceCityCountDStream: DStream[(String, Long)])={ 138 | //1.转化key为date_province_adid,value仍然是原本的count 139 | val key2ProvinceCountDStream=key2ProvinceCityCountDStream.map{ 140 | case (key,count)=>{ 141 | val keySplit = key.split("_") 142 | val date = keySplit(0) 143 | val province = keySplit(1) 144 | val adid = keySplit(3) 145 | (date+"_"+province+"_"+adid,count); 146 | } 147 | } 148 | //2.累增,创建临时表 149 | val key2ProvinceAggCountDStream=key2ProvinceCountDStream.reduceByKey(_+_); 150 | val top3DStream=key2ProvinceAggCountDStream.transform{ 151 | stream=>{ 152 | val temp=stream.map{ 
153 | case (key,count)=>{ 154 | val keySplit = key.split("_") 155 | val date = keySplit(0) 156 | val province = keySplit(1) 157 | val adid = keySplit(2).toLong 158 | 159 | (date, province, adid, count) 160 | } 161 | } 162 | import sparkSession.implicits._; 163 | temp.toDF("date","province","adid","count").createOrReplaceTempView("tmp_basic_info"); 164 | 165 | val sql = "select date, province, adid, count from(" + 166 | "select date, province, adid, count, " + 167 | "row_number() over(partition by date,province order by count desc) rank from tmp_basic_info) " + 168 | "where rank <= 3" 169 | sparkSession.sql(sql).rdd; 170 | } 171 | } 172 | //3.数据封装 173 | top3DStream.foreachRDD{ 174 | // rdd : RDD[row] 175 | rdd => 176 | rdd.foreachPartition{ 177 | // items : row 178 | items => 179 | val top3Array = new ArrayBuffer[AdProvinceTop3]() 180 | for(item <- items){ 181 | val date = item.getAs[String]("date") 182 | val province = item.getAs[String]("province") 183 | val adid = item.getAs[Long]("adid") 184 | val count = item.getAs[Long]("count") 185 | 186 | top3Array += AdProvinceTop3(date, province, adid, count) 187 | } 188 | //top3Array.foreach(println); 189 | //AdProvinceTop3DAO.updateBatch(top3Array.toArray) 190 | } 191 | } 192 | 193 | } 194 | def provinceCityClickStat(adRealTimeFilterDStream: DStream[String])={ 195 | val key2ProvinceCityDStream = adRealTimeFilterDStream.map{ 196 | case log => 197 | val logSplit = log.split(" ") 198 | val timeStamp = logSplit(0).toLong 199 | // dateKey : yy-mm-dd 200 | val dateKey = DateUtils.formatDateKey(new Date(timeStamp)) 201 | val province = logSplit(1) 202 | val city = logSplit(2) 203 | val adid = logSplit(4) 204 | 205 | val key = dateKey + "_" + province + "_" + city + "_" + adid 206 | (key, 1L) 207 | } 208 | 209 | //使用updateStateByKey算子,维护数据的更新 210 | val key2StateDStream = key2ProvinceCityDStream.updateStateByKey[Long]{ 211 | (values:Seq[Long], state:Option[Long])=>{ 212 | var newValues=state.getOrElse(0L); 213 | for(v<-values)newValues+=v; 214 | Some(newValues); 215 | } 216 | } 217 | key2StateDStream.foreachRDD{ 218 | rdd => rdd.foreachPartition{ 219 | items => 220 | val adStatArray = new ArrayBuffer[AdStat]() 221 | // key: date province city adid 222 | for((key, count) <- items){ 223 | val keySplit = key.split("_") 224 | val date = keySplit(0) 225 | val province = keySplit(1) 226 | val city = keySplit(2) 227 | val adid = keySplit(3).toLong 228 | 229 | adStatArray += AdStat(date, province, city, adid, count) 230 | } 231 | // AdStatDAO.updateBatch(adStatArray.toArray) 232 | //adStatArray.foreach(println); 233 | } 234 | } 235 | key2StateDStream 236 | } 237 | def generateBlackList(adRealTimeFilterDstream: DStream[String])= { 238 | val key2NumDStream=adRealTimeFilterDstream.map { 239 | case (log)=>{ 240 | val logSplit = log.split(" ") 241 | val timeStamp = logSplit(0).toLong 242 | // yy-mm-dd 243 | val dateKey = DateUtils.formatDateKey(new Date(timeStamp)) 244 | val userId = logSplit(3).toLong 245 | val adid = logSplit(4).toLong 246 | 247 | val key = dateKey + "_" + userId + "_" + adid 248 | 249 | (key, 1L) 250 | } 251 | } 252 | key2NumDStream 253 | //1.先统计每个用户的点击次数 254 | val keyCountStream=key2NumDStream.reduceByKey(_+_); 255 | var flag=0; 256 | //2.更新数据库 257 | keyCountStream.foreachRDD{ 258 | RDDS=>RDDS.foreachPartition{ 259 | part=>{ 260 | val clickCountArray=new ArrayBuffer[AdUserClickCount](); 261 | for((k,v)<-part){ 262 | val keySplit = k.split("_") 263 | val date = keySplit(0) 264 | val userId = keySplit(1).toLong 265 | val adid = keySplit(2).toLong 
266 | 267 | clickCountArray += AdUserClickCount(date, userId, adid, v) 268 | } 269 | if (clickCountArray.size>0){ 270 | flag=1; 271 | AdUserClickCountDAO.updateBatch1(clickCountArray.toArray); 272 | } 273 | } 274 | } 275 | } 276 | if (flag==1){ 277 | //3.对keyCountStream中的每个rdd,通过查询数据库,获取点击次数,从而进行过滤操作 278 | val filterKeyCountStream=keyCountStream.filter { 279 | case (key,count)=>{ 280 | val keySplit = key.split("_") 281 | val date = keySplit(0) 282 | val userId = keySplit(1).toLong 283 | val adid = keySplit(2).toLong 284 | 285 | val clickCount = AdUserClickCountDAO.findClickCountByMultiKey(date, userId, adid) 286 | 287 | if(clickCount > 10){ 288 | println("userID:"+userId+"is die"); 289 | true 290 | }else{ 291 | false 292 | } 293 | } 294 | } 295 | //4.将剩下的数据加入黑名单中 296 | val filterBlackListDstream=filterKeyCountStream.map{ 297 | case (key,count)=>{ 298 | key.split("_")(1).toLong 299 | } 300 | }.transform(rdds=>rdds.distinct()); 301 | filterBlackListDstream.foreachRDD{ 302 | rdds=>rdds.foreachPartition{ 303 | part=>{ 304 | val buffer=new ListBuffer[AdBlacklist]; 305 | for(userId<-part){ 306 | buffer+=AdBlacklist(userId); 307 | } 308 | AdBlacklistDAO.insertBatch(buffer.toArray) 309 | 310 | } 311 | } 312 | } 313 | } 314 | 315 | } 316 | 317 | 318 | } 319 | -------------------------------------------------------------------------------- /adverStat/src/main/resources/ad.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Navicat Premium Data Transfer 3 | 4 | Source Server : localhost 5 | Source Server Type : MySQL 6 | Source Server Version : 50720 7 | Source Host : localhost 8 | Source Database : commerce 9 | 10 | Target Server Type : MySQL 11 | Target Server Version : 50720 12 | File Encoding : utf-8 13 | 14 | Date: 11/03/2017 11:23:32 AM 15 | */ 16 | 17 | SET FOREIGN_KEY_CHECKS = 0; 18 | 19 | -- ---------------------------- 20 | -- Table structure for `ad_blacklist` 21 | -- ---------------------------- 22 | DROP TABLE IF EXISTS `ad_blacklist`; 23 | CREATE TABLE `ad_blacklist` ( 24 | `userid` int(11) DEFAULT NULL 25 | ) ENGINE=InnoDB DEFAULT CHARSET=utf8; 26 | 27 | -- ---------------------------- 28 | -- Table structure for `ad_click_trend` 29 | -- ---------------------------- 30 | DROP TABLE IF EXISTS `ad_click_trend`; 31 | CREATE TABLE `ad_click_trend` ( 32 | `date` varchar(30) DEFAULT NULL, 33 | `hour` varchar(30) DEFAULT NULL, 34 | `minute` varchar(30) DEFAULT NULL, 35 | `adid` int(11) DEFAULT NULL, 36 | `clickCount` int(11) DEFAULT NULL 37 | ) ENGINE=InnoDB DEFAULT CHARSET=utf8; 38 | 39 | -- ---------------------------- 40 | -- Table structure for `ad_province_top3` 41 | -- ---------------------------- 42 | DROP TABLE IF EXISTS `ad_province_top3`; 43 | CREATE TABLE `ad_province_top3` ( 44 | `date` varchar(30) DEFAULT NULL, 45 | `province` varchar(100) DEFAULT NULL, 46 | `adid` int(11) DEFAULT NULL, 47 | `clickCount` int(11) DEFAULT NULL 48 | ) ENGINE=InnoDB DEFAULT CHARSET=utf8; 49 | 50 | -- ---------------------------- 51 | -- Table structure for `ad_stat` 52 | -- ---------------------------- 53 | DROP TABLE IF EXISTS `ad_stat`; 54 | CREATE TABLE `ad_stat` ( 55 | `date` varchar(30) DEFAULT NULL, 56 | `province` varchar(100) DEFAULT NULL, 57 | `city` varchar(100) DEFAULT NULL, 58 | `adid` int(11) DEFAULT NULL, 59 | `clickCount` int(11) DEFAULT NULL 60 | ) ENGINE=InnoDB DEFAULT CHARSET=utf8; 61 | 62 | -- ---------------------------- 63 | -- Table structure for `ad_user_click_count` 64 | -- ---------------------------- 65 | DROP TABLE IF EXISTS 
`ad_user_click_count`; 66 | CREATE TABLE `ad_user_click_count` ( 67 | `date` varchar(30) DEFAULT NULL, 68 | `userid` int(11) DEFAULT NULL, 69 | `adid` int(11) DEFAULT NULL, 70 | `clickCount` int(11) DEFAULT NULL 71 | ) ENGINE=InnoDB DEFAULT CHARSET=utf8; 72 | 73 | -------------------------------------------------------------------------------- /adverStat/target/classes/META-INF/adverStat.kotlin_module: -------------------------------------------------------------------------------- 1 |  -------------------------------------------------------------------------------- /adverStat/target/classes/scala/AdBlacklistDAO$$anon$1.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/AdBlacklistDAO$$anon$1.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/AdBlacklistDAO$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/AdBlacklistDAO$.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/AdBlacklistDAO.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/AdBlacklistDAO.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/AdClickTrendDAO$$anon$5.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/AdClickTrendDAO$$anon$5.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/AdClickTrendDAO$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/AdClickTrendDAO$.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/AdClickTrendDAO.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/AdClickTrendDAO.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/AdProvinceTop3DAO$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/AdProvinceTop3DAO$.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/AdProvinceTop3DAO.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/AdProvinceTop3DAO.class 
-------------------------------------------------------------------------------- /adverStat/target/classes/scala/AdStatDAO$$anon$4.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/AdStatDAO$$anon$4.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/AdStatDAO$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/AdStatDAO$.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/AdStatDAO.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/AdStatDAO.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/AdUserClickCountDAO$$anon$2.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/AdUserClickCountDAO$$anon$2.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/AdUserClickCountDAO$$anon$3.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/AdUserClickCountDAO$$anon$3.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/AdUserClickCountDAO$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/AdUserClickCountDAO$.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/AdUserClickCountDAO.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/AdUserClickCountDAO.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/advertStat$$typecreator5$1.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/advertStat$$typecreator5$1.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/advertStat$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/advertStat$.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/advertStat.class: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/advertStat.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/test$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/test$.class -------------------------------------------------------------------------------- /adverStat/target/classes/scala/test.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/adverStat/target/classes/scala/test.class -------------------------------------------------------------------------------- /commons/commons.iml: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /commons/pom.xml: -------------------------------------------------------------------------------- 1 | 2 | 5 | 6 | shopAnalyze 7 | org.example 8 | 1.0-SNAPSHOT 9 | 10 | 4.0.0 11 | 12 | commons 13 | 14 | 15 | 16 | org.apache.spark 17 | spark-core_2.12 18 | 2.4.5 19 | 20 | 21 | 22 | 23 | org.apache.commons 24 | commons-configuration2 25 | 2.5 26 | 27 | 28 | 29 | 30 | 31 | 32 | commons-beanutils 33 | commons-beanutils 34 | 1.9.3 35 | 36 | 37 | commons-beanutils 38 | commons-beanutils-core 39 | 1.8.3 40 | 41 | 42 | 43 | 44 | 45 | org.apache.commons 46 | commons-pool2 47 | 2.5.0 48 | 49 | 50 | 51 | org.apache.spark 52 | spark-sql_2.12 53 | 2.4.5 54 | 55 | 56 | org.apache.spark 57 | spark-streaming_2.12 58 | 2.4.5 59 | 60 | 61 | 62 | mysql 63 | mysql-connector-java 64 | 8.0.20 65 | 66 | 67 | 68 | -------------------------------------------------------------------------------- /commons/src/main/java/commons/conf/ConfigurationManager.scala: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2018. Atguigu Inc. All Rights Reserved. 
3 | */ 4 | 5 | package commons.conf 6 | 7 | import org.apache.commons.configuration2.{FileBasedConfiguration, PropertiesConfiguration} 8 | import org.apache.commons.configuration2.builder.FileBasedConfigurationBuilder 9 | import org.apache.commons.configuration2.builder.fluent.Parameters 10 | 11 | /** 12 | * 配置工具类,基于文件的配置生成器,会读取resources下的commerce.properties,并返回所有配置信息 13 | * 14 | */ 15 | object ConfigurationManager { 16 | 17 | // 创建用于初始化配置生成器实例的参数对象 18 | private val params = new Parameters() 19 | // FileBasedConfigurationBuilder:产生一个传入的类的实例对象 20 | // FileBasedConfiguration:融合FileBased与Configuration的接口 21 | // PropertiesConfiguration:从一个或者多个文件读取配置的标准配置加载器 22 | // configure():通过params实例初始化配置生成器 23 | // 向FileBasedConfigurationBuilder()中传入一个标准配置加载器类,生成一个加载器类的实例对象,然后通过params参数对其初始化 24 | private val builder = new FileBasedConfigurationBuilder[FileBasedConfiguration](classOf[PropertiesConfiguration]) 25 | .configure(params.properties().setFileName("commerce.properties")) 26 | 27 | // 通过getConfiguration获取配置对象 28 | val config = builder.getConfiguration() 29 | 30 | } 31 | -------------------------------------------------------------------------------- /commons/src/main/java/commons/constant/Constants.scala: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2018. Atguigu Inc. All Rights Reserved. 3 | */ 4 | 5 | package commons.constant 6 | 7 | /** 8 | * 常量接口 9 | */ 10 | object Constants { 11 | 12 | /** 13 | * 项目配置相关的常量 14 | */ 15 | val JDBC_DATASOURCE_SIZE = "jdbc.datasource.size" 16 | val JDBC_URL = "jdbc.url" 17 | val JDBC_USER = "jdbc.user" 18 | val JDBC_PASSWORD = "jdbc.password" 19 | 20 | val KAFKA_TOPICS = "kafka.topics" 21 | 22 | /** 23 | * Spark作业相关的常量 24 | */ 25 | val SPARK_APP_NAME_SESSION = "UserVisitSessionAnalyzeSpark" 26 | val SPARK_APP_NAME_PAGE = "PageOneStepConvertRateSpark" 27 | 28 | /** 29 | * user_visit_action、user_info、product_info表中字段对应的字段名常量 30 | */ 31 | val FIELD_SESSION_ID = "sessionid" 32 | val FIELD_SEARCH_KEYWORDS = "searchKeywords" 33 | val FIELD_CLICK_CATEGORY_IDS = "clickCategoryIds" 34 | val FIELD_AGE = "age" 35 | val FIELD_PROFESSIONAL = "professional" 36 | val FIELD_CITY = "city" 37 | val FIELD_SEX = "sex" 38 | val FIELD_VISIT_LENGTH = "visitLength" 39 | val FIELD_STEP_LENGTH = "stepLength" 40 | val FIELD_START_TIME = "startTime" 41 | val FIELD_CLICK_COUNT = "clickCount" 42 | val FIELD_ORDER_COUNT = "orderCount" 43 | val FIELD_PAY_COUNT = "payCount" 44 | val FIELD_CATEGORY_ID = "categoryid" 45 | 46 | /** 47 | * Spark累加器Key名称常量 48 | */ 49 | val SESSION_COUNT = "session_count" 50 | 51 | val TIME_PERIOD_1s_3s = "1s_3s" 52 | val TIME_PERIOD_4s_6s = "4s_6s" 53 | val TIME_PERIOD_7s_9s = "7s_9s" 54 | val TIME_PERIOD_10s_30s = "10s_30s" 55 | val TIME_PERIOD_30s_60s = "30s_60s" 56 | val TIME_PERIOD_1m_3m = "1m_3m" 57 | val TIME_PERIOD_3m_10m = "3m_10m" 58 | val TIME_PERIOD_10m_30m = "10m_30m" 59 | val TIME_PERIOD_30m = "30m" 60 | 61 | val STEP_PERIOD_1_3 = "1_3" 62 | val STEP_PERIOD_4_6 = "4_6" 63 | val STEP_PERIOD_7_9 = "7_9" 64 | val STEP_PERIOD_10_30 = "10_30" 65 | val STEP_PERIOD_30_60 = "30_60" 66 | val STEP_PERIOD_60 = "60" 67 | 68 | /** 69 | * task.params.json中限制条件对应的常量字段 70 | */ 71 | val TASK_PARAMS = "task.params.json" 72 | val PARAM_START_DATE = "startDate" 73 | val PARAM_END_DATE = "endDate" 74 | val PARAM_START_AGE = "startAge" 75 | val PARAM_END_AGE = "endAge" 76 | val PARAM_PROFESSIONALS = "professionals" 77 | val PARAM_CITIES = "cities" 78 | val PARAM_SEX = "sex" 79 | val PARAM_KEYWORDS = "keywords" 80 | 
val PARAM_CATEGORY_IDS = "categoryIds" 81 | val PARAM_TARGET_PAGE_FLOW = "targetPageFlow" 82 | 83 | } 84 | -------------------------------------------------------------------------------- /commons/src/main/java/commons/model/DataModel.scala: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2018. Atguigu Inc. All Rights Reserved. 3 | */ 4 | 5 | package commons.model 6 | /** 7 | * 广告黑名单 8 | * 9 | */ 10 | case class AdBlacklist(userid:Long) 11 | 12 | /** 13 | * 用户广告点击量 14 | * @author wuyufei 15 | * 16 | */ 17 | case class AdUserClickCount(date:String, 18 | userid:Long, 19 | adid:Long, 20 | clickCount:Long) 21 | 22 | 23 | /** 24 | * 广告实时统计 25 | * 26 | */ 27 | case class AdStat(date:String, 28 | province:String, 29 | city:String, 30 | adid:Long, 31 | clickCount:Long) 32 | 33 | /** 34 | * 各省top3热门广告 35 | * 36 | */ 37 | case class AdProvinceTop3(date:String, 38 | province:String, 39 | adid:Long, 40 | clickCount:Long) 41 | 42 | /** 43 | * 广告点击趋势 44 | * 45 | */ 46 | case class AdClickTrend(date:String, 47 | hour:String, 48 | minute:String, 49 | adid:Long, 50 | clickCount:Long) 51 | 52 | //***************** 输入表 ********************* 53 | 54 | /**对象池的配置,当对于数据库连接池,用于避免对象创建过程中的损耗 55 | * 用户访问动作表 56 | * 57 | * @param date 用户点击行为的日期 58 | * @param user_id 用户的ID 59 | * @param session_id Session的ID 60 | * @param page_id 某个页面的ID 61 | * @param action_time 点击行为的时间点 62 | * @param search_keyword 用户搜索的关键词 63 | * @param click_category_id 某一个商品品类的ID 64 | * @param click_product_id 某一个商品的ID 65 | * @param order_category_ids 一次订单中所有品类的ID集合 66 | * @param order_product_ids 一次订单中所有商品的ID集合 67 | * @param pay_category_ids 一次支付中所有品类的ID集合 68 | * @param pay_product_ids 一次支付中所有商品的ID集合 69 | * @param city_id 城市ID 70 | */ 71 | case class UserVisitAction(date: String, 72 | user_id: Long, 73 | session_id: String, 74 | page_id: Long, 75 | action_time: String, 76 | search_keyword: String, 77 | click_category_id: Long, 78 | click_product_id: Long, 79 | order_category_ids: String, 80 | order_product_ids: String, 81 | pay_category_ids: String, 82 | pay_product_ids: String, 83 | city_id: Long 84 | ) 85 | 86 | /** 87 | * 用户信息表 88 | * 89 | * @param user_id 用户的ID 90 | * @param username 用户的名称 91 | * @param name 用户的名字 92 | * @param age 用户的年龄 93 | * @param professional 用户的职业 94 | * @param city 用户所在的城市 95 | * @param sex 用户的性别 96 | */ 97 | case class UserInfo(user_id: Long, 98 | username: String, 99 | name: String, 100 | age: Int, 101 | professional: String, 102 | city: String, 103 | sex: String 104 | ) 105 | 106 | /** 107 | * 产品表 108 | * 109 | * @param product_id 商品的ID 110 | * @param product_name 商品的名称 111 | * @param extend_info 商品额外的信息 112 | */ 113 | case class ProductInfo(product_id: Long, 114 | product_name: String, 115 | extend_info: String 116 | ) 117 | /* 118 | * Copyright (c) 2018. Atguigu Inc. All Rights Reserved. 
119 | */ 120 | 121 | //***************** 输出表 ********************* 122 | 123 | /** 124 | * 聚合统计表 125 | * 126 | * @param taskid 当前计算批次的ID 127 | * @param session_count 所有Session的总和 128 | * @param visit_length_1s_3s_ratio 1-3sSession访问时长占比 129 | * @param visit_length_4s_6s_ratio 4-6sSession访问时长占比 130 | * @param visit_length_7s_9s_ratio 7-9sSession访问时长占比 131 | * @param visit_length_10s_30s_ratio 10-30sSession访问时长占比 132 | * @param visit_length_30s_60s_ratio 30-60sSession访问时长占比 133 | * @param visit_length_1m_3m_ratio 1-3mSession访问时长占比 134 | * @param visit_length_3m_10m_ratio 3-10mSession访问时长占比 135 | * @param visit_length_10m_30m_ratio 10-30mSession访问时长占比 136 | * @param visit_length_30m_ratio 30mSession访问时长占比 137 | * @param step_length_1_3_ratio 1-3步长占比 138 | * @param step_length_4_6_ratio 4-6步长占比 139 | * @param step_length_7_9_ratio 7-9步长占比 140 | * @param step_length_10_30_ratio 10-30步长占比 141 | * @param step_length_30_60_ratio 30-60步长占比 142 | * @param step_length_60_ratio 大于60步长占比 143 | */ 144 | case class SessionAggrStat(taskid: String, 145 | session_count: Long, 146 | visit_length_1s_3s_ratio: Double, 147 | visit_length_4s_6s_ratio: Double, 148 | visit_length_7s_9s_ratio: Double, 149 | visit_length_10s_30s_ratio: Double, 150 | visit_length_30s_60s_ratio: Double, 151 | visit_length_1m_3m_ratio: Double, 152 | visit_length_3m_10m_ratio: Double, 153 | visit_length_10m_30m_ratio: Double, 154 | visit_length_30m_ratio: Double, 155 | step_length_1_3_ratio: Double, 156 | step_length_4_6_ratio: Double, 157 | step_length_7_9_ratio: Double, 158 | step_length_10_30_ratio: Double, 159 | step_length_30_60_ratio: Double, 160 | step_length_60_ratio: Double 161 | ) 162 | 163 | /** 164 | * Session随机抽取表 165 | * 166 | * @param taskid 当前计算批次的ID 167 | * @param sessionid 抽取的Session的ID 168 | * @param startTime Session的开始时间 169 | * @param searchKeywords Session的查询字段 170 | * @param clickCategoryIds Session点击的类别id集合 171 | */ 172 | case class SessionRandomExtract(taskid:String, 173 | sessionid:String, 174 | startTime:String, 175 | searchKeywords:String, 176 | clickCategoryIds:String) 177 | 178 | /** 179 | * Session随机抽取详细表 180 | * 181 | * @param taskid 当前计算批次的ID 182 | * @param userid 用户的ID 183 | * @param sessionid Session的ID 184 | * @param pageid 某个页面的ID 185 | * @param actionTime 点击行为的时间点 186 | * @param searchKeyword 用户搜索的关键词 187 | * @param clickCategoryId 某一个商品品类的ID 188 | * @param clickProductId 某一个商品的ID 189 | * @param orderCategoryIds 一次订单中所有品类的ID集合 190 | * @param orderProductIds 一次订单中所有商品的ID集合 191 | * @param payCategoryIds 一次支付中所有品类的ID集合 192 | * @param payProductIds 一次支付中所有商品的ID集合 193 | **/ 194 | case class SessionDetail(taskid:String, 195 | userid:Long, 196 | sessionid:String, 197 | pageid:Long, 198 | actionTime:String, 199 | searchKeyword:String, 200 | clickCategoryId:Long, 201 | clickProductId:Long, 202 | orderCategoryIds:String, 203 | orderProductIds:String, 204 | payCategoryIds:String, 205 | payProductIds:String) 206 | 207 | /** 208 | * 品类Top10表 209 | * @param taskid 210 | * @param categoryid 211 | * @param clickCount 212 | * @param orderCount 213 | * @param payCount 214 | */ 215 | case class Top10Category(taskid:String, 216 | categoryid:Long, 217 | clickCount:Long, 218 | orderCount:Long, 219 | payCount:Long) 220 | 221 | /** 222 | * Top10 Session 223 | * @param taskid 224 | * @param categoryid 225 | * @param sessionid 226 | * @param clickCount 227 | */ 228 | case class Top10Session(taskid:String, 229 | categoryid:Long, 230 | sessionid:String, 231 | clickCount:Long) 232 | 
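The case classes above are shared across the modules: the DAOs in JdbcHelper receive them as plain rows, and they can equally back typed Spark SQL Datasets. As a minimal sketch — not part of the repository, with DataModelSketch and the sample values invented for illustration, assuming only that the commons module and spark-sql 2.4.x are on the classpath — this is how one of them becomes a Dataset and a temp view:

import commons.model.AdProvinceTop3
import org.apache.spark.sql.SparkSession

object DataModelSketch {
  def main(args: Array[String]): Unit = {
    // Local SparkSession, just for demonstration
    val spark = SparkSession.builder().appName("DataModelSketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // Any of the case classes in DataModel.scala can back a typed Dataset
    val top3 = Seq(
      AdProvinceTop3("20200526", "Hubei", 1L, 120L),
      AdProvinceTop3("20200526", "Hubei", 2L, 95L)
    ).toDS()

    // Register a temp view and query it with Spark SQL, similar to what advertStat
    // does for the per-province top-3 requirement
    top3.createOrReplaceTempView("ad_province_top3_demo")
    spark.sql("SELECT province, adid, clickCount FROM ad_province_top3_demo ORDER BY clickCount DESC").show()

    spark.stop()
  }
}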
-------------------------------------------------------------------------------- /commons/src/main/java/commons/pool/PooledMySqlClientFactory.scala: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2018. Atguigu Inc. All Rights Reserved. 3 | */ 4 | 5 | package commons.pool 6 | 7 | import java.sql.{Connection, DriverManager, PreparedStatement, ResultSet} 8 | 9 | import commons.conf.ConfigurationManager 10 | import commons.constant.Constants 11 | import org.apache.commons.pool2.impl.{DefaultPooledObject, GenericObjectPool, GenericObjectPoolConfig} 12 | import org.apache.commons.pool2.{BasePooledObjectFactory, PooledObject} 13 | 14 | // 创建用于处理MySQL查询结果的类的抽象接口 15 | trait QueryCallback { 16 | def process(rs: ResultSet) 17 | } 18 | 19 | /** 20 | * MySQL客户端代理对象 21 | * 22 | * @param jdbcUrl MySQL URL 23 | * @param jdbcUser MySQL 用户 24 | * @param jdbcPassword MySQL 密码 25 | * @param client 默认客户端实现 26 | */ 27 | case class MySqlProxy(jdbcUrl: String, jdbcUser: String, jdbcPassword: String, client: Option[Connection] = None) { 28 | 29 | // 获取客户端连接对象 30 | private val mysqlClient = client getOrElse { 31 | DriverManager.getConnection(jdbcUrl, jdbcUser, jdbcPassword) 32 | } 33 | 34 | /** 35 | * 执行增删改SQL语句 36 | * 37 | * @param sql 38 | * @param params 39 | * @return 影响的行数 40 | */ 41 | def executeUpdate(sql: String, params: Array[Any]): Int = { 42 | var rtn = 0 43 | var pstmt: PreparedStatement = null 44 | 45 | try { 46 | // 第一步:关闭自动提交 47 | mysqlClient.setAutoCommit(false) 48 | // 第二步:根据传入的sql语句创建prepareStatement 49 | pstmt = mysqlClient.prepareStatement(sql) 50 | 51 | // 第三步:为prepareStatement中的每个参数填写数值 52 | if (params != null && params.length > 0) { 53 | for (i <- 0 until params.length) { 54 | pstmt.setObject(i + 1, params(i)) 55 | } 56 | } 57 | // 第四步:执行增删改操作 58 | rtn = pstmt.executeUpdate() 59 | // 第五步:手动提交 60 | mysqlClient.commit() 61 | } catch { 62 | case e: Exception => e.printStackTrace 63 | } 64 | rtn 65 | } 66 | 67 | /** 68 | * 执行查询SQL语句 69 | * 70 | * @param sql 71 | * @param params 72 | */ 73 | def executeQuery(sql: String, params: Array[Any], queryCallback: QueryCallback) { 74 | var pstmt: PreparedStatement = null 75 | var rs: ResultSet = null 76 | 77 | try { 78 | // 第一步:根据传入的sql语句创建prepareStatement 79 | pstmt = mysqlClient.prepareStatement(sql) 80 | 81 | // 第二步:为prepareStatement中的每个参数填写数值 82 | if (params != null && params.length > 0) { 83 | for (i <- 0 until params.length) { 84 | pstmt.setObject(i + 1, params(i)) 85 | } 86 | } 87 | 88 | // 第三步:执行查询操作 89 | rs = pstmt.executeQuery() 90 | // 第四步:处理查询后的结果 91 | queryCallback.process(rs) 92 | } catch { 93 | case e: Exception => e.printStackTrace 94 | } 95 | } 96 | 97 | /** 98 | * 批量执行SQL语句 99 | * 100 | * @param sql 101 | * @param paramsList 102 | * @return 每条SQL语句影响的行数 103 | */ 104 | def executeBatch(sql: String, paramsList: Array[Array[Any]]): Array[Int] = { 105 | var rtn: Array[Int] = null 106 | var pstmt: PreparedStatement = null 107 | try { 108 | // 第一步:关闭自动提交 109 | mysqlClient.setAutoCommit(false) 110 | pstmt = mysqlClient.prepareStatement(sql) 111 | 112 | // 第二步:为prepareStatement中的每个参数填写数值 113 | if (paramsList != null && paramsList.length > 0) { 114 | for (params <- paramsList) { 115 | for (i <- 0 until params.length) { 116 | pstmt.setObject(i + 1, params(i)) 117 | } 118 | pstmt.addBatch() 119 | } 120 | } 121 | 122 | // 第三步:执行批量的SQL语句 123 | rtn = pstmt.executeBatch() 124 | 125 | // 第四步:手动提交 126 | mysqlClient.commit() 127 | } catch { 128 | case e: Exception => e.printStackTrace 129 | } 130 
| rtn 131 | } 132 | def executeBatch1(sql:String): Unit ={ 133 | val pstmt = mysqlClient.prepareStatement(sql) 134 | pstmt.execute(); 135 | println("succcess-=-----------------") 136 | } 137 | 138 | // 关闭MySQL客户端 139 | def shutdown(): Unit = mysqlClient.close() 140 | } 141 | 142 | /** 143 | * 将MySqlProxy实例视为对象,MySqlProxy实例的创建使用对象池进行维护 144 | */ 145 | 146 | /** 147 | * 创建自定义工厂类,继承BasePooledObjectFactory工厂类,负责对象的创建、包装和销毁 148 | * @param jdbcUrl 149 | * @param jdbcUser 150 | * @param jdbcPassword 151 | * @param client 152 | */ 153 | class PooledMySqlClientFactory(jdbcUrl: String, jdbcUser: String, jdbcPassword: String, client: Option[Connection] = None) extends BasePooledObjectFactory[MySqlProxy] with Serializable { 154 | 155 | // 用于池来创建对象 156 | override def create(): MySqlProxy = MySqlProxy(jdbcUrl, jdbcUser, jdbcPassword, client) 157 | 158 | // 用于池来包装对象 159 | override def wrap(obj: MySqlProxy): PooledObject[MySqlProxy] = new DefaultPooledObject(obj) 160 | 161 | // 用于池来销毁对象 162 | override def destroyObject(p: PooledObject[MySqlProxy]): Unit = { 163 | p.getObject.shutdown() 164 | super.destroyObject(p) 165 | } 166 | 167 | } 168 | 169 | /** 170 | * 创建MySQL池工具类 171 | */ 172 | object CreateMySqlPool { 173 | 174 | // 加载JDBC驱动,只需要一次 175 | Class.forName("com.mysql.cj.jdbc.Driver") 176 | 177 | // 在org.apache.commons.pool2.impl中预设了三个可以直接使用的对象池:GenericObjectPool、GenericKeyedObjectPool和SoftReferenceObjectPool 178 | // 创建genericObjectPool为GenericObjectPool 179 | // GenericObjectPool的特点是可以设置对象池中的对象特征,包括LIFO方式、最大空闲数、最小空闲数、是否有效性检查等等 180 | private var genericObjectPool: GenericObjectPool[MySqlProxy] = null 181 | 182 | // 伴生对象通过apply完成对象的创建 183 | def apply(): GenericObjectPool[MySqlProxy] = { 184 | // 单例模式 185 | if (this.genericObjectPool == null) { 186 | this.synchronized { 187 | // 获取MySQL配置参数 188 | val jdbcUrl = ConfigurationManager.config.getString(Constants.JDBC_URL) 189 | val jdbcUser = ConfigurationManager.config.getString(Constants.JDBC_USER) 190 | val jdbcPassword = ConfigurationManager.config.getString(Constants.JDBC_PASSWORD) 191 | val size = ConfigurationManager.config.getInt(Constants.JDBC_DATASOURCE_SIZE) 192 | 193 | val pooledFactory = new PooledMySqlClientFactory(jdbcUrl, jdbcUser, jdbcPassword) 194 | val poolConfig = { 195 | // 创建标准对象池配置类的实例 196 | val c = new GenericObjectPoolConfig 197 | // 设置配置对象参数 198 | // 设置最大对象数 199 | c.setMaxTotal(size) 200 | // 设置最大空闲对象数 201 | c.setMaxIdle(size) 202 | c 203 | } 204 | // 对象池的创建需要工厂类和配置类 205 | // 返回一个GenericObjectPool对象池 206 | this.genericObjectPool = new GenericObjectPool[MySqlProxy](pooledFactory, poolConfig) 207 | } 208 | } 209 | genericObjectPool 210 | } 211 | } 212 | 213 | -------------------------------------------------------------------------------- /commons/src/main/java/commons/utils/Utils.scala: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2018. Atguigu Inc. All Rights Reserved. 
3 | */ 4 | 5 | package commons.utils 6 | 7 | import java.text.SimpleDateFormat 8 | import java.util.{Calendar, Date} 9 | 10 | 11 | import org.joda.time.DateTime 12 | import org.joda.time.format.DateTimeFormat 13 | 14 | import scala.collection.mutable 15 | 16 | /** 17 | * 日期时间工具类 18 | * 使用Joda实现,使用Java提供的Date会存在线程安全问题 19 | */ 20 | object DateUtils { 21 | 22 | val TIME_FORMAT = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss") 23 | val DATE_FORMAT = DateTimeFormat.forPattern("yyyy-MM-dd") 24 | val DATEKEY_FORMAT = DateTimeFormat.forPattern("yyyyMMdd") 25 | val DATE_TIME_FORMAT = DateTimeFormat.forPattern("yyyyMMddHHmm") 26 | 27 | /** 28 | * 判断一个时间是否在另一个时间之前 29 | * @param time1 第一个时间 30 | * @param time2 第二个时间 31 | * @return 判断结果 32 | */ 33 | def before(time1:String, time2:String):Boolean = { 34 | if(TIME_FORMAT.parseDateTime(time1).isBefore(TIME_FORMAT.parseDateTime(time2))) { 35 | return true 36 | } 37 | false 38 | } 39 | 40 | /** 41 | * 判断一个时间是否在另一个时间之后 42 | * @param time1 第一个时间 43 | * @param time2 第二个时间 44 | * @return 判断结果 45 | */ 46 | def after(time1:String, time2:String):Boolean = { 47 | if(TIME_FORMAT.parseDateTime(time1).isAfter(TIME_FORMAT.parseDateTime(time2))) { 48 | return true 49 | } 50 | false 51 | } 52 | 53 | /** 54 | * 计算时间差值(单位为秒) 55 | * @param time1 时间1 56 | * @param time2 时间2 57 | * @return 差值 58 | */ 59 | def minus(time1:String, time2:String): Int = { 60 | return (TIME_FORMAT.parseDateTime(time1).getMillis - TIME_FORMAT.parseDateTime(time2).getMillis)/1000 toInt 61 | } 62 | 63 | /** 64 | * 获取年月日和小时 65 | * @param datetime 时间(yyyy-MM-dd HH:mm:ss) 66 | * @return 结果(yyyy-MM-dd_HH) 67 | */ 68 | def getDateHour(datetime:String):String = { 69 | val date = datetime.split(" ")(0) 70 | val hourMinuteSecond = datetime.split(" ")(1) 71 | val hour = hourMinuteSecond.split(":")(0) 72 | date + "_" + hour 73 | } 74 | 75 | /** 76 | * 获取当天日期(yyyy-MM-dd) 77 | * @return 当天日期 78 | */ 79 | def getTodayDate():String = { 80 | DateTime.now().toString(DATE_FORMAT) 81 | } 82 | 83 | /** 84 | * 获取昨天的日期(yyyy-MM-dd) 85 | * @return 昨天的日期 86 | */ 87 | def getYesterdayDate():String = { 88 | DateTime.now().minusDays(1).toString(DATE_FORMAT) 89 | } 90 | 91 | /** 92 | * 格式化日期(yyyy-MM-dd) 93 | * @param date Date对象 94 | * @return 格式化后的日期 95 | */ 96 | def formatDate(date:Date):String = { 97 | new DateTime(date).toString(DATE_FORMAT) 98 | } 99 | 100 | /** 101 | * 格式化时间(yyyy-MM-dd HH:mm:ss) 102 | * @param date Date对象 103 | * @return 格式化后的时间 104 | */ 105 | def formatTime(date:Date):String = { 106 | new DateTime(date).toString(TIME_FORMAT) 107 | } 108 | 109 | /** 110 | * 解析时间字符串 111 | * @param time 时间字符串 112 | * @return Date 113 | */ 114 | def parseTime(time:String):Date = { 115 | TIME_FORMAT.parseDateTime(time).toDate 116 | } 117 | 118 | def main(args: Array[String]): Unit = { 119 | print(DateUtils.parseTime("2017-10-31 20:27:53")) 120 | } 121 | 122 | /** 123 | * 格式化日期key 124 | * @param date 125 | * @return 126 | */ 127 | def formatDateKey(date:Date):String = { 128 | new DateTime(date).toString(DATEKEY_FORMAT) 129 | } 130 | 131 | /** 132 | * 格式化日期key 133 | * @return 134 | */ 135 | def parseDateKey(datekey: String ):Date = { 136 | DATEKEY_FORMAT.parseDateTime(datekey).toDate 137 | } 138 | 139 | /** 140 | * 格式化时间,保留到分钟级别 141 | * yyyyMMddHHmm 142 | * @param date 143 | * @return 144 | */ 145 | def formatTimeMinute(date: Date):String = { 146 | new DateTime(date).toString(DATE_TIME_FORMAT) 147 | } 148 | 149 | } 150 | 151 | 152 | 153 | object ParamUtils{ 154 | def getPageFlow(): Array[String] ={ 155 | var 
z=Array(1,2,3,4,5,6,7); 156 | val r=z.slice(0,z.length-1).zip(z.tail).map{ 157 | case (p1,p2)=>{ 158 | p1+"-"+p2; 159 | } 160 | } 161 | r; 162 | } 163 | } 164 | /** 165 | * 数字格工具类 166 | * 167 | * 168 | */ 169 | object NumberUtils { 170 | 171 | /** 172 | * 格式化小数 173 | * @param scale 四舍五入的位数 174 | * @return 格式化小数 175 | */ 176 | def formatDouble(num:Double, scale:Int):Double = { 177 | val bd = BigDecimal(num) 178 | bd.setScale(scale, BigDecimal.RoundingMode.HALF_UP).doubleValue() 179 | } 180 | 181 | } 182 | 183 | 184 | 185 | 186 | /** 187 | * 字符串工具类 188 | * 189 | */ 190 | object StringUtil { 191 | 192 | /** 193 | * 判断字符串是否为空 194 | * @param str 字符串 195 | * @return 是否为空 196 | */ 197 | def isEmpty(str:String):Boolean = { 198 | str == null || "".equals(str) 199 | } 200 | 201 | /** 202 | * 判断字符串是否不为空 203 | * @param str 字符串 204 | * @return 是否不为空 205 | */ 206 | def isNotEmpty(str:String):Boolean = { 207 | str != null && !"".equals(str) 208 | } 209 | 210 | /** 211 | * 截断字符串两侧的逗号 212 | * @param str 字符串 213 | * @return 字符串 214 | */ 215 | def trimComma(str:String):String = { 216 | var result = "" 217 | if(str.startsWith(",")) { 218 | result = str.substring(1) 219 | } 220 | if(str.endsWith(",")) { 221 | result = str.substring(0, str.length() - 1) 222 | } 223 | result 224 | } 225 | 226 | /** 227 | * 补全两位数字 228 | * @param str 229 | * @return 230 | */ 231 | def fulfuill(str: String):String = { 232 | if(str.length() == 2) { 233 | str 234 | } else { 235 | "0" + str 236 | } 237 | } 238 | 239 | /** 240 | * 从拼接的字符串中提取字段 241 | * @param str 字符串 242 | * @param delimiter 分隔符 243 | * @param field 字段 244 | * @return 字段值 245 | */ 246 | def getFieldFromConcatString(str:String, delimiter:String, field:String):String = { 247 | try { 248 | val fields = str.split(delimiter); 249 | for(concatField <- fields) { 250 | if(concatField.split("=").length == 2) { 251 | val fieldName = concatField.split("=")(0) 252 | val fieldValue = concatField.split("=")(1) 253 | if(fieldName.equals(field)) { 254 | return fieldValue 255 | } 256 | } 257 | } 258 | } catch{ 259 | case e:Exception => e.printStackTrace() 260 | } 261 | null 262 | } 263 | 264 | /** 265 | * 从拼接的字符串中给字段设置值 266 | * @param str 字符串 267 | * @param delimiter 分隔符 268 | * @param field 字段名 269 | * @param newFieldValue 新的field值 270 | * @return 字段值 271 | */ 272 | def setFieldInConcatString(str:String, delimiter:String, field:String, newFieldValue:String):String = { 273 | 274 | val fieldsMap = new mutable.HashMap[String,String]() 275 | 276 | for(fileds <- str.split(delimiter)){ 277 | var arra = fileds.split("=") 278 | if(arra(0).compareTo(field) == 0) 279 | fieldsMap += (field -> newFieldValue) 280 | else 281 | fieldsMap += (arra(0) -> arra(1)) 282 | } 283 | fieldsMap.map(item=> item._1 + "=" + item._2).mkString(delimiter) 284 | } 285 | 286 | } 287 | 288 | 289 | /** 290 | * 校验工具类 291 | * 292 | */ 293 | object ValidUtils { 294 | 295 | /** 296 | * 校验数据中的指定字段,是否在指定范围内 297 | * @param data 数据 298 | * @param dataField 数据字段 299 | * @param parameter 参数 300 | * @param startParamField 起始参数字段 301 | * @param endParamField 结束参数字段 302 | * @return 校验结果 303 | */ 304 | def between(data:String, dataField:String, parameter:String, startParamField:String, endParamField:String):Boolean = { 305 | 306 | val startParamFieldStr = StringUtil.getFieldFromConcatString(parameter, "\\|", startParamField) 307 | val endParamFieldStr = StringUtil.getFieldFromConcatString(parameter, "\\|", endParamField) 308 | if(startParamFieldStr == null || endParamFieldStr == null) { 309 | return true 310 | } 311 | 312 | val 
startParamFieldValue = startParamFieldStr.toInt 313 | val endParamFieldValue = endParamFieldStr.toInt 314 | 315 | val dataFieldStr = StringUtil.getFieldFromConcatString(data, "\\|", dataField) 316 | if(dataFieldStr != null) { 317 | val dataFieldValue = dataFieldStr.toInt 318 | if(dataFieldValue >= startParamFieldValue && dataFieldValue <= endParamFieldValue) { 319 | return true 320 | } else { 321 | return false 322 | } 323 | } 324 | false 325 | } 326 | 327 | /** 328 | * 校验数据中的指定字段,是否有值与参数字段的值相同 329 | * @param data 数据 330 | * @param dataField 数据字段 331 | * @param parameter 参数 332 | * @param paramField 参数字段 333 | * @return 校验结果 334 | */ 335 | def in(data:String, dataField:String, parameter:String, paramField:String):Boolean = { 336 | val paramFieldValue = StringUtil.getFieldFromConcatString(parameter, "\\|", paramField) 337 | if(paramFieldValue == null) { 338 | return true 339 | } 340 | val paramFieldValueSplited = paramFieldValue.split(",") 341 | 342 | val dataFieldValue = StringUtil.getFieldFromConcatString(data, "\\|", dataField) 343 | if(dataFieldValue != null && dataFieldValue != "-1") { 344 | val dataFieldValueSplited = dataFieldValue.split(",") 345 | 346 | for(singleDataFieldValue <- dataFieldValueSplited) { 347 | for(singleParamFieldValue <- paramFieldValueSplited) { 348 | if(singleDataFieldValue.compareTo(singleParamFieldValue) ==0) { 349 | return true 350 | } 351 | } 352 | } 353 | } 354 | false 355 | } 356 | 357 | /** 358 | * 校验数据中的指定字段,是否在指定范围内 359 | * @param data 数据 360 | * @param dataField 数据字段 361 | * @param parameter 参数 362 | * @param paramField 参数字段 363 | * @return 校验结果 364 | */ 365 | def equal(data:String, dataField:String, parameter:String, paramField:String):Boolean = { 366 | val paramFieldValue = StringUtil.getFieldFromConcatString(parameter, "\\|", paramField) 367 | if(paramFieldValue == null) { 368 | return true 369 | } 370 | 371 | val dataFieldValue = StringUtil.getFieldFromConcatString(data, "\\|", dataField) 372 | if(dataFieldValue != null) { 373 | if(dataFieldValue.compareTo(paramFieldValue) == 0) { 374 | return true 375 | } 376 | } 377 | false 378 | } 379 | 380 | } -------------------------------------------------------------------------------- /commons/src/main/resources/commerce.properties: -------------------------------------------------------------------------------- 1 | 2 | # jbdc配置 3 | jdbc.datasource.size=10 4 | jdbc.url=jdbc:mysql://localhost:3306/commerce?useUnicode=true&characterEncoding=utf8&serverTimezone=UTC 5 | jdbc.user=root 6 | jdbc.password=123456 7 | 8 | # 指定分析的用户范围 9 | # 可以使用的属性如下: 10 | # startDate: 格式: yyyy-MM-DD [必选] 11 | # endDate: 格式: yyyy-MM-DD [必选] 12 | # startAge: 范围: 0 - 59 13 | # endAge: 范围: 0 - 59 14 | # professionals: 范围:professionals[0 - 59] 15 | # cities: 0 - 9 ((0,"北京","华北"),(1,"上海","华东"),(2,"南京","华东"),(3,"广州","华南"),(4,"三亚","华南"),(5,"武汉","华中"),(6,"长沙","华中"),(7,"西安","西北"),(8,"成都","西南"),(9,"哈尔滨","东北")) 16 | # sex: 范围: 0 - 1 17 | # keywords: 范围: ("火锅", "蛋糕", "重庆辣子鸡", "重庆小面", "呷哺呷哺", "新辣道鱼火锅", "国贸大厦", "太古商场", "日本料理", "温泉") 18 | # categoryIds:0 - 99,以逗号分隔 19 | # targetPageFlow: 0 - 99, 以逗号分隔 20 | task.params.json={startDate:"2020-05-21", \ 21 | endDate:"2020-05-24", \ 22 | startAge: 20, \ 23 | endAge: 50, \ 24 | professionals: "", \ 25 | cities: "", \ 26 | sex:"", \ 27 | keywords:"", \ 28 | categoryIds:"", \ 29 | targetPageFlow:"1,2,3,4,5,6,7"} 30 | 31 | # Kafka配置 32 | kafka.broker.list=121.199.16.65:9092 33 | kafka.topics=AdRealTimeLog1 -------------------------------------------------------------------------------- 
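A minimal sketch of how these settings are read at runtime, in the same way CreateMySqlPool and the page-conversion jobs do. ConfigurationManager, Constants and the property keys are the project's own; the object name ConfigUsageDemo is made up here for illustration only.

import commons.conf.ConfigurationManager
import commons.constant.Constants
import net.sf.json.JSONObject

object ConfigUsageDemo {
  def main(args: Array[String]): Unit = {
    // JDBC settings are looked up through keys defined in Constants (see CreateMySqlPool)
    val jdbcUrl  = ConfigurationManager.config.getString(Constants.JDBC_URL)
    val poolSize = ConfigurationManager.config.getInt(Constants.JDBC_DATASOURCE_SIZE)

    // task.params.json is kept as a single JSON string and parsed with json-lib,
    // exactly as PageConvertStat / PageStat do
    val taskParam = JSONObject.fromObject(
      ConfigurationManager.config.getString(Constants.TASK_PARAMS))
    val targetPageFlow = taskParam.getString("targetPageFlow") // e.g. "1,2,3,4,5,6,7"

    println(s"jdbc.url=$jdbcUrl, pool size=$poolSize, targetPageFlow=$targetPageFlow")
  }
}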
/commons/src/main/resources/log4j.properties: -------------------------------------------------------------------------------- 1 | log4j.rootLogger=info, stdout 2 | log4j.appender.stdout=org.apache.log4j.ConsoleAppender 3 | log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 4 | log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %5p --- [%50t] %-80c(line:%5L) : %m%n 5 | 6 | log4j.appender.R=org.apache.log4j.RollingFileAppender 7 | log4j.appender.R.File=../log/agent.log 8 | log4j.appender.R.MaxFileSize=1024KB 9 | log4j.appender.R.MaxBackupIndex=1 10 | 11 | log4j.appender.R.layout=org.apache.log4j.PatternLayout 12 | log4j.appender.R.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %5p --- [%50t] %-80c(line:%6L) : %m%n -------------------------------------------------------------------------------- /commons/target/classes/commerce.properties: -------------------------------------------------------------------------------- 1 | 2 | # jbdc配置 3 | jdbc.datasource.size=10 4 | jdbc.url=jdbc:mysql://localhost:3306/commerce?useUnicode=true&characterEncoding=utf8&serverTimezone=UTC 5 | jdbc.user=root 6 | jdbc.password=123456 7 | 8 | # 指定分析的用户范围 9 | # 可以使用的属性如下: 10 | # startDate: 格式: yyyy-MM-DD [必选] 11 | # endDate: 格式: yyyy-MM-DD [必选] 12 | # startAge: 范围: 0 - 59 13 | # endAge: 范围: 0 - 59 14 | # professionals: 范围:professionals[0 - 59] 15 | # cities: 0 - 9 ((0,"北京","华北"),(1,"上海","华东"),(2,"南京","华东"),(3,"广州","华南"),(4,"三亚","华南"),(5,"武汉","华中"),(6,"长沙","华中"),(7,"西安","西北"),(8,"成都","西南"),(9,"哈尔滨","东北")) 16 | # sex: 范围: 0 - 1 17 | # keywords: 范围: ("火锅", "蛋糕", "重庆辣子鸡", "重庆小面", "呷哺呷哺", "新辣道鱼火锅", "国贸大厦", "太古商场", "日本料理", "温泉") 18 | # categoryIds:0 - 99,以逗号分隔 19 | # targetPageFlow: 0 - 99, 以逗号分隔 20 | task.params.json={startDate:"2020-05-21", \ 21 | endDate:"2020-05-24", \ 22 | startAge: 20, \ 23 | endAge: 50, \ 24 | professionals: "", \ 25 | cities: "", \ 26 | sex:"", \ 27 | keywords:"", \ 28 | categoryIds:"", \ 29 | targetPageFlow:"1,2,3,4,5,6,7"} 30 | 31 | # Kafka配置 32 | kafka.broker.list=121.199.16.65:9092 33 | kafka.topics=AdRealTimeLog1 -------------------------------------------------------------------------------- /commons/target/classes/commons/conf/ConfigurationManager$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/conf/ConfigurationManager$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/conf/ConfigurationManager.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/conf/ConfigurationManager.class -------------------------------------------------------------------------------- /commons/target/classes/commons/constant/Constants$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/constant/Constants$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/constant/Constants.class: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/constant/Constants.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/AdBlacklist$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/AdBlacklist$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/AdBlacklist.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/AdBlacklist.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/AdClickTrend$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/AdClickTrend$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/AdClickTrend.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/AdClickTrend.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/AdProvinceTop3$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/AdProvinceTop3$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/AdProvinceTop3.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/AdProvinceTop3.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/AdStat$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/AdStat$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/AdStat.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/AdStat.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/AdUserClickCount$.class: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/AdUserClickCount$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/AdUserClickCount.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/AdUserClickCount.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/ProductInfo$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/ProductInfo$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/ProductInfo.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/ProductInfo.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/SessionAggrStat$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/SessionAggrStat$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/SessionAggrStat.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/SessionAggrStat.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/SessionDetail$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/SessionDetail$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/SessionDetail.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/SessionDetail.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/SessionRandomExtract$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/SessionRandomExtract$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/SessionRandomExtract.class: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/SessionRandomExtract.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/Top10Category$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/Top10Category$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/Top10Category.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/Top10Category.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/Top10Session$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/Top10Session$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/Top10Session.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/Top10Session.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/UserInfo$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/UserInfo$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/UserInfo.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/UserInfo.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/UserVisitAction$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/UserVisitAction$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/model/UserVisitAction.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/model/UserVisitAction.class -------------------------------------------------------------------------------- /commons/target/classes/commons/pool/CreateMySqlPool$.class: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/pool/CreateMySqlPool$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/pool/CreateMySqlPool.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/pool/CreateMySqlPool.class -------------------------------------------------------------------------------- /commons/target/classes/commons/pool/MySqlProxy$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/pool/MySqlProxy$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/pool/MySqlProxy.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/pool/MySqlProxy.class -------------------------------------------------------------------------------- /commons/target/classes/commons/pool/PooledMySqlClientFactory$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/pool/PooledMySqlClientFactory$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/pool/PooledMySqlClientFactory.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/pool/PooledMySqlClientFactory.class -------------------------------------------------------------------------------- /commons/target/classes/commons/pool/QueryCallback.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/pool/QueryCallback.class -------------------------------------------------------------------------------- /commons/target/classes/commons/utils/DateUtils$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/utils/DateUtils$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/utils/DateUtils.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/utils/DateUtils.class -------------------------------------------------------------------------------- /commons/target/classes/commons/utils/NumberUtils$.class: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/utils/NumberUtils$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/utils/NumberUtils.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/utils/NumberUtils.class -------------------------------------------------------------------------------- /commons/target/classes/commons/utils/ParamUtils$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/utils/ParamUtils$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/utils/ParamUtils.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/utils/ParamUtils.class -------------------------------------------------------------------------------- /commons/target/classes/commons/utils/StringUtil$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/utils/StringUtil$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/utils/StringUtil.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/utils/StringUtil.class -------------------------------------------------------------------------------- /commons/target/classes/commons/utils/ValidUtils$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/utils/ValidUtils$.class -------------------------------------------------------------------------------- /commons/target/classes/commons/utils/ValidUtils.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/commons/target/classes/commons/utils/ValidUtils.class -------------------------------------------------------------------------------- /commons/target/classes/log4j.properties: -------------------------------------------------------------------------------- 1 | log4j.rootLogger=info, stdout 2 | log4j.appender.stdout=org.apache.log4j.ConsoleAppender 3 | log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 4 | log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %5p --- [%50t] %-80c(line:%5L) : %m%n 5 | 6 | log4j.appender.R=org.apache.log4j.RollingFileAppender 7 | log4j.appender.R.File=../log/agent.log 8 | log4j.appender.R.MaxFileSize=1024KB 9 | log4j.appender.R.MaxBackupIndex=1 10 | 11 | log4j.appender.R.layout=org.apache.log4j.PatternLayout 12 | 
log4j.appender.R.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %5p --- [%50t] %-80c(line:%6L) : %m%n -------------------------------------------------------------------------------- /commons/target/classes/test/DataModel.scala: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2017. Atguigu Inc. All Rights Reserved. 3 | * Date: 10/29/17 11:14 AM. 4 | * Author: wuyufei. 5 | */ 6 | 7 | /** 8 | * 广告黑名单 9 | * 10 | */ 11 | case class AdBlacklist(userid:Long) 12 | 13 | /** 14 | * 用户广告点击量 15 | * @author wuyufei 16 | * 17 | */ 18 | case class AdUserClickCount(date:String, 19 | userid:Long, 20 | adid:Long, 21 | clickCount:Long) 22 | 23 | 24 | /** 25 | * 广告实时统计 26 | * 27 | */ 28 | case class AdStat(date:String, 29 | province:String, 30 | city:String, 31 | adid:Long, 32 | clickCount:Long) 33 | 34 | /** 35 | * 各省top3热门广告 36 | * 37 | */ 38 | case class AdProvinceTop3(date:String, 39 | province:String, 40 | adid:Long, 41 | clickCount:Long) 42 | 43 | /** 44 | * 广告点击趋势 45 | * 46 | */ 47 | case class AdClickTrend(date:String, 48 | hour:String, 49 | minute:String, 50 | adid:Long, 51 | clickCount:Long) -------------------------------------------------------------------------------- /commons/target/classes/test/JdbcHelper.scala: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2017. Atguigu Inc. All Rights Reserved. 3 | * Date: 11/1/17 3:40 PM. 4 | * Author: wuyufei. 5 | */ 6 | 7 | import java.sql.ResultSet 8 | 9 | import commons.pool.{CreateMySqlPool, QueryCallback} 10 | 11 | import scala.collection.mutable.ArrayBuffer 12 | 13 | /** 14 | * 用户黑名单DAO类 15 | */ 16 | object AdBlacklistDAO { 17 | 18 | /** 19 | * 批量插入广告黑名单用户 20 | * 21 | * @param adBlacklists 22 | */ 23 | def insertBatch(adBlacklists: Array[AdBlacklist]) { 24 | // 批量插入 25 | val sql = "INSERT INTO ad_blacklist VALUES(?)" 26 | 27 | val paramsList = new ArrayBuffer[Array[Any]]() 28 | 29 | // 向paramsList添加userId 30 | for (adBlacklist <- adBlacklists) { 31 | val params: Array[Any] = Array(adBlacklist.userid) 32 | paramsList += params 33 | } 34 | // 获取对象池单例对象 35 | val mySqlPool = CreateMySqlPool() 36 | // 从对象池中提取对象 37 | val client = mySqlPool.borrowObject() 38 | 39 | // 执行批量插入操作 40 | client.executeBatch(sql, paramsList.toArray) 41 | // 使用完成后将对象返回给对象池 42 | mySqlPool.returnObject(client) 43 | } 44 | 45 | /** 46 | * 查询所有广告黑名单用户 47 | * 48 | * @return 49 | */ 50 | def findAll(): Array[AdBlacklist] = { 51 | // 将黑名单中的所有数据查询出来 52 | val sql = "SELECT * FROM ad_blacklist" 53 | 54 | val adBlacklists = new ArrayBuffer[AdBlacklist]() 55 | 56 | // 获取对象池单例对象 57 | val mySqlPool = CreateMySqlPool() 58 | // 从对象池中提取对象 59 | val client = mySqlPool.borrowObject() 60 | 61 | // 执行sql查询并且通过处理函数将所有的userid加入array中 62 | client.executeQuery(sql, null, new QueryCallback { 63 | override def process(rs: ResultSet): Unit = { 64 | while (rs.next()) { 65 | val userid = rs.getInt(1).toLong 66 | adBlacklists += AdBlacklist(userid) 67 | } 68 | } 69 | }) 70 | 71 | // 使用完成后将对象返回给对象池 72 | mySqlPool.returnObject(client) 73 | adBlacklists.toArray 74 | } 75 | } 76 | 77 | 78 | /** 79 | * 用户广告点击量DAO实现类 80 | * 81 | */ 82 | object AdUserClickCountDAO { 83 | 84 | def updateBatch(adUserClickCounts: Array[AdUserClickCount]) { 85 | // 获取对象池单例对象 86 | val mySqlPool = CreateMySqlPool() 87 | // 从对象池中提取对象 88 | val client = mySqlPool.borrowObject() 89 | 90 | // 首先对用户广告点击量进行分类,分成待插入的和待更新的 91 | val insertAdUserClickCounts = ArrayBuffer[AdUserClickCount]() 92 | val updateAdUserClickCounts = 
ArrayBuffer[AdUserClickCount]() 93 | 94 | val selectSQL = "SELECT count(*) FROM ad_user_click_count WHERE date=? AND userid=? AND adid=? " 95 | 96 | for (adUserClickCount <- adUserClickCounts) { 97 | 98 | val selectParams: Array[Any] = Array(adUserClickCount.date, adUserClickCount.userid, adUserClickCount.adid) 99 | // 根据传入的用户点击次数统计数据从已有的ad_user_click_count中进行查询 100 | client.executeQuery(selectSQL, selectParams, new QueryCallback { 101 | override def process(rs: ResultSet): Unit = { 102 | // 如果能查询到并且点击次数大于0,则认为是待更新项 103 | if (rs.next() && rs.getInt(1) > 0) { 104 | updateAdUserClickCounts += adUserClickCount 105 | } else { 106 | insertAdUserClickCounts += adUserClickCount 107 | } 108 | } 109 | }) 110 | } 111 | 112 | // 执行批量插入 113 | val insertSQL = "INSERT INTO ad_user_click_count VALUES(?,?,?,?)" 114 | val insertParamsList: ArrayBuffer[Array[Any]] = ArrayBuffer[Array[Any]]() 115 | 116 | // 将待插入项全部加入到参数列表中 117 | for (adUserClickCount <- insertAdUserClickCounts) { 118 | insertParamsList += Array[Any](adUserClickCount.date, adUserClickCount.userid, adUserClickCount.adid, adUserClickCount.clickCount) 119 | } 120 | 121 | // 执行批量插入 122 | client.executeBatch(insertSQL, insertParamsList.toArray) 123 | 124 | // 执行批量更新 125 | // clickCount=clickCount + :此处的UPDATE是进行累加 126 | val updateSQL = "UPDATE ad_user_click_count SET clickCount=clickCount + ? WHERE date=? AND userid=? AND adid=?" 127 | val updateParamsList: ArrayBuffer[Array[Any]] = ArrayBuffer[Array[Any]]() 128 | 129 | // 将待更新项全部加入到参数列表中 130 | for (adUserClickCount <- updateAdUserClickCounts) { 131 | updateParamsList += Array[Any](adUserClickCount.clickCount, adUserClickCount.date, adUserClickCount.userid, adUserClickCount.adid) 132 | } 133 | 134 | // 执行批量更新 135 | client.executeBatch(updateSQL, updateParamsList.toArray) 136 | 137 | // 使用完成后将对象返回给对象池 138 | mySqlPool.returnObject(client) 139 | } 140 | 141 | /** 142 | * 根据多个key查询用户广告点击量 143 | * 144 | * @param date 日期 145 | * @param userid 用户id 146 | * @param adid 广告id 147 | * @return 148 | */ 149 | def findClickCountByMultiKey(date: String, userid: Long, adid: Long): Int = { 150 | // 获取对象池单例对象 151 | val mySqlPool = CreateMySqlPool() 152 | // 从对象池中提取对象 153 | val client = mySqlPool.borrowObject() 154 | 155 | val sql = "SELECT clickCount FROM ad_user_click_count " + 156 | "WHERE date=? " + 157 | "AND userid=? " + 158 | "AND adid=?" 159 | 160 | var clickCount = 0 161 | val params = Array[Any](date, userid, adid) 162 | 163 | // 根据多个条件查询指定用户的点击量,将查询结果累加到clickCount中 164 | client.executeQuery(sql, params, new QueryCallback { 165 | override def process(rs: ResultSet): Unit = { 166 | if (rs.next()) { 167 | clickCount = rs.getInt(1) 168 | } 169 | } 170 | }) 171 | // 使用完成后将对象返回给对象池 172 | mySqlPool.returnObject(client) 173 | clickCount 174 | } 175 | } 176 | 177 | 178 | /** 179 | * 广告实时统计DAO实现类 180 | * 181 | * @author Administrator 182 | * 183 | */ 184 | object AdStatDAO { 185 | 186 | def updateBatch(adStats: Array[AdStat]) { 187 | // 获取对象池单例对象 188 | val mySqlPool = CreateMySqlPool() 189 | // 从对象池中提取对象 190 | val client = mySqlPool.borrowObject() 191 | 192 | 193 | // 区分开来哪些是要插入的,哪些是要更新的 194 | val insertAdStats = ArrayBuffer[AdStat]() 195 | val updateAdStats = ArrayBuffer[AdStat]() 196 | 197 | val selectSQL = "SELECT count(*) " + 198 | "FROM ad_stat " + 199 | "WHERE date=? " + 200 | "AND province=? " + 201 | "AND city=? " + 202 | "AND adid=?" 
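// The ad_stat table created in ad.sql has no primary or unique key, so the DAO has to run
// this COUNT(*) query per record to decide whether it is a new row or an existing one,
// and then issues two separate batches below. With a unique index on
// (date, province, city, adid) the same effect could be achieved in a single
// INSERT ... ON DUPLICATE KEY UPDATE statement; that would require a schema change,
// so it is only mentioned here as an alternative.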
203 | 204 | for (adStat <- adStats) { 205 | 206 | val params = Array[Any](adStat.date, adStat.province, adStat.city, adStat.adid) 207 | // 通过查询结果判断当前项时待插入还是待更新 208 | client.executeQuery(selectSQL, params, new QueryCallback { 209 | override def process(rs: ResultSet): Unit = { 210 | if (rs.next() && rs.getInt(1) > 0) { 211 | updateAdStats += adStat 212 | } else { 213 | insertAdStats += adStat 214 | } 215 | } 216 | }) 217 | } 218 | 219 | // 对于需要插入的数据,执行批量插入操作 220 | val insertSQL = "INSERT INTO ad_stat VALUES(?,?,?,?,?)" 221 | 222 | val insertParamsList: ArrayBuffer[Array[Any]] = ArrayBuffer[Array[Any]]() 223 | 224 | for (adStat <- insertAdStats) { 225 | insertParamsList += Array[Any](adStat.date, adStat.province, adStat.city, adStat.adid, adStat.clickCount) 226 | } 227 | 228 | client.executeBatch(insertSQL, insertParamsList.toArray) 229 | 230 | // 对于需要更新的数据,执行批量更新操作 231 | // 此处的UPDATE是进行覆盖 232 | val updateSQL = "UPDATE ad_stat SET clickCount=? " + 233 | "WHERE date=? " + 234 | "AND province=? " + 235 | "AND city=? " + 236 | "AND adid=?" 237 | 238 | val updateParamsList: ArrayBuffer[Array[Any]] = ArrayBuffer[Array[Any]]() 239 | 240 | for (adStat <- updateAdStats) { 241 | updateParamsList += Array[Any](adStat.clickCount, adStat.date, adStat.province, adStat.city, adStat.adid) 242 | } 243 | 244 | client.executeBatch(updateSQL, updateParamsList.toArray) 245 | 246 | // 使用完成后将对象返回给对象池 247 | mySqlPool.returnObject(client) 248 | } 249 | 250 | } 251 | 252 | 253 | /** 254 | * 各省份top3热门广告DAO实现类 255 | * 256 | * @author Administrator 257 | * 258 | */ 259 | object AdProvinceTop3DAO { 260 | 261 | def updateBatch(adProvinceTop3s: Array[AdProvinceTop3]) { 262 | // 获取对象池单例对象 263 | val mySqlPool = CreateMySqlPool() 264 | // 从对象池中提取对象 265 | val client = mySqlPool.borrowObject() 266 | 267 | // dateProvinces可以实现一次去重 268 | // AdProvinceTop3:date province adid clickCount,由于每条数据由date province adid组成 269 | // 当只取date province时,一定会有重复的情况 270 | val dateProvinces = ArrayBuffer[String]() 271 | 272 | for (adProvinceTop3 <- adProvinceTop3s) { 273 | // 组合新key 274 | val key = adProvinceTop3.date + "_" + adProvinceTop3.province 275 | 276 | // dateProvinces中不包含当前key才添加 277 | // 借此去重 278 | if (!dateProvinces.contains(key)) { 279 | dateProvinces += key 280 | } 281 | } 282 | 283 | // 根据去重后的date和province,进行批量删除操作 284 | // 先将原来的数据全部删除 285 | val deleteSQL = "DELETE FROM ad_province_top3 WHERE date=? AND province=?" 
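// Unlike AdStatDAO, this DAO does not split the input into inserts and updates: the top-3
// result for a (date, province) pair is recomputed as a whole on every batch, so the rows
// already stored for those keys are deleted first and the fresh top-3 rows are re-inserted
// below. That keeps only the latest ranking per (date, province) without needing an
// update path.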
286 | 287 | val deleteParamsList: ArrayBuffer[Array[Any]] = ArrayBuffer[Array[Any]]() 288 | 289 | for (dateProvince <- dateProvinces) { 290 | 291 | val dateProvinceSplited = dateProvince.split("_") 292 | val date = dateProvinceSplited(0) 293 | val province = dateProvinceSplited(1) 294 | 295 | val params = Array[Any](date, province) 296 | deleteParamsList += params 297 | } 298 | 299 | client.executeBatch(deleteSQL, deleteParamsList.toArray) 300 | 301 | // 批量插入传入进来的所有数据 302 | val insertSQL = "INSERT INTO ad_province_top3 VALUES(?,?,?,?)" 303 | 304 | val insertParamsList: ArrayBuffer[Array[Any]] = ArrayBuffer[Array[Any]]() 305 | 306 | // 将传入的数据转化为参数列表 307 | for (adProvinceTop3 <- adProvinceTop3s) { 308 | insertParamsList += Array[Any](adProvinceTop3.date, adProvinceTop3.province, adProvinceTop3.adid, adProvinceTop3.clickCount) 309 | } 310 | 311 | client.executeBatch(insertSQL, insertParamsList.toArray) 312 | 313 | // 使用完成后将对象返回给对象池 314 | mySqlPool.returnObject(client) 315 | } 316 | 317 | } 318 | 319 | 320 | /** 321 | * 广告点击趋势DAO实现类 322 | * 323 | * @author Administrator 324 | * 325 | */ 326 | object AdClickTrendDAO { 327 | 328 | def updateBatch(adClickTrends: Array[AdClickTrend]) { 329 | // 获取对象池单例对象 330 | val mySqlPool = CreateMySqlPool() 331 | // 从对象池中提取对象 332 | val client = mySqlPool.borrowObject() 333 | 334 | // 区分开来哪些是要插入的,哪些是要更新的 335 | val updateAdClickTrends = ArrayBuffer[AdClickTrend]() 336 | val insertAdClickTrends = ArrayBuffer[AdClickTrend]() 337 | 338 | val selectSQL = "SELECT count(*) " + 339 | "FROM ad_click_trend " + 340 | "WHERE date=? " + 341 | "AND hour=? " + 342 | "AND minute=? " + 343 | "AND adid=?" 344 | 345 | for (adClickTrend <- adClickTrends) { 346 | // 通过查询结果判断当前项时待插入还是待更新 347 | val params = Array[Any](adClickTrend.date, adClickTrend.hour, adClickTrend.minute, adClickTrend.adid) 348 | client.executeQuery(selectSQL, params, new QueryCallback { 349 | override def process(rs: ResultSet): Unit = { 350 | if (rs.next() && rs.getInt(1) > 0) { 351 | updateAdClickTrends += adClickTrend 352 | } else { 353 | insertAdClickTrends += adClickTrend 354 | } 355 | } 356 | }) 357 | 358 | } 359 | 360 | // 执行批量更新操作 361 | // 此处的UPDATE是覆盖 362 | val updateSQL = "UPDATE ad_click_trend SET clickCount=? " + 363 | "WHERE date=? " + 364 | "AND hour=? " + 365 | "AND minute=? " + 366 | "AND adid=?" 
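// Note: this UPDATE overwrites clickCount (SET clickCount=?), the same as ad_stat,
// whereas ad_user_click_count accumulates (SET clickCount=clickCount + ?). The value
// arriving here for a (date, hour, minute, adid) key is therefore expected to already be
// the full count for that minute rather than an increment.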
367 | 368 | val updateParamsList: ArrayBuffer[Array[Any]] = ArrayBuffer[Array[Any]]() 369 | 370 | for (adClickTrend <- updateAdClickTrends) { 371 | updateParamsList += Array[Any](adClickTrend.clickCount, adClickTrend.date, adClickTrend.hour, adClickTrend.minute, adClickTrend.adid) 372 | } 373 | 374 | client.executeBatch(updateSQL, updateParamsList.toArray) 375 | 376 | // 执行批量更新操作 377 | val insertSQL = "INSERT INTO ad_click_trend VALUES(?,?,?,?,?)" 378 | 379 | val insertParamsList: ArrayBuffer[Array[Any]] = ArrayBuffer[Array[Any]]() 380 | 381 | for (adClickTrend <- insertAdClickTrends) { 382 | insertParamsList += Array[Any](adClickTrend.date, adClickTrend.hour, adClickTrend.minute, adClickTrend.adid, adClickTrend.clickCount) 383 | } 384 | 385 | client.executeBatch(insertSQL, insertParamsList.toArray) 386 | 387 | // 使用完成后将对象返回给对象池 388 | mySqlPool.returnObject(client) 389 | } 390 | 391 | } 392 | 393 | -------------------------------------------------------------------------------- /commons/target/classes/test/PageConvertStat.scala: -------------------------------------------------------------------------------- 1 | import java.util.UUID 2 | 3 | import commons.conf.ConfigurationManager 4 | import commons.constant.Constants 5 | import commons.model.UserVisitAction 6 | import commons.utils.{DateUtils, ParamUtils} 7 | import net.sf.json.JSONObject 8 | import org.apache.spark.SparkConf 9 | import org.apache.spark.sql.{SaveMode, SparkSession} 10 | 11 | import scala.collection.mutable 12 | 13 | object PageConvertStat { 14 | 15 | def main(args: Array[String]): Unit = { 16 | 17 | // 获取任务限制条件 18 | val jsonStr = ConfigurationManager.config.getString(Constants.TASK_PARAMS) 19 | val taskParam = JSONObject.fromObject(jsonStr) 20 | 21 | // 获取唯一主键 22 | val taskUUID = UUID.randomUUID().toString 23 | 24 | // 创建sparkConf 25 | val sparkConf = new SparkConf().setAppName("pageConvert").setMaster("local[*]") 26 | 27 | // 创建sparkSession 28 | val sparkSession = SparkSession.builder().config(sparkConf).enableHiveSupport().getOrCreate() 29 | 30 | // 获取用户行为数据 31 | val sessionId2ActionRDD = getUserVisitAction(sparkSession, taskParam) 32 | 33 | // pageFlowStr: "1,2,3,4,5,6,7" 34 | val pageFlowStr = ParamUtils.getParam(taskParam, Constants.PARAM_TARGET_PAGE_FLOW) 35 | // pageFlowArray: Array[Long] [1,2,3,4,5,6,7] 36 | val pageFlowArray = pageFlowStr.split(",") 37 | // pageFlowArray.slice(0, pageFlowArray.length - 1): [1,2,3,4,5,6] 38 | // pageFlowArray.tail: [2,3,4,5,6,7] 39 | // pageFlowArray.slice(0, pageFlowArray.length - 1).zip(pageFlowArray.tail): [(1,2), (2,3) , ..] 40 | // targetPageSplit: [1_2, 2_3, 3_4, ...] 41 | val targetPageSplit = pageFlowArray.slice(0, pageFlowArray.length - 1).zip(pageFlowArray.tail).map{ 42 | case (page1, page2) => page1 + "_" + page2 43 | } 44 | 45 | // sessionId2ActionRDD: RDD[(sessionId, action)] 46 | val sessionId2GroupRDD = sessionId2ActionRDD.groupByKey() 47 | 48 | // pageSpllitNumRDD: RDD[(String, 1L)] 49 | val pageSpllitNumRDD = sessionId2GroupRDD.flatMap{ 50 | case (sessionId, iterableAction) => 51 | // item1: action 52 | // item2: action 53 | // sortList: List[UserVisitAction] 54 | val sortList = iterableAction.toList.sortWith((item1, item2) =>{ 55 | DateUtils.parseTime(item1.action_time).getTime < DateUtils.parseTime(item2.action_time).getTime 56 | }) 57 | 58 | // pageList: List[Long] [1,2,3,4,...] 
59 | val pageList = sortList.map{ 60 | case action => action.page_id 61 | } 62 | 63 | // pageList.slice(0, pageList.length - 1): [1,2,3,..,N-1] 64 | // pageList.tail: [2,3,4,..,N] 65 | // pageList.slice(0, pageList.length - 1).zip(pageList.tail): [(1,2), (2,3), ...] 66 | // pageSplit: [1_2, 2_3, ...] 67 | val pageSplit = pageList.slice(0, pageList.length - 1).zip(pageList.tail).map{ 68 | case (page1, page2) => page1 + "_" + page2 69 | } 70 | 71 | val pageSplitFilter = pageSplit.filter{ 72 | case pageSplit => targetPageSplit.contains(pageSplit) 73 | } 74 | 75 | pageSplitFilter.map{ 76 | case pageSplit => (pageSplit, 1L) 77 | } 78 | } 79 | 80 | // pageSplitCountMap: Map[(pageSplit, count)] 81 | val pageSplitCountMap = pageSpllitNumRDD.countByKey() 82 | 83 | val startPage = pageFlowArray(0).toLong 84 | 85 | val startPageCount = sessionId2ActionRDD.filter{ 86 | case (sessionId, action) => action.page_id == startPage 87 | }.count() 88 | 89 | getPageConvert(sparkSession, taskUUID, targetPageSplit, startPageCount, pageSplitCountMap) 90 | 91 | } 92 | 93 | def getPageConvert(sparkSession: SparkSession, 94 | taskUUID: String, 95 | targetPageSplit: Array[String], 96 | startPageCount: Long, 97 | ageSplitCountMap: collection.Map[String, Long]): Unit = { 98 | 99 | val pageSplitRatio = new mutable.HashMap[String, Double]() 100 | 101 | var lastPageCount = startPageCount.toDouble 102 | 103 | // 1,2,3,4,5,6,7 104 | // 1_2,2_3,... 105 | for(pageSplit <- targetPageSplit){ 106 | // 第一次循环: lastPageCount: page1 currentPageSplitCount: page1_page2 结果:page1_page2 107 | val currentPageSplitCount = ageSplitCountMap.get(pageSplit).get.toDouble 108 | val ratio = currentPageSplitCount / lastPageCount 109 | pageSplitRatio.put(pageSplit, ratio) 110 | lastPageCount = currentPageSplitCount 111 | } 112 | 113 | val convertStr = pageSplitRatio.map{ 114 | case (pageSplit, ratio) => pageSplit + "=" + ratio 115 | }.mkString("|") 116 | 117 | val pageSplit = PageSplitConvertRate(taskUUID, convertStr) 118 | 119 | val pageSplitRatioRDD = sparkSession.sparkContext.makeRDD(Array(pageSplit)) 120 | 121 | import sparkSession.implicits._ 122 | pageSplitRatioRDD.toDF().write 123 | .format("jdbc") 124 | .option("url", ConfigurationManager.config.getString(Constants.JDBC_URL)) 125 | .option("dbtable", "page_split_convert_rate_0308") 126 | .option("user", ConfigurationManager.config.getString(Constants.JDBC_USER)) 127 | .option("password", ConfigurationManager.config.getString(Constants.JDBC_PASSWORD)) 128 | .mode(SaveMode.Append) 129 | .save() 130 | 131 | } 132 | 133 | 134 | def getUserVisitAction(sparkSession: SparkSession, taskParam: JSONObject) = { 135 | val startDate = ParamUtils.getParam(taskParam, Constants.PARAM_START_DATE) 136 | val endDate = ParamUtils.getParam(taskParam, Constants.PARAM_END_DATE) 137 | 138 | val sql = "select * from user_visit_action where date>='" + startDate + "' and date<='" + 139 | endDate + "'" 140 | 141 | import sparkSession.implicits._ 142 | sparkSession.sql(sql).as[UserVisitAction].rdd.map(item => (item.session_id, item)) 143 | } 144 | 145 | } 146 | -------------------------------------------------------------------------------- /commons/target/classes/test/PageStat.scala: -------------------------------------------------------------------------------- 1 | import java.util.UUID 2 | 3 | import commons.conf.ConfigurationManager 4 | import commons.constant.Constants 5 | import commons.model.UserVisitAction 6 | import commons.utils.{DateUtils, NumberUtils, ParamUtils, StringUtils} 7 | import 
net.sf.json.JSONObject 8 | import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.BlockTargetPair 9 | import org.apache.spark.SparkConf 10 | import org.apache.spark.rdd.RDD 11 | import org.apache.spark.sql.{SaveMode, SparkSession} 12 | 13 | import scala.collection.mutable 14 | 15 | object PageStat { 16 | 17 | def main(args: Array[String]): Unit = { 18 | 19 | // 获取任务限制条件 20 | val jsonStr = ConfigurationManager.config.getString(Constants.TASK_PARAMS) 21 | val taskParam = JSONObject.fromObject(jsonStr) 22 | 23 | // 获取唯一主键 24 | val taskUUID = UUID.randomUUID.toString 25 | 26 | // 创建sparkConf 27 | val sparkConf = new SparkConf().setAppName("pageStat").setMaster("local[*]") 28 | 29 | // 创建sparkSession 30 | val sparkSession = SparkSession.builder().config(sparkConf).enableHiveSupport().getOrCreate() 31 | 32 | val sessionId2ActionRDD = getActionRDD(sparkSession, taskParam) 33 | 34 | /* 获取目标访问页面切片 */ 35 | 36 | // 1,2,3,4,5,6,7 37 | val pageInfo = ParamUtils.getParam(taskParam, Constants.PARAM_TARGET_PAGE_FLOW) 38 | // [1,2,3,4,5,6,7] 39 | val pageArray = pageInfo.split(",") 40 | // pageArray.slice(0, pageArray.length - 1): [1,2,3,4,5,6] 41 | // pageArray.tail:[2,3,4,5,6,7] 42 | // zip: (1,2),(2,3)..... 43 | val targetPageFlow = pageArray.slice(0, pageArray.length - 1).zip(pageArray.tail).map{ 44 | case (item1, item2) => item1 + "_" + item2 45 | } 46 | 47 | /* 获取每一个session的页面访问流 */ 48 | 49 | // 得到一个session所有的行为数据 50 | val sessionId2GroupRDD = sessionId2ActionRDD.groupByKey() 51 | 52 | // 获取每一个session的页面访问流 53 | // 1. 按照action_time对session所有的行为数据进行排序 54 | // 2. 通过map操作得到action数据里面的page_id 55 | // 3. 得到按时间排列的page_id之后,先转化为页面切片形式 56 | // 4. 过滤,将不存在于目标统计页面切片的数据过滤掉 57 | // 5. 转化格式为(page1_page2, 1L) 58 | val pageId2NumRDD = getPageSplit(sparkSession, targetPageFlow, sessionId2GroupRDD) 59 | 60 | // 聚合操作 61 | // (page1_page2, count) 62 | val pageSplitCountMap = pageId2NumRDD.countByKey() 63 | 64 | val startPage = pageArray(0) 65 | 66 | val startPageCount = sessionId2ActionRDD.filter{ 67 | case (sessionId, userVisitAction) => 68 | userVisitAction.page_id == startPage.toLong 69 | }.count() 70 | 71 | // 得到最后的统计结果 72 | getPageConvertRate(sparkSession, taskUUID, targetPageFlow, startPageCount, pageSplitCountMap) 73 | } 74 | 75 | def getPageConvertRate(sparkSession: SparkSession, 76 | taskUUID: String, 77 | targetPageFlow:Array[String], 78 | startPageCount: Long, 79 | pageSplitCountMap: collection.Map[String, Long]): Unit = { 80 | 81 | val pageSplitConvertMap = new mutable.HashMap[String, Double]() 82 | 83 | var lastPageCount = startPageCount.toDouble 84 | 85 | for(page <- targetPageFlow){ 86 | val currentPageCount = pageSplitCountMap.get(page).get.toDouble 87 | val rate = NumberUtils.formatDouble(currentPageCount / lastPageCount, 2) 88 | pageSplitConvertMap.put(page, rate) 89 | lastPageCount = currentPageCount 90 | } 91 | 92 | val convertStr = pageSplitConvertMap.map{ 93 | case (k,v) => k + "=" + v 94 | }.mkString("|") 95 | 96 | val pageConvert = PageSplitConvertRate(taskUUID, convertStr) 97 | 98 | val pageConvertRDD = sparkSession.sparkContext.makeRDD(Array(pageConvert)) 99 | 100 | import sparkSession.implicits._ 101 | pageConvertRDD.toDF().write 102 | .format("jdbc") 103 | .option("url", ConfigurationManager.config.getString(Constants.JDBC_URL)) 104 | .option("dbtable", "page_split_convert_rate1108") 105 | .option("user", ConfigurationManager.config.getString(Constants.JDBC_USER)) 106 | .option("password", ConfigurationManager.config.getString(Constants.JDBC_PASSWORD)) 107 | 
.mode(SaveMode.Append) 108 | .save() 109 | } 110 | 111 | def getPageSplit(sparkSession: SparkSession, 112 | targetPageFlow: Array[String], 113 | sessionId2GroupRDD: RDD[(String, Iterable[UserVisitAction])]) = { 114 | sessionId2GroupRDD.flatMap{ 115 | case (sessionId, iterableAction) => 116 | // 首先按照时间进行排序 117 | val sortedAction = iterableAction.toList.sortWith((action1, action2) => { 118 | DateUtils.parseTime(action1.action_time).getTime < 119 | DateUtils.parseTime(action2.action_time).getTime 120 | }) 121 | 122 | val pageInfo = sortedAction.map(item => item.page_id) 123 | 124 | val pageFlow = pageInfo.slice(0, pageInfo.length - 1).zip(pageInfo.tail).map{ 125 | case (page1, page2) => page1 + "_" + page2 126 | } 127 | 128 | val pageSplitFiltered = pageFlow.filter(item => targetPageFlow.contains(item)).map(item => (item, 1L)) 129 | 130 | pageSplitFiltered 131 | } 132 | 133 | 134 | } 135 | 136 | 137 | def getActionRDD(sparkSession: SparkSession, taskParam: JSONObject) = { 138 | val startDate = ParamUtils.getParam(taskParam, Constants.PARAM_START_DATE) 139 | val endDate = ParamUtils.getParam(taskParam, Constants.PARAM_END_DATE) 140 | 141 | val sql = "select * from user_visit_action where date>='" + startDate + "' and date<='" + endDate + "'" 142 | 143 | import sparkSession.implicits._ 144 | sparkSession.sql(sql).as[UserVisitAction].rdd.map(item => (item.session_id, item)) 145 | } 146 | 147 | } 148 | -------------------------------------------------------------------------------- /commons/target/classes/test/ad.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Navicat Premium Data Transfer 3 | 4 | Source Server : localhost 5 | Source Server Type : MySQL 6 | Source Server Version : 50720 7 | Source Host : localhost 8 | Source Database : commerce 9 | 10 | Target Server Type : MySQL 11 | Target Server Version : 50720 12 | File Encoding : utf-8 13 | 14 | Date: 11/03/2017 11:23:32 AM 15 | */ 16 | 17 | SET FOREIGN_KEY_CHECKS = 0; 18 | 19 | -- ---------------------------- 20 | -- Table structure for `ad_blacklist` 21 | -- ---------------------------- 22 | DROP TABLE IF EXISTS `ad_blacklist`; 23 | CREATE TABLE `ad_blacklist` ( 24 | `userid` int(11) DEFAULT NULL 25 | ) ENGINE=InnoDB DEFAULT CHARSET=utf8; 26 | 27 | -- ---------------------------- 28 | -- Table structure for `ad_click_trend` 29 | -- ---------------------------- 30 | DROP TABLE IF EXISTS `ad_click_trend`; 31 | CREATE TABLE `ad_click_trend` ( 32 | `date` varchar(30) DEFAULT NULL, 33 | `hour` varchar(30) DEFAULT NULL, 34 | `minute` varchar(30) DEFAULT NULL, 35 | `adid` int(11) DEFAULT NULL, 36 | `clickCount` int(11) DEFAULT NULL 37 | ) ENGINE=InnoDB DEFAULT CHARSET=utf8; 38 | 39 | -- ---------------------------- 40 | -- Table structure for `ad_province_top3` 41 | -- ---------------------------- 42 | DROP TABLE IF EXISTS `ad_province_top3`; 43 | CREATE TABLE `ad_province_top3` ( 44 | `date` varchar(30) DEFAULT NULL, 45 | `province` varchar(100) DEFAULT NULL, 46 | `adid` int(11) DEFAULT NULL, 47 | `clickCount` int(11) DEFAULT NULL 48 | ) ENGINE=InnoDB DEFAULT CHARSET=utf8; 49 | 50 | -- ---------------------------- 51 | -- Table structure for `ad_stat` 52 | -- ---------------------------- 53 | DROP TABLE IF EXISTS `ad_stat`; 54 | CREATE TABLE `ad_stat` ( 55 | `date` varchar(30) DEFAULT NULL, 56 | `province` varchar(100) DEFAULT NULL, 57 | `city` varchar(100) DEFAULT NULL, 58 | `adid` int(11) DEFAULT NULL, 59 | `clickCount` int(11) DEFAULT NULL 60 | ) ENGINE=InnoDB DEFAULT CHARSET=utf8; 61 | 62 | -- 
---------------------------- 63 | -- Table structure for `ad_user_click_count` 64 | -- ---------------------------- 65 | DROP TABLE IF EXISTS `ad_user_click_count`; 66 | CREATE TABLE `ad_user_click_count` ( 67 | `date` varchar(30) DEFAULT NULL, 68 | `userid` int(11) DEFAULT NULL, 69 | `adid` int(11) DEFAULT NULL, 70 | `clickCount` int(11) DEFAULT NULL 71 | ) ENGINE=InnoDB DEFAULT CHARSET=utf8; 72 | 73 | -------------------------------------------------------------------------------- /mock/mock.iml: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /mock/pom.xml: -------------------------------------------------------------------------------- 1 | 2 | 5 | 6 | shopAnalyze 7 | org.example 8 | 1.0-SNAPSHOT 9 | 10 | 4.0.0 11 | 12 | mock 13 | 14 | 15 | org.apache.hadoop 16 | hadoop-client 17 | 2.8.5 18 | 19 | 20 | org.apache.hadoop 21 | hadoop-common 22 | 2.8.5 23 | 24 | 25 | org.apache.hadoop 26 | hadoop-hdfs 27 | 2.8.5 28 | 29 | 30 | org.apache.hadoop 31 | hadoop-yarn-common 32 | 2.8.5 33 | 34 | 35 | org.codehaus.janino 36 | janino 37 | 3.0.8 38 | 39 | 40 | org.apache.spark 41 | spark-sql_2.12 42 | 2.4.5 43 | 44 | 45 | mysql 46 | mysql-connector-java 47 | 8.0.20 48 | 49 | 50 | org.example 51 | commons 52 | ${project.version} 53 | 54 | 55 | 56 | org.apache.spark 57 | spark-core_2.12 58 | 2.4.5 59 | 60 | 61 | 62 | org.apache.spark 63 | spark-hive_2.12 64 | 2.4.5 65 | 66 | 67 | org.apache.spark 68 | spark-streaming_2.12 69 | 2.4.5 70 | 71 | 72 | org.apache.spark 73 | spark-streaming-kafka-0-10_2.12 74 | 2.4.3 75 | 76 | 77 | slf4j-log4j12 78 | org.slf4j 79 | 80 | 81 | 82 | 83 | org.apache.spark 84 | spark-sql_2.12 85 | 2.4.5 86 | 87 | 88 | 89 | -------------------------------------------------------------------------------- /mock/src/main/java/scala/MockDataGenerate.scala: -------------------------------------------------------------------------------- 1 | package scala 2 | 3 | /* 4 | * Copyright (c) 2018. Atguigu Inc. All Rights Reserved. 
5 | */ 6 | 7 | import java.util.UUID 8 | 9 | import commons.model.{ProductInfo, UserInfo, UserVisitAction} 10 | import commons.utils.{DateUtils, StringUtil} 11 | import org.apache.spark.SparkConf 12 | import org.apache.spark.sql.{DataFrame, SparkSession} 13 | 14 | import scala.collection.mutable.ArrayBuffer 15 | import scala.util.Random 16 | 17 | 18 | /** 19 | * 模拟的数据 20 | * date:是当前日期 21 | * age: 0 - 59 22 | * professionals: professional[0 - 59] 23 | * cities: 0 - 9 24 | * sex: 0 - 1 25 | * keywords: ("火锅", "蛋糕", "重庆辣子鸡", "重庆小面", "呷哺呷哺", "新辣道鱼火锅", "国贸大厦", "太古商场", "日本料理", "温泉") 26 | * categoryIds: 0 - 99 27 | * ProductId: 0 - 99 28 | */ 29 | object MockDataGenerate { 30 | 31 | /** 32 | * 模拟用户行为信息 33 | * 34 | * @return 35 | */ 36 | private def mockUserVisitActionData(): Array[UserVisitAction] = { 37 | 38 | val searchKeywords = Array("华为手机", "联想笔记本", "小龙虾", "卫生纸", "吸尘器", "Lamer", "机器学习", "苹果", "洗面奶", "保温杯") 39 | // yyyy-MM-dd 40 | val date = DateUtils.getTodayDate() 41 | // 关注四个行为:搜索、点击、下单、支付 42 | val actions = Array("search", "click", "order", "pay") 43 | val random = new Random() 44 | val rows = ArrayBuffer[UserVisitAction]() 45 | 46 | // 一共100个用户(有重复) 47 | for (i <- 0 to 100) { 48 | val userid = random.nextInt(100) 49 | // 每个用户产生10个session 50 | for (j <- 0 to 10) { 51 | // 不可变的,全局的,独一无二的128bit长度的标识符,用于标识一个session,体现一次会话产生的sessionId是独一无二的 52 | val sessionid = UUID.randomUUID().toString().replace("-", "") 53 | // 在yyyy-MM-dd后面添加一个随机的小时时间(0-23) 54 | val baseActionTime = date + " " + random.nextInt(23) 55 | // 每个(userid + sessionid)生成0-100条用户访问数据 56 | for (k <- 0 to random.nextInt(100)) { 57 | val pageid = random.nextInt(10) 58 | // 在yyyy-MM-dd HH后面添加一个随机的分钟时间和秒时间 59 | val actionTime = baseActionTime + ":" + StringUtil.fulfuill(String.valueOf(random.nextInt(59))) + ":" + StringUtil.fulfuill(String.valueOf(random.nextInt(59))) 60 | var searchKeyword: String = null 61 | var clickCategoryId: Long = -1L 62 | var clickProductId: Long = -1L 63 | var orderCategoryIds: String = null 64 | var orderProductIds: String = null 65 | var payCategoryIds: String = null 66 | var payProductIds: String = null 67 | val cityid = random.nextInt(10).toLong 68 | // 随机确定用户在当前session中的行为 69 | val action = actions(random.nextInt(4)) 70 | 71 | // 根据随机产生的用户行为action决定对应字段的值 72 | action match { 73 | case "search" => searchKeyword = searchKeywords(random.nextInt(10)) 74 | case "click" => clickCategoryId = random.nextInt(100).toLong 75 | clickProductId = String.valueOf(random.nextInt(100)).toLong 76 | case "order" => orderCategoryIds = random.nextInt(100).toString 77 | orderProductIds = random.nextInt(100).toString 78 | case "pay" => payCategoryIds = random.nextInt(100).toString 79 | payProductIds = random.nextInt(100).toString 80 | } 81 | 82 | rows += UserVisitAction(date, userid, sessionid, 83 | pageid, actionTime, searchKeyword, 84 | clickCategoryId, clickProductId, 85 | orderCategoryIds, orderProductIds, 86 | payCategoryIds, payProductIds, cityid) 87 | } 88 | } 89 | } 90 | rows.toArray 91 | } 92 | 93 | /** 94 | * 模拟用户信息表 95 | * 96 | * @return 97 | */ 98 | private def mockUserInfo(): Array[UserInfo] = { 99 | 100 | val rows = ArrayBuffer[UserInfo]() 101 | val sexes = Array("male", "female") 102 | val random = new Random() 103 | 104 | // 随机产生100个用户的个人信息 105 | for (i <- 0 to 100) { 106 | val userid = i 107 | val username = "user" + i 108 | val name = "name" + i 109 | val age = random.nextInt(60) 110 | val professional = "professional" + random.nextInt(100) 111 | val city = "city" + random.nextInt(100) 112 | val sex = 
sexes(random.nextInt(2)) 113 | rows += UserInfo(userid, username, name, age, 114 | professional, city, sex) 115 | } 116 | rows.toArray 117 | } 118 | 119 | /** 120 | * 模拟产品数据表 121 | * 122 | * @return 123 | */ 124 | private def mockProductInfo(): Array[ProductInfo] = { 125 | 126 | val rows = ArrayBuffer[ProductInfo]() 127 | val random = new Random() 128 | val productStatus = Array(0, 1) 129 | 130 | // 随机产生100个产品信息 131 | for (i <- 0 to 100) { 132 | val productId = i 133 | val productName = "product" + i 134 | val extendInfo = "{\"product_status\": " + productStatus(random.nextInt(2)) + "}" 135 | 136 | rows += ProductInfo(productId, productName, extendInfo) 137 | } 138 | 139 | rows.toArray 140 | } 141 | 142 | /** 143 | * 将DataFrame插入到Hive表中 144 | * 145 | * @param spark SparkSQL客户端 146 | * @param tableName 表名 147 | * @param dataDF DataFrame 148 | */ 149 | private def insertHive(spark: SparkSession, tableName: String, dataDF: DataFrame): Unit = { 150 | // spark.sql("DROP TABLE IF EXISTS " + tableName) 151 | dataDF.write.saveAsTable(tableName) 152 | //dataDF.write.parquet("hdfs://hadoop1:9000/shopAnalyze") 153 | } 154 | 155 | val USER_VISIT_ACTION_TABLE = "user_visit_action" 156 | val USER_INFO_TABLE = "user_info" 157 | val PRODUCT_INFO_TABLE = "product_info" 158 | 159 | /** 160 | * 主入口方法 161 | * 162 | * @param args 启动参数 163 | */ 164 | def main(args: Array[String]): Unit = { 165 | 166 | // 创建Spark配置 167 | val sparkConf = new SparkConf().setAppName("MockData").setMaster("local[*]"); 168 | 169 | 170 | // 创建Spark SQL 客户端 171 | val spark = SparkSession.builder().config(sparkConf).enableHiveSupport().getOrCreate() 172 | 173 | // 模拟数据 174 | val userVisitActionData = this.mockUserVisitActionData() 175 | val userInfoData = this.mockUserInfo() 176 | val productInfoData = this.mockProductInfo() 177 | 178 | // 将模拟数据装换为RDD 179 | val userVisitActionRdd = spark.sparkContext.makeRDD(userVisitActionData) 180 | val userInfoRdd = spark.sparkContext.makeRDD(userInfoData) 181 | val productInfoRdd = spark.sparkContext.makeRDD(productInfoData) 182 | 183 | // 加载SparkSQL的隐式转换支持 184 | import spark.implicits._ 185 | 186 | // 将用户访问数据装换为DF保存到Hive表中 187 | val userVisitActionDF = userVisitActionRdd.toDF() 188 | userVisitActionDF.show(); 189 | insertHive(spark, USER_VISIT_ACTION_TABLE, userVisitActionDF) 190 | 191 | val userInfoDF = userInfoRdd.toDF() 192 | //userInfoDF.show(); 193 | insertHive(spark, USER_INFO_TABLE, userInfoDF) 194 | 195 | // 将产品信息数据转换为DF保存到Hive表中 196 | val productInfoDF = productInfoRdd.toDF() 197 | insertHive(spark,PRODUCT_INFO_TABLE,productInfoDF); 198 | 199 | spark.close 200 | } 201 | 202 | } 203 | -------------------------------------------------------------------------------- /mock/src/main/java/scala/MockRealTimeData.scala: -------------------------------------------------------------------------------- 1 | package scala 2 | 3 | /* 4 | * Copyright (c) 2018. Atguigu Inc. All Rights Reserved. 
5 | */ 6 | 7 | import java.util.Properties 8 | 9 | import commons.conf.ConfigurationManager 10 | import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord} 11 | 12 | import scala.collection.mutable.ArrayBuffer 13 | import scala.util.Random 14 | 15 | object MockRealTimeData { 16 | 17 | /** 18 | * 模拟的数据 19 | * 时间点: 当前时间毫秒 20 | * userId: 0 - 99 21 | * 省份、城市 ID相同 : 1 - 9 22 | * adid: 0 - 19 23 | * ((0L,"北京","北京"),(1L,"上海","上海"),(2L,"南京","江苏省"),(3L,"广州","广东省"),(4L,"三亚","海南省"),(5L,"武汉","湖北省"),(6L,"长沙","湖南省"),(7L,"西安","陕西省"),(8L,"成都","四川省"),(9L,"哈尔滨","东北省")) 24 | * 格式 :timestamp province city userid adid 25 | * 某个时间点 某个省份 某个城市 某个用户 某个广告 26 | */ 27 | def generateMockData(): Array[String] = { 28 | val array = ArrayBuffer[String]() 29 | val random = new Random() 30 | // 模拟实时数据: 31 | // timestamp province city userid adid 32 | for (i <- 0 to 50) { 33 | 34 | val timestamp = System.currentTimeMillis() 35 | val province = random.nextInt(3) 36 | val city = province 37 | val adid = random.nextInt(3) 38 | val userid = random.nextInt(3) 39 | 40 | // 拼接实时数据 41 | array += timestamp + " " + province + " " + city + " " + userid + " " + adid 42 | } 43 | array.toArray 44 | } 45 | 46 | def createKafkaProducer(broker: String): KafkaProducer[String, String] = { 47 | 48 | // 创建配置对象 49 | val prop = new Properties() 50 | // 添加配置 51 | prop.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, broker) 52 | prop.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer") 53 | prop.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer") 54 | 55 | // 根据配置创建Kafka生产者 56 | new KafkaProducer[String, String](prop) 57 | } 58 | 59 | 60 | def main(args: Array[String]): Unit = { 61 | 62 | // 获取配置文件commerce.properties中的Kafka配置参数 63 | val broker = ConfigurationManager.config.getString("kafka.broker.list") 64 | val topic = ConfigurationManager.config.getString("kafka.topics") 65 | 66 | // 创建Kafka消费者 67 | val kafkaProducer = createKafkaProducer(broker) 68 | 69 | while (true) { 70 | // 随机产生实时数据并通过Kafka生产者发送到Kafka集群中 71 | for (item <- generateMockData()) { 72 | kafkaProducer.send(new ProducerRecord[String, String](topic, item)) 73 | } 74 | println("success"); 75 | Thread.sleep(3000) 76 | } 77 | } 78 | } 79 | -------------------------------------------------------------------------------- /mock/target/classes/scala/MockDataGenerate$$typecreator13$1.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/mock/target/classes/scala/MockDataGenerate$$typecreator13$1.class -------------------------------------------------------------------------------- /mock/target/classes/scala/MockDataGenerate$$typecreator21$1.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/mock/target/classes/scala/MockDataGenerate$$typecreator21$1.class -------------------------------------------------------------------------------- /mock/target/classes/scala/MockDataGenerate$$typecreator5$1.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/mock/target/classes/scala/MockDataGenerate$$typecreator5$1.class 
-------------------------------------------------------------------------------- /mock/target/classes/scala/MockDataGenerate$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/mock/target/classes/scala/MockDataGenerate$.class -------------------------------------------------------------------------------- /mock/target/classes/scala/MockDataGenerate.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/mock/target/classes/scala/MockDataGenerate.class -------------------------------------------------------------------------------- /mock/target/classes/scala/MockRealTimeData$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/mock/target/classes/scala/MockRealTimeData$.class -------------------------------------------------------------------------------- /mock/target/classes/scala/MockRealTimeData.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/mock/target/classes/scala/MockRealTimeData.class -------------------------------------------------------------------------------- /pom.xml: -------------------------------------------------------------------------------- 1 | 2 | 5 | 4.0.0 6 | 7 | org.example 8 | shopAnalyze 9 | pom 10 | 1.0-SNAPSHOT 11 | 12 | commons 13 | session 14 | mock 15 | adverStat 16 | 17 | 18 | Maven 19 | 20 | http://maven.apache.org/ 21 | 2001 22 | 23 | 24 | 25 | website 26 | scp://webhost.company.com/www/website 27 | 28 | 29 | 30 | 31 | UTF-8 32 | 12 33 | 12 34 | 4.12 35 | 1.18.10 36 | 1.2.17 37 | 8.0.18 38 | 1.1.16 39 | 2.1.1 40 | 41 | 42 | 43 | 44 | 45 | 46 | com.alibaba.cloud 47 | spring-cloud-alibaba-dependencies 48 | 2.1.0.RELEASE 49 | pom 50 | import 51 | 52 | 53 | 54 | org.apache.maven.plugins 55 | maven-project-info-reports-plugin 56 | 3.0.0 57 | 58 | 59 | 60 | org.springframework.boot 61 | spring-boot-dependencies 62 | 2.2.2.RELEASE 63 | pom 64 | import 65 | 66 | 67 | 68 | org.springframework.cloud 69 | spring-cloud-dependencies 70 | Hoxton.SR1 71 | pom 72 | import 73 | 74 | 75 | com.alibaba.cloud 76 | spring-cloud-alibaba-dependencies 77 | 2.1.0.RELEASE 78 | pom 79 | import 80 | 81 | 82 | 83 | mysql 84 | mysql-connector-java 85 | ${mysql.version} 86 | runtime 87 | 88 | 89 | 90 | com.alibaba 91 | druid 92 | ${druid.version} 93 | 94 | 95 | org.mybatis.spring.boot 96 | mybatis-spring-boot-starter 97 | ${mybatis.spring.boot.version} 98 | 99 | 100 | 101 | junit 102 | junit 103 | ${junit.version} 104 | 105 | 106 | 107 | log4j 108 | log4j 109 | ${log4j.version} 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | org.springframework.boot 119 | spring-boot-maven-plugin 120 | 121 | true 122 | true 123 | 124 | 125 | 126 | 127 | -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/readme.md -------------------------------------------------------------------------------- /session/pom.xml: 
-------------------------------------------------------------------------------- 1 | 2 | 5 | 6 | shopAnalyze 7 | org.example 8 | 1.0-SNAPSHOT 9 | 10 | 4.0.0 11 | 12 | session 13 | 14 | 15 | 16 | org.apache.hadoop 17 | hadoop-client 18 | 2.8.5 19 | 20 | 21 | com.fasterxml.jackson.core 22 | jackson-core 23 | 2.10.0 24 | 25 | 26 | com.fasterxml.jackson.core 27 | jackson-annotations 28 | 2.10.0 29 | 30 | 31 | 32 | com.fasterxml.jackson.core 33 | jackson-databind 34 | 2.10.0 35 | 36 | 37 | 38 | 39 | com.alibaba 40 | fastjson 41 | 1.2.36 42 | 43 | 44 | 45 | org.apache.hadoop 46 | hadoop-common 47 | 2.8.5 48 | 49 | 50 | org.apache.hadoop 51 | hadoop-hdfs 52 | 2.8.5 53 | 54 | 55 | commons-beanutils 56 | commons-beanutils 57 | 1.9.3 58 | 59 | 60 | org.apache.hadoop 61 | hadoop-yarn-common 62 | 2.8.5 63 | 64 | 65 | org.codehaus.janino 66 | janino 67 | 3.0.8 68 | 69 | 70 | org.apache.spark 71 | spark-sql_2.12 72 | 2.4.5 73 | 74 | 75 | mysql 76 | mysql-connector-java 77 | 8.0.20 78 | 79 | 80 | org.example 81 | commons 82 | ${project.version} 83 | 84 | 85 | 86 | org.apache.spark 87 | spark-core_2.12 88 | 2.4.5 89 | 90 | 91 | 92 | org.apache.spark 93 | spark-hive_2.12 94 | 2.4.5 95 | 96 | 97 | org.apache.spark 98 | spark-streaming_2.12 99 | 2.4.5 100 | 101 | 102 | org.apache.spark 103 | spark-streaming-kafka-0-10_2.12 104 | 2.4.3 105 | 106 | 107 | slf4j-log4j12 108 | org.slf4j 109 | 110 | 111 | 112 | 113 | org.apache.spark 114 | spark-sql_2.12 115 | 2.4.5 116 | 117 | 118 | -------------------------------------------------------------------------------- /session/session.iml: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /session/src/main/java/scala/sessionAccumulator.scala: -------------------------------------------------------------------------------- 1 | package scala 2 | 3 | import org.apache.spark.util.AccumulatorV2 4 | 5 | import scala.collection.mutable 6 | 7 | class sessionAccumulator extends AccumulatorV2[String,mutable.HashMap[String,Int]] { 8 | val countMap=new mutable.HashMap[String,Int](); 9 | override def isZero: Boolean = { 10 | countMap.isEmpty; 11 | } 12 | 13 | override def copy(): AccumulatorV2[String, mutable.HashMap[String, Int]] = { 14 | val acc=new sessionAccumulator; 15 | acc.countMap++=this.countMap; 16 | acc 17 | } 18 | 19 | override def reset(): Unit = { 20 | countMap.clear; 21 | } 22 | 23 | override def add(v: String): Unit = { 24 | if (!countMap.contains(v)){ 25 | countMap+=(v->0); 26 | } 27 | countMap.update(v,countMap(v)+1); 28 | } 29 | 30 | override def merge(other: AccumulatorV2[String, mutable.HashMap[String, Int]])={ 31 | other match{ 32 | case acc:sessionAccumulator=>acc.countMap.foldLeft(this.countMap){ 33 | case(map,(k,v))=>map+=(k->(map.getOrElse(k,0)+v)); 34 | } 35 | } 36 | } 37 | 38 | override def value: mutable.HashMap[String, Int] = { 39 | this.countMap; 40 | } 41 | } 42 | -------------------------------------------------------------------------------- /session/src/main/java/scala/sessionStat.scala: -------------------------------------------------------------------------------- 1 | package scala 2 | 3 | import java.util.UUID 4 | 5 | import com.alibaba.fastjson.{JSON, JSONObject} 6 | import commons.conf.ConfigurationManager 7 | import commons.constant.Constants 8 | import org.apache.spark.SparkConf 9 | import org.apache.spark.sql.SparkSession 10 | import server.{serverFive, serverFour, serverOne, serverThree, serverTwo} 11 | 12 | object 
sessionStat { 13 | 14 | 15 | def main(args: Array[String]): Unit = { 16 | //server 17 | val oneServer=new serverOne; 18 | val twoServer=new serverTwo; 19 | val threeServer=new serverThree; 20 | val fourServer=new serverFour; 21 | val fiveServer=new serverFive; 22 | //sparksession 23 | val conf=new SparkConf().setAppName("session").setMaster("local[*]"); 24 | val session=SparkSession.builder().config(conf).getOrCreate(); 25 | session.sparkContext.setLogLevel("ERROR"); 26 | //获取配置 27 | val str=ConfigurationManager.config.getString(Constants.TASK_PARAMS); 28 | val task:JSONObject=JSON.parseObject(str); 29 | //主键 30 | val taskUUID=UUID.randomUUID().toString; 31 | 32 | val filterInfo=getFilterFullResult(oneServer,session,task,taskUUID); 33 | 34 | //需求二 35 | //twoServer.GetextraSession(session,filterInfo,task,taskUUID); 36 | 37 | //需求三 38 | val actionRdd=oneServer.basicActions(session,task); 39 | val sessionId2ActionRDD = actionRdd.map{ 40 | item => (item.session_id, item) 41 | } 42 | val sessionId2FilterActionRDD=sessionId2ActionRDD.join(filterInfo).map { 43 | case (sessionId,(action,info))=>{ 44 | (sessionId,action); 45 | } 46 | } 47 | //val top10Category= threeServer.top10PopularCategories(session,taskUUID,sessionId2FilterActionRDD); 48 | //需求四 49 | 50 | //val top10SessionRDD=fourServer.top10ActiveSession(session,taskUUID,sessionId2FilterActionRDD,top10Category); 51 | 52 | //需求五 53 | fiveServer.getSkipRatio(session,sessionId2FilterActionRDD,taskUUID); 54 | 55 | } 56 | def getFilterFullResult(oneServer: serverOne, session: SparkSession, task: JSONObject,taskUUID:String) ={ 57 | //1.获取基本的action信息 58 | val basicActions=oneServer.basicActions(session,task); 59 | //2.根据session聚合信息 60 | val basicActionMap=basicActions.map(item=>{ 61 | val sessionId=item.session_id; 62 | (sessionId,item); 63 | }) 64 | val groupBasicActions=basicActionMap.groupByKey(); 65 | //3.根据每个用户的sessionId->actions,将actions统计成一条str信息 66 | val aggUserActions=oneServer.AggActionGroup(groupBasicActions); 67 | //4.读取hadoop文件,获取用户的基本信息 68 | val userInfo=oneServer.getUserInfo(session); 69 | //5.根据user_Id,将userInfo的信息插入到aggUserActions,形成更完整的信息 70 | val finalInfo=oneServer.AggInfoAndActions(aggUserActions,userInfo); 71 | finalInfo.cache(); 72 | //6.根据common模块里的限制条件过滤数据,跟新累加器 73 | val accumulator=new sessionAccumulator; 74 | session.sparkContext.register(accumulator); 75 | val FilterInfo=oneServer.filterInfo(finalInfo,task,accumulator); 76 | FilterInfo.count(); 77 | /* 78 | 目前为止,我们已经得到了所有符合条件的过滤总和信息,以及每个范围内的session数量(累加器), 79 | */ 80 | //7.计算每个范围内的session占比, 81 | val sessionRatioCount= oneServer.getSessionRatio(session,taskUUID,FilterInfo,accumulator.value); 82 | FilterInfo; 83 | } 84 | } 85 | -------------------------------------------------------------------------------- /session/src/main/java/server/SortKey.scala: -------------------------------------------------------------------------------- 1 | package server 2 | 3 | case class SortKey(clickCount:Long, orderCount:Long, payCount:Long) extends Ordered[SortKey]{ 4 | // this.compare(that) 5 | // this compare that 6 | // compare > 0 this > that 7 | // compare <0 this < that 8 | override def compare(that: SortKey): Int = { 9 | if(this.clickCount - that.clickCount != 0){ 10 | return (this.clickCount - that.clickCount).toInt 11 | }else if(this.orderCount - that.orderCount != 0){ 12 | return (this.orderCount - that.orderCount).toInt 13 | }else{ 14 | return (this.payCount - that.payCount).toInt 15 | } 16 | } 17 | } 18 | 
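// Editorial sketch (not part of the repository): SortKey extends Ordered[SortKey], so Scala
// derives an implicit Ordering for it; that is what lets serverThree rank categories with
// sortByKey(false).take(10). A minimal, self-contained illustration of the comparison order
// (clickCount first, then orderCount, then payCount), assumed to sit in the same package as SortKey:
object SortKeyDemo {
  def main(args: Array[String]): Unit = {
    val keys = Seq(
      SortKey(clickCount = 10L, orderCount = 5L, payCount = 2L),
      SortKey(clickCount = 10L, orderCount = 7L, payCount = 1L),
      SortKey(clickCount = 3L, orderCount = 9L, payCount = 9L)
    )
    // descending order: equal clickCounts fall through to orderCount, so (10,7,1) ranks
    // above (10,5,2), and (3,9,9) comes last despite its larger order/pay counts
    keys.sorted.reverse.foreach(println)
  }
}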
-------------------------------------------------------------------------------- /session/src/main/java/server/serverFive.scala: -------------------------------------------------------------------------------- 1 | package server 2 | 3 | import commons.constant.Constants 4 | import commons.model.UserVisitAction 5 | import commons.utils.{DateUtils, ParamUtils} 6 | import org.apache.spark.rdd.RDD 7 | import org.apache.spark.sql.SparkSession 8 | 9 | import scala.collection.mutable 10 | import scala.collection.mutable.ListBuffer 11 | 12 | class serverFive extends Serializable { 13 | 14 | def getSkipRatio(session: SparkSession, sessionId2FilterActionRDD: RDD[(String, UserVisitAction)], taskUUID: String)={ 15 | //1.获取目标页面 16 | val pageFlow=ParamUtils.getPageFlow(); 17 | 18 | //2.聚合用户信息,获取用户页面跳转统计---countByKey---(page1_page2, count) 19 | val sessionId2GroupRDD=sessionId2FilterActionRDD.groupByKey(); 20 | val skipCountRDD=getPageSKipCount(session,pageFlow,sessionId2GroupRDD ); 21 | val pageSplitCountMap=skipCountRDD.countByKey(); 22 | //3.计算比例 23 | getPagesSkipRatio(pageSplitCountMap,session,taskUUID); 24 | 25 | 26 | } 27 | def getPagesSkipRatio(pageSplitCountMap: collection.Map[String, Long], session: SparkSession, taskUUID: String) = { 28 | val sum=pageSplitCountMap.values.sum.toDouble; 29 | val ratios=pageSplitCountMap.map{ 30 | case(k,v)=>{ 31 | val ratio=v/sum; 32 | (k,ratio); 33 | } 34 | } 35 | ratios.foreach(println); 36 | } 37 | def getPageSKipCount(sparkSession: SparkSession, 38 | targetPageFlow: Array[String], 39 | sessionId2GroupRDD: RDD[(String, Iterable[UserVisitAction])]) = { 40 | sessionId2GroupRDD.flatMap{ 41 | case(sessionId,actions)=>{ 42 | val sortedActions=actions.toList.sortWith((item1,item2)=>{ 43 | DateUtils.parseTime(item1.action_time).getTime < DateUtils.parseTime(item2.action_time).getTime 44 | }); 45 | val pages=sortedActions.map(item=>item.page_id); 46 | // pageArray.slice(0, pageArray.length - 1): [1,2,3,4,5,6] 47 | // pageArray.tail:[2,3,4,5,6,7] 48 | // zip: (1,2),(2,3).....
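// Editorial note: the single-jump keys built below use the "page1-page2" form; they only pass the
// targetPageFlow filter if ParamUtils.getPageFlow() returns entries with the same separator
// (the comment in getSkipRatio above writes them as "page1_page2"), so the two formats must agree.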
49 | val splitPages=pages.slice(0,pages.size-1).zip(pages.tail).map{ 50 | case(page1,page2)=>{ 51 | page1+"-"+page2; 52 | } 53 | } 54 | 55 | val splitPagesFilter=splitPages.filter(item=>targetPageFlow.contains(item)).map(item=>(item,1L)); 56 | splitPagesFilter 57 | } 58 | } 59 | } 60 | 61 | 62 | } 63 | -------------------------------------------------------------------------------- /session/src/main/java/server/serverFour.scala: -------------------------------------------------------------------------------- 1 | package server 2 | 3 | import commons.constant.Constants 4 | import commons.model.{Top10Session, UserVisitAction} 5 | import commons.utils.StringUtil 6 | import org.apache.spark.rdd.RDD 7 | import org.apache.spark.sql.SparkSession 8 | 9 | import scala.collection.mutable 10 | 11 | class serverFour extends Serializable{ 12 | def top10ActiveSession(session: SparkSession, taskUUID: String, 13 | sessionId2FilterActionRDD: RDD[(String, UserVisitAction)], 14 | top10Category: Array[(SortKey, String)]) = { 15 | //1.获取top10热门品类的array; 16 | val top10Arr=top10Category.map{ 17 | case (sortKey,info)=>{ 18 | val cId= StringUtil.getFieldFromConcatString(info, "\\|", Constants.FIELD_CATEGORY_ID).toLong; 19 | cId 20 | } 21 | } 22 | //2.过滤数据 23 | val filterRDD=sessionId2FilterActionRDD.filter{ 24 | case (sessionId,action)=>{ 25 | val cId=action.click_category_id; 26 | top10Arr.contains(cId); 27 | } 28 | } 29 | //3.根据sessionId分组聚合,统计每个用户对每个品类的点击次数,最后结构为(categoryId,sessionId=count) 30 | val GroupFilterRDD=filterRDD.groupByKey(); 31 | val cid2SessionCountRDD=GroupFilterRDD.flatMap{ 32 | case(sessionId,actions)=>{ 33 | val countMap=new mutable.HashMap[Long,Long]; 34 | for(action<-actions){ 35 | val cId=action.click_category_id; 36 | if(!countMap.contains(cId)){ 37 | countMap+=(cId->0) 38 | } 39 | countMap.update(cId,countMap(cId)+1); 40 | } 41 | for((k,v)<-countMap) 42 | yield(k,sessionId+"="+v); 43 | } 44 | } 45 | //4.groupByKey分组聚合 46 | val cid2GroupRDD=cid2SessionCountRDD.groupByKey(); 47 | //5.对每个cid对应的列表进行排序操作 48 | val top10ActiveSession=cid2GroupRDD.flatMap{ 49 | case (cid, iterableSessionCount) => 50 | // true: item1放在前面 51 | // false: item2放在前面 52 | // item: sessionCount String "sessionId=count" 53 | val sortList = iterableSessionCount.toList.sortWith((item1, item2) => { 54 | item1.split("=")(1).toLong > item2.split("=")(1).toLong 55 | }).take(10) 56 | 57 | val top10Session = sortList.map{ 58 | // item : sessionCount String "sessionId=count" 59 | case item => 60 | val sessionId = item.split("=")(0) 61 | val count = item.split("=")(1).toLong 62 | Top10Session(taskUUID, cid, sessionId, count) 63 | } 64 | 65 | top10Session 66 | } 67 | top10ActiveSession.foreach(println); 68 | top10ActiveSession; 69 | //6.写入数据库 70 | /* import sparkSession.implicits._ 71 | top10SessionRDD.toDF().write 72 | .format("jdbc") 73 | .option("url", ConfigurationManager.config.getString(Constants.JDBC_URL)) 74 | .option("user", ConfigurationManager.config.getString(Constants.JDBC_USER)) 75 | .option("password", ConfigurationManager.config.getString(Constants.JDBC_PASSWORD)) 76 | .option("dbtable", "top10_session_0308") 77 | .mode(SaveMode.Append) 78 | .save()*/ 79 | 80 | 81 | } 82 | 83 | } 84 | -------------------------------------------------------------------------------- /session/src/main/java/server/serverOne.scala: -------------------------------------------------------------------------------- 1 | package server 2 | 3 | import java.util.Date 4 | 5 | import com.alibaba.fastjson.JSONObject 6 | import
commons.conf.ConfigurationManager 7 | import commons.constant.Constants 8 | import commons.model.{SessionAggrStat, UserInfo, UserVisitAction} 9 | import commons.utils.{DateUtils, NumberUtils, StringUtil, ValidUtils} 10 | import org.apache.commons.lang.StringUtils 11 | import org.apache.spark.rdd.RDD 12 | import org.apache.spark.sql.{SaveMode, SparkSession} 13 | import org.spark_project.jetty.server.Authentication.User 14 | 15 | import scala.collection.mutable 16 | 17 | class serverOne extends Serializable { 18 | def getSessionRatio(sparkSession: SparkSession,taskUUID:String, FilterInfo: RDD[(String, String)], value:mutable.HashMap[String,Int]) = { 19 | val session_count = value.getOrElse(Constants.SESSION_COUNT, 1).toDouble 20 | 21 | val visit_length_1s_3s = value.getOrElse(Constants.TIME_PERIOD_1s_3s, 0) 22 | val visit_length_4s_6s = value.getOrElse(Constants.TIME_PERIOD_4s_6s, 0) 23 | val visit_length_7s_9s = value.getOrElse(Constants.TIME_PERIOD_7s_9s, 0) 24 | val visit_length_10s_30s = value.getOrElse(Constants.TIME_PERIOD_10s_30s, 0) 25 | val visit_length_30s_60s = value.getOrElse(Constants.TIME_PERIOD_30s_60s, 0) 26 | val visit_length_1m_3m = value.getOrElse(Constants.TIME_PERIOD_1m_3m, 0) 27 | val visit_length_3m_10m = value.getOrElse(Constants.TIME_PERIOD_3m_10m, 0) 28 | val visit_length_10m_30m = value.getOrElse(Constants.TIME_PERIOD_10m_30m, 0) 29 | val visit_length_30m = value.getOrElse(Constants.TIME_PERIOD_30m, 0) 30 | 31 | val step_length_1_3 = value.getOrElse(Constants.STEP_PERIOD_1_3, 0) 32 | val step_length_4_6 = value.getOrElse(Constants.STEP_PERIOD_4_6, 0) 33 | val step_length_7_9 = value.getOrElse(Constants.STEP_PERIOD_7_9, 0) 34 | val step_length_10_30 = value.getOrElse(Constants.STEP_PERIOD_10_30, 0) 35 | val step_length_30_60 = value.getOrElse(Constants.STEP_PERIOD_30_60, 0) 36 | val step_length_60 = value.getOrElse(Constants.STEP_PERIOD_60, 0) 37 | 38 | val visit_length_1s_3s_ratio = NumberUtils.formatDouble(visit_length_1s_3s / session_count, 2) 39 | val visit_length_4s_6s_ratio = NumberUtils.formatDouble(visit_length_4s_6s / session_count, 2) 40 | val visit_length_7s_9s_ratio = NumberUtils.formatDouble(visit_length_7s_9s / session_count, 2) 41 | val visit_length_10s_30s_ratio = NumberUtils.formatDouble(visit_length_10s_30s / session_count, 2) 42 | val visit_length_30s_60s_ratio = NumberUtils.formatDouble(visit_length_30s_60s / session_count, 2) 43 | val visit_length_1m_3m_ratio = NumberUtils.formatDouble(visit_length_1m_3m / session_count, 2) 44 | val visit_length_3m_10m_ratio = NumberUtils.formatDouble(visit_length_3m_10m / session_count, 2) 45 | val visit_length_10m_30m_ratio = NumberUtils.formatDouble(visit_length_10m_30m / session_count, 2) 46 | val visit_length_30m_ratio = NumberUtils.formatDouble(visit_length_30m / session_count, 2) 47 | 48 | val step_length_1_3_ratio = NumberUtils.formatDouble(step_length_1_3 / session_count, 2) 49 | val step_length_4_6_ratio = NumberUtils.formatDouble(step_length_4_6 / session_count, 2) 50 | val step_length_7_9_ratio = NumberUtils.formatDouble(step_length_7_9 / session_count, 2) 51 | val step_length_10_30_ratio = NumberUtils.formatDouble(step_length_10_30 / session_count, 2) 52 | val step_length_30_60_ratio = NumberUtils.formatDouble(step_length_30_60 / session_count, 2) 53 | val step_length_60_ratio = NumberUtils.formatDouble(step_length_60 / session_count, 2) 54 | 55 | //数据封装 56 | val stat = SessionAggrStat(taskUUID, session_count.toInt, visit_length_1s_3s_ratio, visit_length_4s_6s_ratio, visit_length_7s_9s_ratio, 57 | 
visit_length_10s_30s_ratio, visit_length_30s_60s_ratio, visit_length_1m_3m_ratio, 58 | visit_length_3m_10m_ratio, visit_length_10m_30m_ratio, visit_length_30m_ratio, 59 | step_length_1_3_ratio, step_length_4_6_ratio, step_length_7_9_ratio, 60 | step_length_10_30_ratio, step_length_30_60_ratio, step_length_60_ratio) 61 | 62 | val sessionRatioRDD = sparkSession.sparkContext.makeRDD(Array(stat)) 63 | 64 | //写入数据库 65 | import sparkSession.implicits._ 66 | sessionRatioRDD.toDF().write 67 | .format("jdbc") 68 | .option("url", ConfigurationManager.config.getString(Constants.JDBC_URL)) 69 | .option("user", ConfigurationManager.config.getString(Constants.JDBC_USER)) 70 | .option("password", ConfigurationManager.config.getString(Constants.JDBC_PASSWORD)) 71 | .option("dbtable", "session_stat_ratio_0416") 72 | .mode(SaveMode.Append) 73 | .save() 74 | sessionRatioRDD; 75 | } 76 | 77 | def filterInfo(finalInfo: RDD[(String, String)],task:JSONObject,accumulator:sessionAccumulator) = { 78 | //1.获取限制条件 79 | //获取限制条件的基本信息 80 | val startAge = task.get(Constants.PARAM_START_AGE); 81 | val endAge = task.get( Constants.PARAM_END_AGE); 82 | val professionals = task.get(Constants.PARAM_PROFESSIONALS) 83 | val cities = task.get(Constants.PARAM_CITIES) 84 | val sex = task.get(Constants.PARAM_SEX) 85 | val keywords =task.get(Constants.PARAM_KEYWORDS) 86 | val categoryIds = task.get(Constants.PARAM_CATEGORY_IDS) 87 | 88 | //拼接基本条件 89 | var filterInfo = (if(startAge != null) Constants.PARAM_START_AGE + "=" + startAge + "|" else "") + 90 | (if (endAge != null) Constants.PARAM_END_AGE + "=" + endAge + "|" else "") + 91 | (if (professionals != null) Constants.PARAM_PROFESSIONALS + "=" + professionals + "|" else "") + 92 | (if (cities != null) Constants.PARAM_CITIES + "=" + cities + "|" else "") + 93 | (if (sex != null) Constants.PARAM_SEX + "=" + sex + "|" else "") + 94 | (if (keywords != null) Constants.PARAM_KEYWORDS + "=" + keywords + "|" else "") + 95 | (if (categoryIds != null) Constants.PARAM_CATEGORY_IDS + "=" + categoryIds else "") 96 | 97 | if(filterInfo.endsWith("\\|")) 98 | filterInfo = filterInfo.substring(0, filterInfo.length - 1) 99 | 100 | finalInfo.filter{ 101 | case (sessionId,fullInfo)=>{ 102 | var success=true; 103 | if(!ValidUtils.between(fullInfo, Constants.FIELD_AGE, filterInfo, Constants.PARAM_START_AGE, Constants.PARAM_END_AGE)){ 104 | success = false 105 | }else if(!ValidUtils.in(fullInfo, Constants.FIELD_PROFESSIONAL, filterInfo, Constants.PARAM_PROFESSIONALS)){ 106 | success = false 107 | }else if(!ValidUtils.in(fullInfo, Constants.FIELD_CITY, filterInfo, Constants.PARAM_CITIES)){ 108 | success = false 109 | }else if(!ValidUtils.equal(fullInfo, Constants.FIELD_SEX, filterInfo, Constants.PARAM_SEX)){ 110 | success = false 111 | }else if(!ValidUtils.in(fullInfo, Constants.FIELD_SEARCH_KEYWORDS, filterInfo, Constants.PARAM_KEYWORDS)){ 112 | success = false 113 | }else if(!ValidUtils.in(fullInfo, Constants.FIELD_CLICK_CATEGORY_IDS, filterInfo, Constants.PARAM_CATEGORY_IDS)){ 114 | success = false 115 | } 116 | //跟新累加器 117 | if (success){ 118 | //先累加总的session数量 119 | accumulator.add(Constants.SESSION_COUNT); 120 | val visitLength=StringUtil.getFieldFromConcatString(fullInfo,"\\|",Constants.FIELD_VISIT_LENGTH).toLong; 121 | val stepLength=StringUtil.getFieldFromConcatString(fullInfo,"\\|",Constants.FIELD_STEP_LENGTH).toLong; 122 | 123 | calculateVisitLength(visitLength,accumulator); 124 | calculateStepLength(stepLength,accumulator); 125 | } 126 | success; 127 | } 128 | } 129 | } 130 | def 
calculateVisitLength(visitLength: Long, sessionStatisticAccumulator: sessionAccumulator) = { 131 | if(visitLength >= 1 && visitLength <= 3){ 132 | sessionStatisticAccumulator.add(Constants.TIME_PERIOD_1s_3s) 133 | }else if(visitLength >=4 && visitLength <= 6){ 134 | sessionStatisticAccumulator.add(Constants.TIME_PERIOD_4s_6s) 135 | }else if (visitLength >= 7 && visitLength <= 9) { 136 | sessionStatisticAccumulator.add(Constants.TIME_PERIOD_7s_9s) 137 | } else if (visitLength >= 10 && visitLength <= 30) { 138 | sessionStatisticAccumulator.add(Constants.TIME_PERIOD_10s_30s) 139 | } else if (visitLength > 30 && visitLength <= 60) { 140 | sessionStatisticAccumulator.add(Constants.TIME_PERIOD_30s_60s) 141 | } else if (visitLength > 60 && visitLength <= 180) { 142 | sessionStatisticAccumulator.add(Constants.TIME_PERIOD_1m_3m) 143 | } else if (visitLength > 180 && visitLength <= 600) { 144 | sessionStatisticAccumulator.add(Constants.TIME_PERIOD_3m_10m) 145 | } else if (visitLength > 600 && visitLength <= 1800) { 146 | sessionStatisticAccumulator.add(Constants.TIME_PERIOD_10m_30m) 147 | } else if (visitLength > 1800) { 148 | sessionStatisticAccumulator.add(Constants.TIME_PERIOD_30m) 149 | } 150 | } 151 | 152 | def calculateStepLength(stepLength: Long, sessionStatisticAccumulator: sessionAccumulator) = { 153 | if(stepLength >=1 && stepLength <=3){ 154 | sessionStatisticAccumulator.add(Constants.STEP_PERIOD_1_3) 155 | }else if (stepLength >= 4 && stepLength <= 6) { 156 | sessionStatisticAccumulator.add(Constants.STEP_PERIOD_4_6) 157 | } else if (stepLength >= 7 && stepLength <= 9) { 158 | sessionStatisticAccumulator.add(Constants.STEP_PERIOD_7_9) 159 | } else if (stepLength >= 10 && stepLength <= 30) { 160 | sessionStatisticAccumulator.add(Constants.STEP_PERIOD_10_30) 161 | } else if (stepLength > 30 && stepLength <= 60) { 162 | sessionStatisticAccumulator.add(Constants.STEP_PERIOD_30_60) 163 | } else if (stepLength > 60) { 164 | sessionStatisticAccumulator.add(Constants.STEP_PERIOD_60) 165 | } 166 | } 167 | def getUserInfo(session: SparkSession) = { 168 | import session.implicits._; 169 | val ds=session.read.parquet("hdfs://hadoop1:9000/data/user_Info").as[UserInfo].map(item=>(item.user_id,item)); 170 | ds.rdd; 171 | } 172 | 173 | def basicActions(session:SparkSession,task:JSONObject)={ 174 | import session.implicits._; 175 | val df=session.read.parquet("hdfs://hadoop1:9000/data/user_visit_action").as[UserVisitAction]; 176 | df.filter(item=>{ 177 | val date=item.action_time; 178 | val start=task.getString(Constants.PARAM_START_DATE); 179 | val end=task.getString(Constants.PARAM_END_DATE); 180 | date>=start&&date<=end; 181 | }) 182 | df.rdd; 183 | } 184 | 185 | def AggActionGroup(groupBasicActions: RDD[(String, Iterable[UserVisitAction])])={ 186 | groupBasicActions.map{ 187 | case (sessionId,actions)=>{ 188 | var userId = -1L 189 | 190 | var startTime:Date = null 191 | var endTime:Date = null 192 | 193 | var stepLength = 0 194 | 195 | val searchKeywords = new StringBuffer("") 196 | val clickCategories = new StringBuffer("") 197 | 198 | //循环遍历actions,更新信息 199 | for (action<-actions){ 200 | if(userId == -1L){ 201 | userId=action.user_id; 202 | } 203 | val time=DateUtils.parseTime(action.action_time); 204 | if (startTime==null||startTime.after(time))startTime=time; 205 | if (endTime==null||endTime.before(time))endTime=time; 206 | 207 | val key=action.search_keyword; 208 | 209 | if (!StringUtils.isEmpty(key) && !searchKeywords.toString.contains(key))searchKeywords.append(key+","); 210 | 211 | val 
click=action.click_category_id; 212 | if ( click!= -1L && clickCategories.toString.contains(click))searchKeywords.append(click+","); 213 | 214 | stepLength+=1; 215 | 216 | } 217 | // searchKeywords.toString.substring(0, searchKeywords.toString.length) 218 | val searchKw = StringUtil.trimComma(searchKeywords.toString) 219 | val clickCg = StringUtil.trimComma(clickCategories.toString) 220 | 221 | val visitLength = (endTime.getTime - startTime.getTime) / 1000 222 | 223 | val aggrInfo = Constants.FIELD_SESSION_ID + "=" + sessionId + "|" + 224 | Constants.FIELD_SEARCH_KEYWORDS + "=" + searchKw + "|" + 225 | Constants.FIELD_CLICK_CATEGORY_IDS + "=" + clickCg + "|" + 226 | Constants.FIELD_VISIT_LENGTH + "=" + visitLength + "|" + 227 | Constants.FIELD_STEP_LENGTH + "=" + stepLength + "|" + 228 | Constants.FIELD_START_TIME + "=" + DateUtils.formatTime(startTime) 229 | 230 | (userId, aggrInfo) 231 | 232 | } 233 | } 234 | } 235 | 236 | def AggInfoAndActions(aggUserActions: RDD[(Long, String)], userInfo: RDD[(Long, UserInfo)])={ 237 | //根据user_id建立映射关系===>用Join算子 238 | userInfo.join(aggUserActions).map{ 239 | case (userId,(userInfo: UserInfo,aggrInfo))=>{ 240 | val age = userInfo.age 241 | val professional = userInfo.professional 242 | val sex = userInfo.sex 243 | val city = userInfo.city 244 | val fullInfo = aggrInfo + "|" + 245 | Constants.FIELD_AGE + "=" + age + "|" + 246 | Constants.FIELD_PROFESSIONAL + "=" + professional + "|" + 247 | Constants.FIELD_SEX + "=" + sex + "|" + 248 | Constants.FIELD_CITY + "=" + city 249 | 250 | val sessionId = StringUtil.getFieldFromConcatString(aggrInfo, "\\|", Constants.FIELD_SESSION_ID) 251 | 252 | (sessionId, fullInfo) 253 | } 254 | } 255 | } 256 | 257 | } 258 | -------------------------------------------------------------------------------- /session/src/main/java/server/serverThree.scala: -------------------------------------------------------------------------------- 1 | package server 2 | 3 | import commons.constant.Constants 4 | import commons.model.{Top10Category, UserVisitAction} 5 | import commons.utils.StringUtil 6 | import org.apache.spark.rdd.RDD 7 | import org.apache.spark.sql.SparkSession 8 | 9 | import scala.collection.mutable.ArrayBuffer 10 | 11 | class serverThree extends Serializable { 12 | 13 | 14 | 15 | 16 | def top10PopularCategories(sparkSession: SparkSession, 17 | taskUUID: String, 18 | sessionId2FilterActionRDD: RDD[(String, UserVisitAction)])={ 19 | //1.将所有基本数据,转化成(cId,cId)格式的总数据 20 | var cid2CidRdd=sessionId2FilterActionRDD.flatMap{ 21 | case(sessionId,action: UserVisitAction)=>{ 22 | val categoryBuffer=new ArrayBuffer[(Long,Long)](); 23 | // 点击行为 24 | if(action.click_category_id != -1){ 25 | categoryBuffer += ((action.click_category_id, action.click_category_id)) 26 | }else if(action.order_category_ids != null){ 27 | for(orderCid <- action.order_category_ids.split(",")) 28 | categoryBuffer += ((orderCid.toLong, orderCid.toLong)) 29 | }else if(action.pay_category_ids != null){ 30 | for(payCid <- action.pay_category_ids.split(",")) 31 | categoryBuffer += ((payCid.toLong, payCid.toLong)) 32 | } 33 | categoryBuffer 34 | } 35 | } 36 | cid2CidRdd=cid2CidRdd.distinct(); 37 | // 第二步:统计品类的点击次数、下单次数、付款次数 38 | val cid2ClickCountRDD = getClickCount(sessionId2FilterActionRDD) 39 | 40 | val cid2OrderCountRDD = getOrderCount(sessionId2FilterActionRDD) 41 | 42 | val cid2PayCountRDD = getPayCount(sessionId2FilterActionRDD) 43 | 44 | //3.根据左连接,将总的数据cid2CidRdd和第二部得到的数据一个个进行连接,创造出cid:str 45 | //其中,str代表count=32|order=15....... 
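// Editorial note: getFullCount below chains three leftOuterJoins, building per-category strings such as
// (illustrative, assuming the Constants.FIELD_* names) "categoryid=20|clickCount=32|orderCount=15|payCount=7".
// Beware that its clickCount branch is written as `if (option.isDefined) option.getOrElse(0)` with no else,
// which evaluates to Unit for categories that were never clicked; `if (option.isDefined) option.get else 0`,
// as used for the order and pay counts, keeps the later .toLong parsing safe.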
46 | val cid2FullCountRDD = getFullCount(cid2CidRdd,cid2ClickCountRDD,cid2OrderCountRDD,cid2PayCountRDD); 47 | 48 | //4.自定义排序器,将数据转化为(sortKey,info) 49 | val sortRDD=cid2FullCountRDD.map{ 50 | case (cId,info)=>{ 51 | val clickCount = StringUtil.getFieldFromConcatString(info, "\\|", Constants.FIELD_CLICK_COUNT).toLong 52 | val orderCount = StringUtil.getFieldFromConcatString(info, "\\|", Constants.FIELD_ORDER_COUNT).toLong 53 | val payCount = StringUtil.getFieldFromConcatString(info, "\\|", Constants.FIELD_PAY_COUNT).toLong 54 | 55 | val sortKey = SortKey(clickCount, orderCount, payCount) 56 | (sortKey, info) 57 | } 58 | } 59 | //5.排序 60 | val top10=sortRDD.sortByKey(false).take(10); 61 | //6.封装数据,写进数据库 62 | val top10CategoryRDD = sparkSession.sparkContext.makeRDD(top10).map{ 63 | case (sortKey, countInfo) => 64 | val cid = StringUtil.getFieldFromConcatString(countInfo, "\\|", Constants.FIELD_CATEGORY_ID).toLong 65 | val clickCount = sortKey.clickCount 66 | val orderCount = sortKey.orderCount 67 | val payCount = sortKey.payCount 68 | Top10Category(taskUUID, cid, clickCount, orderCount, payCount) 69 | } 70 | 71 | //保存到数据库 72 | /* import sparkSession.implicits._ 73 | top10CategoryRDD.toDF().write 74 | .format("jdbc") 75 | .option("url", ConfigurationManager.config.getString(Constants.JDBC_URL)) 76 | .option("user", ConfigurationManager.config.getString(Constants.JDBC_USER)) 77 | .option("password", ConfigurationManager.config.getString(Constants.JDBC_PASSWORD)) 78 | .option("dbtable", "top10_category_0308") 79 | .mode(SaveMode.Append) 80 | .save*/ 81 | top10 82 | 83 | } 84 | 85 | def getFullCount(cid2CidRDD: RDD[(Long, Long)], cid2ClickCountRDD: RDD[(Long, Long)], cid2OrderCountRDD: RDD[(Long, Long)], cid2PayCountRDD: RDD[(Long, Long)]) = { 86 | val cid2ClickInfoRDD=cid2CidRDD.leftOuterJoin(cid2ClickCountRDD).map{ 87 | case (cId,(categoryId,option))=>{ 88 | val clickCount=if (option.isDefined)option.getOrElse(0); 89 | val aggrCount = Constants.FIELD_CATEGORY_ID + "=" + cId + "|" + 90 | Constants.FIELD_CLICK_COUNT + "=" + clickCount 91 | 92 | (cId, aggrCount) 93 | } 94 | } 95 | val cid2OrderInfoRDD = cid2ClickInfoRDD.leftOuterJoin(cid2OrderCountRDD).map{ 96 | case (cid, (clickInfo, option)) => 97 | val orderCount = if(option.isDefined) option.get else 0 98 | val aggrInfo = clickInfo + "|" + 99 | Constants.FIELD_ORDER_COUNT + "=" + orderCount 100 | 101 | (cid, aggrInfo) 102 | } 103 | 104 | val cid2PayInfoRDD = cid2OrderInfoRDD.leftOuterJoin(cid2PayCountRDD).map{ 105 | case (cid, (orderInfo, option)) => 106 | val payCount = if(option.isDefined) option.get else 0 107 | val aggrInfo = orderInfo + "|" + 108 | Constants.FIELD_PAY_COUNT + "=" + payCount 109 | (cid, aggrInfo) 110 | } 111 | cid2PayInfoRDD; 112 | 113 | } 114 | 115 | 116 | def getClickCount(sessionId2FilterActionRDD: RDD[(String, UserVisitAction)])={ 117 | val clickFilterRDD=sessionId2FilterActionRDD.filter{ 118 | case (sessionId,action: UserVisitAction)=>{ 119 | action.click_category_id != -1L; 120 | } 121 | } 122 | val clickNumRDD = clickFilterRDD.map{ 123 | case (sessionId, action) => (action.click_category_id, 1L) 124 | } 125 | 126 | clickNumRDD.reduceByKey(_+_) 127 | } 128 | def getOrderCount(sessionId2FilterActionRDD: RDD[(String, UserVisitAction)])={ 129 | val orderFilterRDD=sessionId2FilterActionRDD.filter(item=>item._2.order_category_ids!=null) 130 | val orderNumRDD=orderFilterRDD.flatMap{ 131 | case (sessionId,action)=>{ 132 | 133 | for(id<-action.order_category_ids.split(",")){ 134 | 135 | } 136 | 
action.order_category_ids.split(",").map(item=>(item.toLong,1L)); 137 | } 138 | } 139 | orderNumRDD.reduceByKey(_+_); 140 | } 141 | def getPayCount(sessionId2FilterActionRDD: RDD[(String, UserVisitAction)]) = { 142 | val payFilterRDD = sessionId2FilterActionRDD.filter(item => item._2.pay_category_ids != null) 143 | 144 | val payNumRDD = payFilterRDD.flatMap{ 145 | case (sid, action) => 146 | action.pay_category_ids.split(",").map(item => (item.toLong, 1L)) 147 | } 148 | 149 | payNumRDD.reduceByKey(_+_) 150 | } 151 | 152 | } 153 | -------------------------------------------------------------------------------- /session/src/main/java/server/serverTwo.scala: -------------------------------------------------------------------------------- 1 | package server 2 | 3 | import com.alibaba.fastjson.JSONObject 4 | import commons.conf.ConfigurationManager 5 | import commons.constant.Constants 6 | import commons.model.SessionRandomExtract 7 | import commons.utils.{DateUtils, StringUtil} 8 | import org.apache.spark.rdd.RDD 9 | import org.apache.spark.sql.{SaveMode, SparkSession} 10 | 11 | import scala.collection.mutable 12 | import scala.collection.mutable.{ArrayBuffer, ListBuffer} 13 | import scala.util.Random 14 | 15 | class serverTwo extends Serializable { 16 | 17 | 18 | def generateRandomIndexList(extractDay: Int, oneDay: Long, hourCountMap: mutable.HashMap[String, Long], hourListMap: mutable.HashMap[String, ListBuffer[Int]])={ 19 | //计算每个小时要抽取多少条数据 20 | for ((hour,cnt)<-hourCountMap){ 21 | val curHour=((cnt/oneDay)*extractDay).toInt; 22 | val Random=new Random(); 23 | hourListMap.get(hour) match { 24 | case None => hourListMap(hour)=new ListBuffer[Int]; 25 | for (i<-0 until curHour.toInt){ 26 | var index=Random.nextInt(cnt.toInt); 27 | while(hourListMap(hour).contains(index)){ 28 | index=Random.nextInt(cnt.toInt); 29 | } 30 | hourListMap(hour).append(index); 31 | } 32 | 33 | case Some(value) => 34 | for (i<-0 until curHour.toInt){ 35 | var index=Random.nextInt(cnt.toInt); 36 | while(hourListMap(hour).contains(index)){ 37 | index=Random.nextInt(cnt.toInt); 38 | } 39 | hourListMap(hour).append(index); 40 | 41 | } 42 | } 43 | } 44 | } 45 | 46 | def GetextraSession(session: SparkSession, filterInfo: RDD[(String,String)], task: JSONObject, taskUUID: String)={ 47 | //1.数据格式转化成(date,info) 48 | val dateHour2FullInfoRDD=filterInfo.map{ 49 | case (sessionId,info)=>{ 50 | val date1=StringUtil.getFieldFromConcatString(info, "\\|", Constants.FIELD_START_TIME) 51 | val date=DateUtils.getDateHour(date1); 52 | (date,info); 53 | } 54 | } 55 | //2.统计同一时间总共的session数量,结果为map结构 56 | val hourCountMap=dateHour2FullInfoRDD.countByKey(); 57 | 58 | //3.将数据转化为date->map(hour,count)类型 59 | val dataHourCount=new mutable.HashMap[String,mutable.HashMap[String,Long]]; 60 | for ((k,v)<-hourCountMap){ 61 | val day=k.split("_")(0); 62 | val hour=k.split("_")(1); 63 | dataHourCount.get(day) match { 64 | case None =>dataHourCount(day)=new mutable.HashMap[String,Long]; 65 | dataHourCount(day)+=(hour->v); 66 | case Some(value) => 67 | dataHourCount(day)+=(hour->v); 68 | } 69 | } 70 | //4.获取抽取session的索引,用map(date,map(hour,list))来存储 71 | val ExtractIndexListMap=new mutable.HashMap[String,mutable.HashMap[String,ListBuffer[Int]]]; 72 | val sumday=dataHourCount.size; 73 | val extractDay=100/sumday;//平均每天 74 | 75 | for ((day,map)<-dataHourCount){ 76 | val oneDay=map.values.sum; 77 | ExtractIndexListMap.get(day) match { 78 | case None => ExtractIndexListMap(day)=new mutable.HashMap[String, ListBuffer[Int]] 79 | 
generateRandomIndexList(extractDay, oneDay, map, ExtractIndexListMap(day)) 80 | case Some(value) => 81 | generateRandomIndexList(extractDay, oneDay, map, ExtractIndexListMap(day)) 82 | } 83 | } 84 | /* 85 | 到目前,我们已经得到了: 86 | 1.每一个小时里总共有多少条session->dataHourCount 87 | 2.每一个小时要抽取的session的索引->ExtractIndexListMap 88 | */ 89 | //5.根据ExtractIndexListMap抽取session 90 | val dateHour2GroupRDD = dateHour2FullInfoRDD.groupByKey() 91 | val extractSessionRDD=dateHour2GroupRDD.flatMap{ 92 | case (dateHour,iterableFullInfo)=>{ 93 | val day = dateHour.split("_")(0) 94 | val hour = dateHour.split("_")(1) 95 | val indexList=ExtractIndexListMap.get(day).get(hour); 96 | val extractSessionArrayBuffer = new ArrayBuffer[SessionRandomExtract]() 97 | 98 | var index = 0 99 | 100 | for(fullInfo <- iterableFullInfo){ 101 | if(indexList.contains(index)){ 102 | val sessionId = StringUtil.getFieldFromConcatString(fullInfo, "\\|", Constants.FIELD_SESSION_ID) 103 | val startTime = StringUtil.getFieldFromConcatString(fullInfo, "\\|",Constants.FIELD_START_TIME) 104 | val searchKeywords = StringUtil.getFieldFromConcatString(fullInfo, "\\|", Constants.FIELD_SEARCH_KEYWORDS) 105 | val clickCategories = StringUtil.getFieldFromConcatString(fullInfo, "\\|", Constants.FIELD_CLICK_CATEGORY_IDS) 106 | 107 | val extractSession = SessionRandomExtract(taskUUID , sessionId, startTime, searchKeywords, clickCategories) 108 | 109 | extractSessionArrayBuffer += extractSession 110 | } 111 | index += 1 112 | } 113 | extractSessionArrayBuffer 114 | 115 | } 116 | } 117 | extractSessionRDD.foreach(println); 118 | //6.写进数据库 119 | /*import session.implicits._; 120 | extractSessionRDD.toDF().write 121 | .format("jdbc") 122 | .option("url", ConfigurationManager.config.getString(Constants.JDBC_URL)) 123 | .option("user",ConfigurationManager.config.getString(Constants.JDBC_USER)) 124 | .option("password", ConfigurationManager.config.getString(Constants.JDBC_PASSWORD)) 125 | .option("dbtable", "session_extract_0308") 126 | .mode(SaveMode.Append) 127 | .save()*/ 128 | } 129 | 130 | } 131 | -------------------------------------------------------------------------------- /session/target/classes/META-INF/session.kotlin_module: -------------------------------------------------------------------------------- 1 |  -------------------------------------------------------------------------------- /session/target/classes/scala/sessionAccumulator.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/session/target/classes/scala/sessionAccumulator.class -------------------------------------------------------------------------------- /session/target/classes/scala/sessionStat$.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/session/target/classes/scala/sessionStat$.class -------------------------------------------------------------------------------- /session/target/classes/scala/sessionStat.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/session/target/classes/scala/sessionStat.class -------------------------------------------------------------------------------- /session/target/classes/server/SortKey$.class: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/session/target/classes/server/SortKey$.class -------------------------------------------------------------------------------- /session/target/classes/server/SortKey.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/session/target/classes/server/SortKey.class -------------------------------------------------------------------------------- /session/target/classes/server/serverFive.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/session/target/classes/server/serverFive.class -------------------------------------------------------------------------------- /session/target/classes/server/serverFour.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/session/target/classes/server/serverFour.class -------------------------------------------------------------------------------- /session/target/classes/server/serverOne$$typecreator4$1.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/session/target/classes/server/serverOne$$typecreator4$1.class -------------------------------------------------------------------------------- /session/target/classes/server/serverOne$$typecreator4$2.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/session/target/classes/server/serverOne$$typecreator4$2.class -------------------------------------------------------------------------------- /session/target/classes/server/serverOne$$typecreator5$1.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/session/target/classes/server/serverOne$$typecreator5$1.class -------------------------------------------------------------------------------- /session/target/classes/server/serverOne$$typecreator5$2.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/session/target/classes/server/serverOne$$typecreator5$2.class -------------------------------------------------------------------------------- /session/target/classes/server/serverOne.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/session/target/classes/server/serverOne.class -------------------------------------------------------------------------------- /session/target/classes/server/serverThree.class: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/session/target/classes/server/serverThree.class -------------------------------------------------------------------------------- /session/target/classes/server/serverTwo.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yongyupei/spark-shopAnalyze/6631b38a4858ceec792e72f4cfc05f1e00f8d90e/session/target/classes/server/serverTwo.class --------------------------------------------------------------------------------
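Editorial appendix (not part of the repository): MockRealTimeData above keeps publishing records of the form "timestamp province city userid adid" to the Kafka topic configured in commerce.properties, and the adverStat module is the consuming side. A minimal sketch of such a consumer, assuming the spark-streaming-kafka-0-10 dependency declared in the poms above; the broker address, group id, and topic name used here are placeholders to be replaced with kafka.broker.list / kafka.topics:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object AdClickStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AdClickStreamSketch").setMaster("local[*]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // placeholder values: in the project these come from ConfigurationManager / commerce.properties
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "adverStatGroup",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // direct stream over the mock topic; each record value is "timestamp province city userid adid"
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Array("AdRealTimeLog"), kafkaParams))

    stream.map(_.value()).print()

    ssc.start()
    ssc.awaitTermination()
  }
}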