├── .gitignore ├── README.md ├── configFiles ├── core-site.xml ├── hdfs-site.xml ├── hive-site.xml └── oldCluster │ ├── core-site.xml │ ├── hdfs-site.xml │ └── hive-site.xml ├── monitor_camera_info ├── monitor_flow_action ├── pom.xml └── src ├── main ├── java │ └── com │ │ └── traffic │ │ ├── load │ │ └── data │ │ │ ├── MockData.java │ │ │ └── MockRealTimeData.java │ │ ├── producedate2hive │ │ ├── Data2File.java │ │ └── Data2Hive.java │ │ └── spark │ │ ├── arearoadflow │ │ ├── AreaTop3RoadFlowAnalyze.java │ │ ├── ConcatStringStringUDF.java │ │ ├── GroupConcatDistinctUDAF.java │ │ ├── MonitorOneStepConvertRateAnalyze.java │ │ ├── RandomPrefixUDF.java │ │ └── RemoveRandomPrefixUDF.java │ │ ├── conf │ │ └── ConfigurationManager.java │ │ ├── constant │ │ └── Constants.java │ │ ├── dao │ │ ├── IAreaDao.java │ │ ├── ICarTrackDAO.java │ │ ├── IMonitorDAO.java │ │ ├── IRandomExtractDAO.java │ │ ├── ITaskDAO.java │ │ ├── IWithTheCarDAO.java │ │ ├── factory │ │ │ └── DAOFactory.java │ │ └── impl │ │ │ ├── AreaDaoImpl.java │ │ │ ├── CarTrackDAOImpl.java │ │ │ ├── MonitorDAOImpl.java │ │ │ ├── RandomExtractDAOImpl.java │ │ │ ├── TaskDAOImpl.java │ │ │ └── WithTheCarDAOImpl.java │ │ ├── domain │ │ ├── Area.java │ │ ├── CarInfoPer5M.java │ │ ├── CarTrack.java │ │ ├── MonitorState.java │ │ ├── RandomExtractCar.java │ │ ├── RandomExtractMonitorDetail.java │ │ ├── Task.java │ │ ├── Top10SpeedPerMonitor.java │ │ ├── TopNMonitor2CarCount.java │ │ └── TopNMonitorDetailInfo.java │ │ ├── jdbc │ │ └── JDBCHelper.java │ │ ├── rtmroad │ │ └── RoadRealTimeAnalyze.java │ │ ├── skynet │ │ ├── MonitorAndCameraStateAccumulator.java │ │ ├── MonitorAndCameraStateAccumulator2.java │ │ ├── MonitorCarTrack.java │ │ ├── MonitorFlowAnalyze.java │ │ ├── RandomExtractCars.java │ │ ├── SelfDefineAccumulator.java │ │ ├── SpeedSortKey.java │ │ └── WithTheCarAnalyze.java │ │ └── util │ │ ├── DateUtils.java │ │ ├── NumberUtils.java │ │ ├── ParamUtils.java │ │ ├── SparkUtils.java │ │ └── StringUtils.java ├── resources │ └── my.properties └── scala │ └── com │ └── traffic │ ├── spark │ ├── areaRoadFlow │ │ ├── AreaTop3RoadFlowAnalyzeScala1.scala │ │ ├── AreaTop3RoadFlowAnalyzeScala2.scala │ │ ├── GroupConcatDistinctUDAFScala.scala │ │ ├── MonitorOneStepConvertRateAnalyzeScala.scala │ │ ├── TestGroupByKey.scala │ │ └── Test_DF_DS_RDD_Speed.scala │ ├── rtmroad │ │ ├── RedisClient.scala │ │ ├── RoadRealTimeAnalyzeScala1.scala │ │ └── RoadRealTimeAnalyzeScala2.scala │ └── skynet │ │ ├── MonitorFlowAnalyzeScala.scala │ │ └── SpeedSortKeyScala.scala │ ├── test │ ├── MockDataByMysql.scala │ ├── MockDataScala.scala │ └── MyMockRealTimeDataScala.scala │ └── util │ ├── SelfDateCarCountScala.scala │ ├── SelfDateCarInfosScala.scala │ ├── SelfDefineAccumulatorScala.scala │ └── SparkUtilsScala.scala └── test └── resources ├── Spark.mm ├── Spark调优.docx ├── hive ├── createHiveTab.sql └── 提交hive运行的命令.txt ├── img ├── 卡扣流量转换率.jpg ├── 卡扣监控.jpg ├── 双重聚合.jpg ├── 抽取车辆.jpg ├── 提高shuffle并行度.jpg ├── 数据处理.jpg ├── 数据本地化级别.jpg ├── 数据本地化级别调优.jpg ├── 调节堆外内存.jpg ├── 车辆碰撞.jpg ├── 车辆高速通过的卡扣topn.jpg ├── 过程.jpg ├── 采样倾斜key并分拆join操作.jpg └── 随机抽取车辆.jpg ├── mysql └── traffic.sql ├── 任务提交.jpg ├── 卡扣监控.jpg ├── 大数据综合业务平台.pdf ├── 数据处理.jpg ├── 车流量监控项目.pdf ├── 车流量监控项目v1.2.pdf └── 项目.txt /.gitignore: -------------------------------------------------------------------------------- 1 | *.iml 2 | target 3 | .idea 4 | *.log 5 | -------------------------------------------------------------------------------- /README.md: 
-------------------------------------------------------------------------------- 1 | # 车流量监控 2 | 3 | ## 前言 4 | 5 | 目的 6 | 1.对业务场景抽象,熟练Spark编码 7 | 2.增加自定义累加器,自定义UDF 8 | 3.Spark 优化方式 9 | 10 | 项目 11 | 数据处理架构 12 | 13 | 14 | ## 模块介绍 15 | 16 | * 卡扣流量分析 Spark Core 17 | 18 | * 卡扣车流量转化率 Spark Core 19 | 20 | * 各区域车流量最高top5的道路统计 SparkSQL 21 | 22 | * 稽查布控,道路实时拥堵统计 SparkStreaming 23 | 24 | ## hive表 25 | `monitor_flow_action`表 26 | – date 日期 天 27 | – monitor_id 卡口号 28 | – camera_id 摄像头编号 29 | – car 车牌 30 | – action_time 某个摄像头拍摄时间 s 31 | – speed 通过卡扣的速度 32 | – road_id 道路id 33 | – area_id 区域ID 34 | 35 | `monitor_camera_info`表 某一个卡扣对应的摄像头编号 36 | – monitor_id:卡扣编号 37 | – camera_id:摄像头编号 38 | 39 | 数据加载hive中 40 | 1). 创建表,加载数据load data `Data2File` 41 | > hive -f createHiveTab.sql 42 | 43 | 2). 集群中提交spark作业,使用代码生成到hive `Data2Hive` 44 | 45 | ## 大数据开发流程 46 | * 数据调研(对底层的数据的表结构进行调研,分析和研究) 47 | * 需求分析(与PM讨论需求,画原型图 axure) 48 | * 基于讨论出来的结果做出技术方案(某个难点用什么技术,数据库选型) 49 | * 具体实施 50 | 51 | ## 功能点 52 | * 根据使用者(平台使用者)指定的某些条件,筛选出指定的一批卡扣信息(比如根据区域、时间筛选) 53 | 54 | * 检测卡扣状态,对于筛选出来的所有的卡口(不代表一个摄像头)信息统计 55 | • 卡口正常数 56 | • 异常数 57 | • camera的正常数 58 | • camera的异常数 59 | • camera的详细信息( monitor_id:camera_id) 60 | 61 | * 车流量最多的TonN卡扣号 62 | • 获取每一个卡扣的详细信息( Top5 ) 63 | 64 | * 随机抽取N个车辆信息,对这些数据可以进行多维度分析(因为随机抽取出来的N个车辆信息可以很权威的代表整个 65 | 区域的车辆) 66 | 67 | * 计算出经常高速通过的TopN卡口 (查看哪些卡扣经常被高速通过,高速,中速,正常,低速 根据三个速度段进行四 68 | 次排序,高速通过的车辆数相同就比较中速通过的车辆数,以此来推) 69 | 70 | * 跟车分析 71 | 72 | ## 需求分析 73 | 74 | ### 按条件筛选卡扣信息 75 | • 可以指定 不同的条件,时间范围、区域范围、卡扣号等 可以灵活的分析不同区域的卡扣信息 76 | 77 | ### 监测卡扣状态 78 | • 对符合条件的卡扣信息,可以动态的检查每一个卡扣的状态,查看卡扣是否正常工作,也可以查看摄像头 79 | 80 | ### 车流量最多的TonN卡扣 81 | • 查看哪些卡扣的车流量最高,为什么会出现这么高的车流量。分析原因,例如今天出城的车辆非常多,啥原因,今天进 82 | 城的车辆非常多,啥原因? 要造反? 这个功能点里面也会拿到具体的车辆的信息,分析一下是京牌车造成的还是外地 83 | 车牌? 84 | 85 | ### 在符合条件的卡扣信息中随机抽取N个车辆信息 86 | 87 | • 随机抽取N辆车的信息,可以权威的代表整个区域的车辆,这时候可以分析这些车的轨迹,看一下在不同的时间点车辆 88 | 的流动方向。以便于道路的规划。 89 | 90 | ### 计算出经常高速通过的TopN卡口 91 | 92 | • 统计出是否存在飙车现象,或者经常进行超速行驶,可以在此处安装违章拍摄设备 93 | 94 | ### 跟车分析 95 | • 计算出所有车是否被跟踪过,然后将结果存储在MySQL中,以便后期进行查询 96 | 97 | ## 项目分析 98 | monitor_flow_action 监控数据表 99 | 100 | monitor_camera_info 卡扣与摄像头基本关系表 101 | 102 | ### 1.卡扣监控 103 | 104 | #### 统计: 正常的卡扣个数,异常的卡扣个数,正常的摄像头个数,异常的摄像头个数,异常的摄像头详细信息 105 | 106 | #### 正常卡扣个数: 107 | 108 | monitor_camera_info 基本关系表中卡扣与摄像头的关系与在monitor_flow_action 监控数据表 中,卡扣与摄像头的关系完全对应上 109 | 0001:11111,22222 110 | 0001 11111 xxx 111 | 0001 22222 xxx 112 | 113 | RDD思路-正常的卡扣数为例: 114 | monitor_flow_action表 -> RDD -> RDD - RDD 115 | monitor_camera_info表 -> RDD -> RDD 116 | 117 | #### 异常的卡扣个数: 118 | 119 | 1.monitor_camera_info 基本关系表中 卡扣 与摄像头的关系,在监控的数据表中 一条都没有对应。 120 | 121 | 2.monitor_camera_info 基本关系表中 卡扣 与摄像头的关系,在监控的数据表中 部分数据有对应。 122 | 123 | #### 正常的摄像头个数: 124 | 125 | #### 异常的摄像头个数: 126 | 127 | #### 异常的摄像头详细信息:0001:11111,22222,33333 128 | 129 | ~0004:76789,27449,87911,61106,45624,37726,09506 130 | ~0001:70037,23828,34361,92206,76657,26608 131 | ~0003:36687,99260,49613,97165 132 | ~0006:82302,11645,73565,36440 133 | ~0002:60478,07738,53139,75127,16494,48312 134 | ~0008:34144,27504,83395,62222,49656,18640 135 | ~0007:19179,72906,55656,60720,74161,85939,51743,40565,13972,79216,35128,27369,84616,09553 136 | ~0000:67157,85327,08658,57407,64297,15568,31898,36621 137 | ~0005:09761,12853,91031,33015,52841,15425,45548,36528 138 | 139 | #### 注意: 140 | 141 | 求个数: 累加器实现(并行 分布式) 142 | 143 | 异常的摄像头信息,用累加器实现,无非拼的是字符串 144 | 145 | 更新累加器与take使用时,take算子可以触发多个job执行,可以造成累加器重复计算。 146 | 147 | ./spark-submit --master spark://node1:7077,node2:7077 --jars ../lib/fastjson-1.2.11.jar,../lib/mysql-connector-java-5.1.6.jar --class 
MonitorFlowAnalyze ../lib/Test.jar 1 148 | 149 | 150 | ~0001:13846,54785,51995,64341,45994,32228,82054,87746 151 | ~0003:38780,08844,03281,07183,50318,87000,16722,11604,26508,45523,46380 152 | ~0007:61833,19140,38387 153 | ~0005:63920,23464,37389,01219,96765,24844,32101,24141~ 154 | ~0004:60778,35444,35403,68811,73819,81893 155 | ~0006:09621,67028,96375,60036,91237,53743,10305 156 | ~0002:24694,01172,25945,79625,83215,72235,26855 157 | ~0008:24630,40432,96808,78708,28294 158 | ~0000:68070,12865,49505,26035,36931,38053,91868 159 | 160 | ### 2.通过车辆数最多的topN卡扣 161 | 162 | ### 3.统计topN卡扣下经过的所有车辆详细信息 163 | 164 | ### 4.车辆通过速度相对比较快的topN卡扣 165 | 车速: 166 | 120=0002->0003->0001->0002->0004->0005->0001 217 | 0001,0002----卡扣0001到卡扣0002 的车流量转化率:通过卡扣0001又通过卡扣0002的次数/通过卡扣0001的次数 2/3 218 | 0001,0002,0003 ---- 卡扣0001,0002到0003的车辆转换率:通过卡扣0001,0002,0003的次数 /通过卡扣0001,0002 219 | 0001,0002,0003,0004 -----卡扣0001,0002,0003到0004的车辆转换率:通过卡扣0001,0002,0003,0004的次数 /通过卡扣0001,0002,0003 220 | 0001,0002,0003,0004,0005 -----卡扣0001,0002,0003,0004到0005的车辆转换率:通过卡扣0001,0002,0003,0004,0005的次数 /通过卡扣0001,0002,0003,0004的次数 221 | 手动输入卡扣号: 222 | 0001,0002,0003,0004,0005 223 | 求: 224 | 0001,0002 225 | 0001,0002,0003 226 | 0001,0002,0003,0004 227 | 0001,0002,0003,0004,0005 228 | 229 | 粤A11111: 230 | ("0001",100) 231 | ("0001,0002",30) 232 | ("0001,0002,0003",10) 233 | 粤B22222: 234 | ("0001",200) 235 | ("0001,0002",100) 236 | ("0001,0002,0003",70) 237 | ("0001,0002,0003,0004",10) 238 | 239 | ### 9.实时道路拥堵情况 240 | 计算一段时间内卡扣下通过的车辆的平均速度。 241 | 这段时间不能太短,也不能太长。就计算当前时间的前五分钟 当前卡扣下通过所有车辆的平均速度。 242 | 每隔5s 计算一次当前卡扣过去5分钟 所有车辆的平均速度。 243 | 244 | SparkStreaming 窗口函数 245 | window lenth:5min 246 | slide interval:5s 247 | 248 | ### 10.动态改变广播变量 249 | `transform` `foreachRDD` 250 | 251 | 252 | ### 11.统计每个区域中车辆最多的前3道路 253 | 道路车辆:道路中的每个卡扣经过的车辆累加 254 | 255 | 天河区 元岗路1 0001=30,0002=50,0003=100,0004=20 200 256 | 天河区 元岗路2 0005=50,0006=100 150 257 | 天河区 元岗路3 100 258 | 越秀区 xxx1 200 259 | 越秀区 xxx2 150 260 | 越秀区 xxx3 100 261 | 262 | SparkSQL 263 | Hive 表 --t1 : 264 | monitor_id car road_id area_id 265 | 266 | 267 | ----- 268 | 269 | areaId area_name road_id monitor_id car ------ tmp_car_flow_basic 270 | 271 | 272 | sql: 273 | select area_name,road_id,count(car) as car_count,UDAF(monitor_id) as monitor_infos from t1 group by area_name,road_id ---- tmp_area_road_flow_count 274 | 275 | 开窗函数:row_number() over (partition by xxx order by xxx ) rank 276 | 277 | select area_name,road_id,car_count,monitor_infos, row_number() over (partition by area_name order by car_count desc ) rank from tmp_area_road_flow_count ---- tmp 278 | 279 | select area_name,road_id,car_count,monitor_infos from tmp where rank <=3 280 | 281 | 282 | ----------------------------------------------------------------------- 283 | 总sql: 284 | select 285 | area_name,road_id,car_count,monitor_infos 286 | from 287 | ( 288 | select 289 | area_name,road_id,car_count,monitor_infos, row_number() over (partition by area_id order by carCount desc ) rank 290 | from 291 | ( 292 | select 293 | area_name,road_id,count(car) as car_count ,UDAF(monitor_id) as monitor_infos 294 | from 295 | t1 296 | group by area_name,road_id 297 | ) t2 298 | ) t3 299 | where rank <=3 300 | 301 | =================================================================================================================== 302 | sql: 303 | select prefix_area_name_road_id,count(car) as car_count,UDAF(monitor_id) as monitor_infos from t1 group by prefix_area_name_road_id ---- tmp_area_road_flow_count 304 | 305 | 306 | select 
area_name,road_id,car_count,monitor_infos, row_number() over (partition by area_name order by car_count desc ) rank from tmp_area_road_flow_count ---- tmp 307 | 308 | select area_name,road_id,car_count,monitor_infos from tmp where rank <=3 309 | 310 | 311 | ----------------------------------------------------------------------- 312 | 总sql: 313 | select 314 | area_name,road_id,car_count,monitor_infos 315 | from 316 | ( 317 | select 318 | area_name,road_id,car_count,monitor_infos, row_number() over (partition by area_id order by carCount desc ) rank 319 | from 320 | ( 321 | select 322 | area_name,road_id,count(car) as car_count ,UDAF(monitor_id) as monitor_infos 323 | from 324 | t1 325 | group by area_name,road_id 326 | ) t2 327 | ) t3 328 | where rank <=3 329 | ### 车辆轨迹 330 | 331 | 332 | 统计卡扣0001下所有车辆的轨迹 -- take(20) 333 | 334 | 335 | 各区域车流量最高topN的道路统计 336 | 1.会将小于spark.sql.autoBroadcastJoinThreshold值(默认为10M)的表广播到executor节点,不走shuffle过程,更加高效。 337 | sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "20971520"); //单位:字节 338 | 2.在Hive中执行sql文件: 339 | hive –f sql.sql 340 | 3.提交命令: 341 | --master spark://node1:7077,node2:7077 342 | --jars ../lib/mysql-connector-java-5.1.6.jar,../lib/fastjson-1.2.11.jar 343 | --driver-class-path ../lib/mysql-connector-java-5.1.6.jar:../lib/fastjson-1.2.11.jar 344 | ../lib/Test.jar 345 | 4 346 | 347 | 3.缉查布控,道路实时拥堵统计 348 | 动态改变广播变量的值:可以通过transform和foreachRDD 349 | 350 | 351 | 352 | 屏蔽过多黄色警告,忽略java类方法的参数 与注释; 353 | File -> Settings -> Editor -> Inspections -> java ->javadoc: 354 | 参数不一致的屏蔽: 355 | Declaration has problems in Javadoc refere 红色 改成 waring黄色 356 | 参数没有注释: 357 | Dangling Javadoc comment 去掉勾选 358 | Declaration has Javadoc problems 去掉勾选 359 | -------------------------------------------------------------------------------- /configFiles/core-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 16 | 17 | 18 | 19 | 20 | 21 | fs.defaultFS 22 | hdfs://mycluster 23 | 24 | 25 | 29 | hadoop.tmp.dir 30 | /opt/data/hadoop/ 31 | 32 | 33 | 34 | 35 | ha.zookeeper.quorum 36 | c7node3:2181,c7node4:2181,c7node5:2181 37 | 38 | 39 | 40 | -------------------------------------------------------------------------------- /configFiles/hdfs-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | dfs.nameservices 23 | mycluster 24 | 25 | 26 | 27 | dfs.permissions.enabled 28 | false 29 | 30 | 31 | 32 | dfs.ha.namenodes.mycluster 33 | nn1,nn2 34 | 35 | 36 | 37 | dfs.namenode.rpc-address.mycluster.nn1 38 | c7node1:8020 39 | 40 | 41 | 42 | dfs.namenode.rpc-address.mycluster.nn2 43 | c7node2:8020 44 | 45 | 46 | 47 | dfs.namenode.http-address.mycluster.nn1 48 | c7node1:50070 49 | 50 | 51 | 52 | dfs.namenode.http-address.mycluster.nn2 53 | c7node2:50070 54 | 55 | 56 | 57 | 58 | dfs.namenode.shared.edits.dir 59 | qjournal://c7node3:8485;c7node4:8485;c7node5:8485/mycluster 60 | 61 | 62 | 63 | 64 | dfs.client.failover.proxy.provider.mycluster 65 | org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider 66 | 67 | 68 | 69 | 70 | dfs.ha.fencing.methods 71 | sshfence 72 | 73 | 74 | 75 | dfs.ha.fencing.ssh.private-key-files 76 | /root/.ssh/id_rsa 77 | 78 | 79 | 80 | 81 | dfs.journalnode.edits.dir 82 | /opt/data/journal/node/local/data 83 | 84 | 85 | 86 | 87 | dfs.ha.automatic-failover.enabled 88 | true 89 | 90 | 91 | 92 | -------------------------------------------------------------------------------- /configFiles/hive-site.xml: 
-------------------------------------------------------------------------------- 1 | 2 | 18 | 19 | 20 | hive.metastore.uris 21 | thrift://c7node1:9083 22 | 23 | 24 | 25 | -------------------------------------------------------------------------------- /configFiles/oldCluster/core-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 16 | 17 | 18 | 19 | 20 | 21 | fs.defaultFS 22 | hdfs://mycluster 23 | 24 | 25 | 29 | hadoop.tmp.dir 30 | /opt/data/hadoop/ 31 | 32 | 33 | 34 | 35 | ha.zookeeper.quorum 36 | mynode3:2181,mynode4:2181,mynode5:2181 37 | 38 | 39 | 40 | -------------------------------------------------------------------------------- /configFiles/oldCluster/hdfs-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | dfs.nameservices 23 | mycluster 24 | 25 | 26 | 27 | dfs.permissions.enabled 28 | false 29 | 30 | 31 | 32 | dfs.ha.namenodes.mycluster 33 | nn1,nn2 34 | 35 | 36 | 37 | dfs.namenode.rpc-address.mycluster.nn1 38 | mynode1:8020 39 | 40 | 41 | 42 | dfs.namenode.rpc-address.mycluster.nn2 43 | mynode2:8020 44 | 45 | 46 | 47 | dfs.namenode.http-address.mycluster.nn1 48 | mynode1:50070 49 | 50 | 51 | 52 | dfs.namenode.http-address.mycluster.nn2 53 | mynode2:50070 54 | 55 | 56 | 57 | 58 | dfs.namenode.shared.edits.dir 59 | qjournal://mynode3:8485;mynode4:8485;mynode5:8485/mycluster 60 | 61 | 62 | 63 | 64 | dfs.client.failover.proxy.provider.mycluster 65 | org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider 66 | 67 | 68 | 69 | 70 | dfs.ha.fencing.methods 71 | sshfence 72 | 73 | 74 | 75 | dfs.ha.fencing.ssh.private-key-files 76 | /root/.ssh/id_rsa 77 | 78 | 79 | 80 | 81 | dfs.journalnode.edits.dir 82 | /opt/data/journal/node/local/data 83 | 84 | 85 | 86 | 87 | dfs.ha.automatic-failover.enabled 88 | true 89 | 90 | 91 | 92 | 93 | -------------------------------------------------------------------------------- /configFiles/oldCluster/hive-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | hive.metastore.uris 4 | thrift://mynode1:9083 5 | 6 | 7 | 8 | 11 | -------------------------------------------------------------------------------- /pom.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 5 | 4.0.0 6 | 7 | com.leelovejava.bigData 8 | TrafficTeach 9 | 1.0-SNAPSHOT 10 | 11 | TrafficTeach 12 | http://www.example.com 13 | 14 | 15 | UTF-8 16 | 1.8 17 | 1.8 18 | 19 | 20 | 21 | 22 | junit 23 | junit 24 | 4.11 25 | test 26 | 27 | 28 | 29 | org.apache.spark 30 | spark-core_2.11 31 | 2.3.1 32 | 33 | 34 | 35 | org.apache.spark 36 | spark-sql_2.11 37 | 2.3.1 38 | 39 | 40 | 41 | org.apache.spark 42 | spark-hive_2.11 43 | 2.3.1 44 | 45 | 46 | 47 | mysql 48 | mysql-connector-java 49 | 5.1.47 50 | 51 | 52 | 53 | org.apache.spark 54 | spark-streaming_2.11 55 | 2.3.1 56 | 57 | 58 | 59 | 60 | org.apache.spark 61 | spark-streaming-kafka-0-10_2.11 62 | 2.3.1 63 | 64 | 65 | 66 | org.apache.kafka 67 | kafka-clients 68 | 0.10.0.0 69 | 70 | 71 | 72 | redis.clients 73 | jedis 74 | 2.6.1 75 | 76 | 77 | 78 | 79 | org.scala-lang 80 | scala-library 81 | 2.11.7 82 | 83 | 84 | org.scala-lang 85 | scala-compiler 86 | 2.11.7 87 | 88 | 89 | org.scala-lang 90 | scala-reflect 91 | 2.11.7 92 | 93 | 94 | log4j 95 | log4j 96 | 1.2.12 97 | 98 | 99 | com.google.collections 100 | google-collections 101 | 1.0 102 | 103 | 104 | 105 | 106 | com.alibaba 107 | fastjson 108 | 1.2.11 109 
| 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | src/main/resources 118 | 119 | 120 | *.properties 121 | 122 | 123 | 124 | *.xml 125 | *.yaml 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | maven-assembly-plugin 134 | 135 | 136 | jar-with-dependencies 137 | 138 | 139 | 140 | com.traffic.producedate2hive.Data2Hive 141 | 142 | 143 | 144 | 145 | 146 | make-assembly 147 | package 148 | 149 | single 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/load/data/MockData.java: -------------------------------------------------------------------------------- 1 | package com.traffic.load.data; 2 | 3 | import com.traffic.spark.util.DateUtils; 4 | import com.traffic.spark.util.StringUtils; 5 | import org.apache.spark.api.java.JavaRDD; 6 | import org.apache.spark.api.java.JavaSparkContext; 7 | import org.apache.spark.sql.Dataset; 8 | import org.apache.spark.sql.Row; 9 | import org.apache.spark.sql.RowFactory; 10 | import org.apache.spark.sql.SparkSession; 11 | import org.apache.spark.sql.types.DataTypes; 12 | import org.apache.spark.sql.types.StructType; 13 | 14 | import java.util.*; 15 | import java.util.Map.Entry; 16 | 17 | 18 | /** 19 | * 模拟数据 数据格式如下: 20 | *
21 | * 日期 卡口ID 摄像头编号 车牌号 拍摄时间 车速 道路ID 区域ID 22 | * date monitor_id camera_id car action_time speed road_id area_id 23 | *
24 | * monitor_flow_action 25 | * monitor_camera_info 26 | * 27 | * @author Administrator 28 | */ 29 | public class MockData { 30 | public static void mock(JavaSparkContext sc, SparkSession spark) { 31 | List dataList = new ArrayList(); 32 | Random random = new Random(); 33 | 34 | String[] locations = new String[]{"鲁", "京", "京", "京", "沪", "京", "京", "深", "京", "京"}; 35 | // String[] areas = new String[]{"海淀区","朝阳区","昌平区","东城区","西城区","丰台区","顺义区","大兴区"}; 36 | // date :获取当天时间 如:2018-01-01 37 | String date = DateUtils.getTodayDate(); 38 | 39 | /** 40 | * 模拟3000个车辆 41 | */ 42 | for (int i = 0; i < 3000; i++) { 43 | // 模拟车牌号:如:京A00001 65-A 26个字母 44 | String car = locations[random.nextInt(10)] + (char) (65 + random.nextInt(26)) + StringUtils.fulFuill(5, random.nextInt(100000) + ""); 45 | 46 | //baseActionTime 模拟24小时 47 | String baseActionTime = date + " " + StringUtils.fulFuill(random.nextInt(24) + "");//2018-01-01 01 48 | /** 49 | * 这里的for循环模拟每辆车经过不同的卡扣不同的摄像头 数据 50 | */ 51 | for (int j = 0; j < (random.nextInt(300) + 1); j++) { 52 | // 模拟每个车辆每被30个摄像头拍摄后 时间上累计加1小时。这样做使数据更加真实。 53 | if (j % 30 == 0 && j != 0) { 54 | baseActionTime = date + " " + StringUtils.fulFuill((Integer.parseInt(baseActionTime.split(" ")[1]) + 1) + ""); 55 | } 56 | // 模拟areaId 【一共8个区域】 57 | String areaId = StringUtils.fulFuill(2, random.nextInt(8) + 1 + ""); 58 | 59 | // 模拟道路id 【1~50 个道路】 60 | String roadId = random.nextInt(50) + 1 + ""; 61 | 62 | // 模拟9个卡扣monitorId,0补全4位 63 | String monitorId = StringUtils.fulFuill(4, random.nextInt(9) + ""); 64 | 65 | // 模拟摄像头id cameraId 66 | String cameraId = StringUtils.fulFuill(5, random.nextInt(100000) + ""); 67 | 68 | // 模拟经过此卡扣开始时间 ,如:2018-01-01 20:09:10 69 | String actionTime = baseActionTime + ":" 70 | + StringUtils.fulFuill(random.nextInt(60) + "") + ":" 71 | + StringUtils.fulFuill(random.nextInt(60) + ""); 72 | 73 | // 模拟速度 74 | String speed = (random.nextInt(260) + 1) + ""; 75 | 76 | Row row = RowFactory.create(date, monitorId, cameraId, car, actionTime, speed, roadId, areaId); 77 | dataList.add(row); 78 | } 79 | } 80 | 81 | /** 82 | * 2018-4-20 1 22 京A1234 83 | * 2018-4-20 1 23 京A1234 84 | * 1 【22,23】 85 | * 1 【22,23,24】 86 | */ 87 | 88 | JavaRDD rowRdd = sc.parallelize(dataList); 89 | 90 | StructType cameraFlowSchema = DataTypes.createStructType(Arrays.asList( 91 | DataTypes.createStructField("date", DataTypes.StringType, true), 92 | DataTypes.createStructField("monitor_id", DataTypes.StringType, true), 93 | DataTypes.createStructField("camera_id", DataTypes.StringType, true), 94 | DataTypes.createStructField("car", DataTypes.StringType, true), 95 | DataTypes.createStructField("action_time", DataTypes.StringType, true), 96 | DataTypes.createStructField("speed", DataTypes.StringType, true), 97 | DataTypes.createStructField("road_id", DataTypes.StringType, true), 98 | DataTypes.createStructField("area_id", DataTypes.StringType, true) 99 | )); 100 | 101 | // 动态创建DataSet 102 | Dataset ds = spark.createDataFrame(rowRdd, cameraFlowSchema); 103 | 104 | // 默认打印出来df里面的20行数据 105 | System.out.println("----打印 车辆信息数据----"); 106 | ds.show(); 107 | ds.createOrReplaceTempView("monitor_flow_action"); 108 | 109 | /** 110 | * monitorAndCameras key:monitor_id 111 | * value:hashSet(camera_id) 112 | * 基于生成的数据,生成对应的卡扣号和摄像头对应基本表 113 | * Set去重 114 | */ 115 | Map> monitorAndCameras = new HashMap<>(); 116 | 117 | int index = 0; 118 | for (Row row : dataList) { 119 | // row.getString(1) monitor_id 120 | Set sets = monitorAndCameras.get(row.getString(1)); 121 | if (sets == null) { 122 | sets = new 
HashSet<>(); 123 | monitorAndCameras.put((String) row.getString(1), sets); 124 | } 125 | // 这里每隔1000条数据随机插入一条数据,模拟出来标准表中卡扣对应摄像头的数据比模拟数据中多出来的摄像头。这个摄像头的数据不一定会在车辆数据中有。即可以看出卡扣号下有坏的摄像头。 126 | index++; 127 | if (index % 1000 == 0) { 128 | sets.add(StringUtils.fulFuill(5, random.nextInt(100000) + "")); 129 | } 130 | //row.getString(2) camera_id 131 | String cameraId = row.getString(2); 132 | sets.add(cameraId); 133 | } 134 | 135 | dataList.clear(); 136 | 137 | Set>> entrySet = monitorAndCameras.entrySet(); 138 | for (Entry> entry : entrySet) { 139 | String monitor_id = entry.getKey(); 140 | Set sets = entry.getValue(); 141 | Row row; 142 | for (String camera_id : sets) { 143 | row = RowFactory.create(monitor_id, camera_id); 144 | dataList.add(row); 145 | } 146 | } 147 | 148 | StructType monitorSchema = DataTypes.createStructType(Arrays.asList( 149 | DataTypes.createStructField("monitor_id", DataTypes.StringType, true), 150 | DataTypes.createStructField("camera_id", DataTypes.StringType, true) 151 | )); 152 | 153 | 154 | rowRdd = sc.parallelize(dataList); 155 | Dataset monitorDF = spark.createDataFrame(rowRdd, monitorSchema); 156 | monitorDF.createOrReplaceTempView("monitor_camera_info"); 157 | System.out.println("----打印 卡扣号对应摄像头号 数据----"); 158 | monitorDF.show(); 159 | } 160 | } 161 | 162 | 163 | 164 | 165 | 166 | 167 | 168 | 169 | 170 | 171 | 172 | 173 | 174 | 175 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/load/data/MockRealTimeData.java: -------------------------------------------------------------------------------- 1 | package com.traffic.load.data; 2 | 3 | import com.traffic.spark.util.DateUtils; 4 | import com.traffic.spark.util.StringUtils; 5 | import org.apache.kafka.clients.producer.KafkaProducer; 6 | import org.apache.kafka.clients.producer.ProducerRecord; 7 | 8 | import java.util.Properties; 9 | import java.util.Random; 10 | 11 | /** 12 | * 模拟实时的数据 13 | * 14 | * @author leelovejava 15 | * @date 2019-07-31 16 | */ 17 | public class MockRealTimeData extends Thread { 18 | 19 | private static final Random random = new Random(); 20 | private static final String[] locations = new String[]{"鲁", "京", "京", "京", "沪", "京", "京", "深", "京", "京"}; 21 | private KafkaProducer producer; 22 | 23 | public MockRealTimeData() { 24 | producer = new KafkaProducer<>(createProducerConfig()); 25 | } 26 | 27 | private Properties createProducerConfig() { 28 | Properties props = new Properties(); 29 | props.put("bootstrap.servers", "node1:9092,node2:9092,node3:9092"); 30 | props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer"); 31 | props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer"); 32 | return props; 33 | } 34 | 35 | public void run() { 36 | System.out.println("正在生产数据 ... ... 
"); 37 | while (true) { 38 | String date = DateUtils.getTodayDate(); 39 | String baseActionTime = date + " " + StringUtils.fulFuill(random.nextInt(24) + ""); 40 | baseActionTime = date + " " + StringUtils.fulFuill((Integer.parseInt(baseActionTime.split(" ")[1]) + 1) + ""); 41 | String actionTime = baseActionTime + ":" + StringUtils.fulFuill(random.nextInt(60) + "") + ":" + StringUtils.fulFuill(random.nextInt(60) + ""); 42 | String monitorId = StringUtils.fulFuill(4, random.nextInt(9) + ""); 43 | String car = locations[random.nextInt(10)] + (char) (65 + random.nextInt(26)) + StringUtils.fulFuill(5, random.nextInt(99999) + ""); 44 | String speed = random.nextInt(260) + ""; 45 | String roadId = random.nextInt(50) + 1 + ""; 46 | String cameraId = StringUtils.fulFuill(5, random.nextInt(9999) + ""); 47 | String areaId = StringUtils.fulFuill(2, random.nextInt(8) + ""); 48 | producer.send(new ProducerRecord<>("RoadRealTimeLog", date + "\t" + monitorId + "\t" + cameraId + "\t" + car + "\t" + actionTime + "\t" + speed + "\t" + roadId + "\t" + areaId)); 49 | 50 | try { 51 | Thread.sleep(50); 52 | } catch (InterruptedException e) { 53 | e.printStackTrace(); 54 | } 55 | } 56 | } 57 | 58 | /** 59 | * 启动Kafka Producer 60 | * 61 | * @param args 62 | */ 63 | public static void main(String[] args) { 64 | MockRealTimeData producer = new MockRealTimeData(); 65 | producer.start(); 66 | } 67 | 68 | } 69 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/producedate2hive/Data2File.java: -------------------------------------------------------------------------------- 1 | package com.traffic.producedate2hive; 2 | 3 | import com.traffic.spark.util.DateUtils; 4 | import com.traffic.spark.util.StringUtils; 5 | import org.apache.spark.sql.Row; 6 | import org.apache.spark.sql.RowFactory; 7 | 8 | import java.io.*; 9 | import java.util.*; 10 | import java.util.Map.Entry; 11 | 12 | /** 13 | * 数据保存到文件 14 | */ 15 | public class Data2File { 16 | public static String MONITOR_FLOW_ACTION = "./monitor_flow_action"; 17 | public static String MONITOR_CAMERA_INFO = "./monitor_camera_info"; 18 | 19 | public static void main(String[] args) { 20 | CreateFile(MONITOR_FLOW_ACTION); 21 | CreateFile(MONITOR_CAMERA_INFO); 22 | System.out.println("running... 
..."); 23 | mock(); 24 | System.out.println("finished"); 25 | } 26 | 27 | /** 28 | * 创建文件 29 | * 30 | * @param pathFileName 31 | */ 32 | public static Boolean CreateFile(String pathFileName) { 33 | try { 34 | File file = new File(pathFileName); 35 | if (file.exists()) { 36 | file.delete(); 37 | } 38 | boolean createNewFile = file.createNewFile(); 39 | System.out.println("create file " + pathFileName + " success!"); 40 | return createNewFile; 41 | } catch (IOException e) { 42 | e.printStackTrace(); 43 | } 44 | return false; 45 | } 46 | 47 | /** 48 | * 向文件中写入数据 49 | * 50 | * @param pathFileName 51 | * @param newContent 52 | */ 53 | public static void WriteDataToFile(String pathFileName, String newContent) { 54 | FileOutputStream fos; 55 | OutputStreamWriter osw; 56 | PrintWriter pw; 57 | try { 58 | //产生一行模拟数据 59 | String content = newContent; 60 | File file = new File(pathFileName); 61 | fos = new FileOutputStream(file, true); 62 | osw = new OutputStreamWriter(fos, "UTF-8"); 63 | pw = new PrintWriter(osw); 64 | pw.write(content + "\n"); 65 | //注意关闭的先后顺序,先打开的后关闭,后打开的先关闭 66 | pw.close(); 67 | osw.close(); 68 | fos.close(); 69 | } catch (IOException e) { 70 | e.printStackTrace(); 71 | } 72 | } 73 | 74 | 75 | /** 76 | * 生成模拟数据 77 | */ 78 | public static void mock() { 79 | List dataList = new ArrayList(); 80 | Random random = new Random(); 81 | 82 | String[] locations = new String[]{"鲁", "京", "京", "京", "沪", "京", "京", "深", "京", "京"}; 83 | String date = DateUtils.getTodayDate(); 84 | 85 | /** 86 | * 模拟3000个车辆 87 | */ 88 | for (int i = 0; i < 3000; i++) { 89 | String car = locations[random.nextInt(10)] + (char) (65 + random.nextInt(26)) + StringUtils.fulFuill(5, random.nextInt(100000) + ""); 90 | 91 | //baseActionTime 模拟24小时 92 | String baseActionTime = date + " " + StringUtils.fulFuill(random.nextInt(24) + ""); 93 | /** 94 | * 这里的for循环模拟每辆车经过不同的卡扣不同的摄像头 数据。 95 | */ 96 | for (int j = 0; j < random.nextInt(300) + 1; j++) { 97 | // 模拟每个车辆每被30个摄像头拍摄后 时间上累计加1小时。这样做使数据更加真实。 98 | if (j % 30 == 0 && j != 0) { 99 | baseActionTime = date + " " + StringUtils.fulFuill((Integer.parseInt(baseActionTime.split(" ")[1]) + 1) + ""); 100 | } 101 | 102 | // 模拟经过此卡扣开始时间 ,如:2017-10-01 20:09:10 103 | String actionTime = baseActionTime + ":" 104 | + StringUtils.fulFuill(random.nextInt(60) + "") + ":" 105 | + StringUtils.fulFuill(random.nextInt(60) + ""); 106 | 107 | // 模拟9个卡扣monitorId 108 | String monitorId = StringUtils.fulFuill(4, random.nextInt(9) + ""); 109 | 110 | // 模拟速度 111 | String speed = random.nextInt(260) + 1 + ""; 112 | 113 | // 模拟道路id 【1~50 个道路】 114 | String roadId = random.nextInt(50) + 1 + ""; 115 | 116 | // 模拟摄像头id cameraId 117 | String cameraId = StringUtils.fulFuill(5, random.nextInt(100000) + ""); 118 | 119 | // 模拟areaId 【一共8个区域】 120 | String areaId = StringUtils.fulFuill(2, random.nextInt(8) + 1 + ""); 121 | 122 | 123 | // 将数据写入到文件中 124 | String content = date + "\t" + monitorId + "\t" + cameraId + "\t" + car + "\t" + actionTime + "\t" + speed + "\t" + roadId + "\t" + areaId; 125 | WriteDataToFile(MONITOR_FLOW_ACTION, content); 126 | Row row = RowFactory.create(date, monitorId, cameraId, car, actionTime, speed, roadId, areaId); 127 | dataList.add(row); 128 | } 129 | } 130 | 131 | /** 132 | * 生成 monitor_id 对应camera_id表 133 | */ 134 | Map> monitorAndCameras = new HashMap<>(); 135 | 136 | int index = 0; 137 | for (Row row : dataList) { 138 | //row.getString(1) monitor_id 139 | Set sets = monitorAndCameras.get(row.getString(1)); 140 | if (sets == null) { 141 | sets = new HashSet<>(); 142 | 
monitorAndCameras.put((String) row.getString(1), sets); 143 | } 144 | index++; 145 | //这里每隔1000条数据随机插入一条数据,模拟出来标准表中卡扣对应摄像头的数据。这个摄像头的数据不一定会在车辆数据中有。 146 | if (index % 1000 == 0) { 147 | sets.add(StringUtils.fulFuill(5, random.nextInt(100000) + "")); 148 | } 149 | //row.getString(2) camera_id 150 | sets.add(row.getString(2)); 151 | } 152 | 153 | Set>> entrySet = monitorAndCameras.entrySet(); 154 | for (Entry> entry : entrySet) { 155 | String monitor_id = entry.getKey(); 156 | Set sets = entry.getValue(); 157 | for (String val : sets) { 158 | // 将数据写入到文件 159 | String content = monitor_id + "\t" + val; 160 | WriteDataToFile(MONITOR_CAMERA_INFO, content); 161 | } 162 | } 163 | } 164 | } 165 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/producedate2hive/Data2Hive.java: -------------------------------------------------------------------------------- 1 | package com.traffic.producedate2hive; 2 | 3 | import org.apache.spark.SparkConf; 4 | import org.apache.spark.api.java.JavaSparkContext; 5 | import org.apache.spark.sql.SparkSession; 6 | 7 | /** 8 | * 1.代码方式向hive中创建数据 9 | * 2.后面可以有sql文件执行直接在hive中创建数据库表 10 | * 11 | * hive-site.xml放到$SPARK_HOME/conf下 12 | */ 13 | public class Data2Hive { 14 | public static void main(String[] args) { 15 | SparkConf conf = new SparkConf(); 16 | conf.setAppName("traffic2hive"); 17 | JavaSparkContext sc = new JavaSparkContext(conf); 18 | 19 | // HiveContext在Spark2.0 过期, Use SparkSession.builder.enableHiveSupport 20 | // HiveContext是SQLContext的子类。 21 | // HiveContext hiveContext = new HiveContext(sc); 22 | SparkSession hiveContext = SparkSession.builder().config(conf).enableHiveSupport().getOrCreate(); 23 | 24 | 25 | hiveContext.sql("USE traffic"); 26 | hiveContext.sql("DROP TABLE IF EXISTS monitor_flow_action"); 27 | // 在hive中创建monitor_flow_action表 28 | hiveContext.sql("CREATE TABLE IF NOT EXISTS monitor_flow_action " 29 | + "(date STRING,monitor_id STRING,camera_id STRING,car STRING,action_time STRING,speed STRING,road_id STRING,area_id STRING) " 30 | + "row format delimited fields terminated by '\t' "); 31 | hiveContext.sql("load data local inpath '/root/test/monitor_flow_action' into table monitor_flow_action"); 32 | 33 | // 在hive中创建monitor_camera_info表 34 | hiveContext.sql("DROP TABLE IF EXISTS monitor_camera_info"); 35 | hiveContext.sql("CREATE TABLE IF NOT EXISTS monitor_camera_info (monitor_id STRING, camera_id STRING) row format delimited fields terminated by '\t'"); 36 | hiveContext.sql("LOAD DATA " 37 | + "LOCAL INPATH '/root/test/monitor_camera_info'" 38 | + "INTO TABLE monitor_camera_info"); 39 | 40 | System.out.println("========data2hive finish========"); 41 | sc.stop(); 42 | } 43 | } 44 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/arearoadflow/ConcatStringStringUDF.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.arearoadflow; 2 | 3 | import org.apache.spark.sql.api.java.UDF3; 4 | 5 | /** 6 | * 将两个字段拼接起来(使用指定的分隔符) 7 | * @author Administrator 8 | * 海淀区:建材城西路 9 | */ 10 | public class ConcatStringStringUDF implements UDF3 { 11 | private static final long serialVersionUID = 1L; 12 | 13 | @Override 14 | public String call(String area_name, String road_id, String split) throws Exception { 15 | return area_name + split + road_id; 16 | } 17 | 18 | } 19 | -------------------------------------------------------------------------------- 
/src/main/java/com/traffic/spark/arearoadflow/GroupConcatDistinctUDAF.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.arearoadflow; 2 | 3 | import com.traffic.spark.util.StringUtils; 4 | import org.apache.spark.sql.Row; 5 | import org.apache.spark.sql.expressions.MutableAggregationBuffer; 6 | import org.apache.spark.sql.expressions.UserDefinedAggregateFunction; 7 | import org.apache.spark.sql.types.DataType; 8 | import org.apache.spark.sql.types.DataTypes; 9 | import org.apache.spark.sql.types.StructType; 10 | 11 | import java.util.Arrays; 12 | import java.util.Map; 13 | import java.util.Map.Entry; 14 | 15 | /** 16 | * 组内拼接去重函数(group_concat_distinct()) 17 | *
18 | * 技术点:自定义UDAF聚合函数 19 | * 20 | * @author Administrator 21 | */ 22 | public class GroupConcatDistinctUDAF extends UserDefinedAggregateFunction { 23 | 24 | private static final long serialVersionUID = -2510776241322950505L; 25 | 26 | // 指定输入数据的字段与类型 27 | private StructType inputSchema = DataTypes.createStructType(Arrays.asList( 28 | DataTypes.createStructField("carInfo", DataTypes.StringType, true))); 29 | // 指定缓冲数据的字段与类型 30 | private StructType bufferSchema = DataTypes.createStructType(Arrays.asList( 31 | DataTypes.createStructField("bufferInfo", DataTypes.StringType, true))); 32 | // 指定返回类型 33 | private DataType dataType = DataTypes.StringType; 34 | // 指定是否是确定性的 35 | private boolean deterministic = true; 36 | 37 | /** 38 | * 输入数据的类型 39 | */ 40 | @Override 41 | public StructType inputSchema() { 42 | return inputSchema; 43 | } 44 | 45 | /** 46 | * 聚合操作的数据类型 47 | */ 48 | @Override 49 | public StructType bufferSchema() { 50 | return bufferSchema; 51 | } 52 | 53 | @Override 54 | public boolean deterministic() { 55 | return deterministic; 56 | } 57 | 58 | /** 59 | * 初始化 60 | * 可以认为是,你自己在内部指定一个初始的值 61 | */ 62 | @Override 63 | public void initialize(MutableAggregationBuffer buffer) { 64 | buffer.update(0, ""); 65 | } 66 | 67 | /** 68 | * 更新 69 | * 可以认为是,一个一个地将组内的字段值传递进来 70 | * 实现拼接的逻辑 71 | */ 72 | @Override 73 | public void update(MutableAggregationBuffer buffer, Row input) { 74 | // 缓冲中的已经拼接过的monitor信息小字符串 75 | String bufferMonitorInfo = buffer.getString(0);//|A=2|B=1 76 | // 刚刚传递进来的某个monitor信息 77 | String inputMonitorInfo = input.getString(0); 78 | 79 | String[] split = inputMonitorInfo.split("\\|"); 80 | String monitorId = ""; 81 | int addNum = 1; 82 | for (String currMonitorid : split) { 83 | if (currMonitorid.indexOf("=") != -1) { 84 | monitorId = currMonitorid.split("=")[0]; 85 | addNum = Integer.parseInt(currMonitorid.split("=")[1]); 86 | } else { 87 | monitorId = currMonitorid; 88 | } 89 | String oldVS = StringUtils.getFieldFromConcatString(bufferMonitorInfo, "\\|", monitorId); 90 | if (oldVS == null) { 91 | bufferMonitorInfo += "|" + monitorId + "=" + addNum; 92 | } else { 93 | bufferMonitorInfo = StringUtils.setFieldInConcatString(bufferMonitorInfo, "\\|", monitorId, Integer.parseInt(oldVS) + addNum + ""); 94 | } 95 | buffer.update(0, bufferMonitorInfo); 96 | } 97 | } 98 | 99 | /** 100 | * 合并 101 | * update操作,可能是针对一个分组内的部分数据,在某个节点上发生的 102 | * 但是可能一个分组内的数据,会分布在多个节点上处理 103 | * 此时就要用merge操作,将各个节点上分布式拼接好的串,合并起来 104 | *
105 | * 海淀区 建材城西路 106 | * merge1:|0001=100|0002=20|0003=4 107 | * merge2:|0001=200|0002=30|0003=3|0004=100 108 | *
109 | * bufferMonitorInfo1 = 0001=300|0002=50|0003=7|0004=100 110 | */ 111 | @Override 112 | public void merge(MutableAggregationBuffer buffer1, Row buffer2) { 113 | //缓存中的monitor信息这个大字符串 114 | String bufferMonitorInfo1 = buffer1.getString(0); 115 | //传进来 116 | String bufferMonitorInfo2 = buffer2.getString(0); 117 | 118 | // 等于是把buffer2里面的数据都拆开来更新 119 | for (String monitorInfo : bufferMonitorInfo2.split("\\|")) { 120 | Map map = StringUtils.getKeyValuesFromConcatString(monitorInfo, "\\|"); 121 | /** 122 | * bufferMonitorInfo1 = 0001=300|0002=50|0003=7|0009=1000 123 | */ 124 | for (Entry entry : map.entrySet()) { 125 | String monitor = entry.getKey(); 126 | int carCount = Integer.parseInt(entry.getValue()); 127 | String oldVS = StringUtils.getFieldFromConcatString(bufferMonitorInfo1, "\\|", monitor); 128 | //当没有获取到本次monitor对应的值时 129 | if (oldVS == null) { 130 | if ("".equals(bufferMonitorInfo1)) { 131 | //当第一次聚合的时候,没有初始的传进来的bufferMonitorInfo1,默认为"" 132 | bufferMonitorInfo1 += monitor + "=" + carCount; 133 | } else { 134 | //当上一次传进来的字符串不包含本次的monitor时,就拼上 135 | bufferMonitorInfo1 += "|" + monitor + "=" + carCount; 136 | } 137 | } else { 138 | int oldVal = Integer.valueOf(oldVS); 139 | oldVal += carCount; 140 | bufferMonitorInfo1 = StringUtils.setFieldInConcatString(bufferMonitorInfo1, "\\|", monitor, oldVal + ""); 141 | } 142 | buffer1.update(0, bufferMonitorInfo1); 143 | } 144 | } 145 | } 146 | 147 | @Override 148 | public DataType dataType() { 149 | return dataType; 150 | } 151 | 152 | /** 153 | * evaluate方法返回数据的类型要和dateType的类型一致,不一致就会报错 154 | */ 155 | @Override 156 | public Object evaluate(Row row) { 157 | return row.getString(0); 158 | } 159 | 160 | } 161 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/arearoadflow/RandomPrefixUDF.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.arearoadflow; 2 | 3 | import java.util.Random; 4 | 5 | import org.apache.spark.sql.api.java.UDF2; 6 | 7 | public class RandomPrefixUDF implements UDF2{ 8 | 9 | /** 10 | * 11 | */ 12 | private static final long serialVersionUID = 1L; 13 | 14 | @Override 15 | public String call(String area_name_road_id, Integer ranNum) throws Exception { 16 | Random random = new Random(); 17 | int prefix = random.nextInt(ranNum); 18 | return prefix+"_"+area_name_road_id; 19 | } 20 | 21 | } 22 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/arearoadflow/RemoveRandomPrefixUDF.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.arearoadflow; 2 | 3 | import org.apache.spark.sql.api.java.UDF1; 4 | 5 | public class RemoveRandomPrefixUDF implements UDF1 { 6 | private static final long serialVersionUID = 1L; 7 | 8 | /** 9 | * 1_海淀区:建材城西路 10 | * 11 | * @param val 12 | * @return 13 | * @throws Exception 14 | */ 15 | @Override 16 | public String call(String val) throws Exception { 17 | return val.split("_")[1]; 18 | } 19 | 20 | } 21 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/conf/ConfigurationManager.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.conf; 2 | 3 | import java.io.InputStream; 4 | import java.util.Properties; 5 | 6 | /** 7 | * 配置管理组件 8 | * 9 | * 1、配置管理组件可以复杂,也可以很简单,对于简单的配置管理组件来说,只要开发一个类,可以在第一次访问它的 10 | * 
时候,就从对应的properties文件中,读取配置项,并提供外界获取某个配置key对应的value的方法 11 | * 2、如果是特别复杂的配置管理组件,那么可能需要使用一些软件设计中的设计模式,比如单例模式、解释器模式 12 | * 可能需要管理多个不同的properties,甚至是xml类型的配置文件 13 | * 我们这里的话,就是开发一个简单的配置管理组件,就可以了 14 | * 15 | * @author Administrator 16 | * 17 | */ 18 | public class ConfigurationManager { 19 | 20 | /** 21 | * Properties对象使用private来修饰,就代表了其是类私有的 22 | * 那么外界的代码,就不能直接通过ConfigurationManager.prop这种方式获取到Properties对象 23 | * 之所以这么做,是为了避免外界的代码不小心错误的更新了Properties中某个key对应的value 24 | * 从而导致整个程序的状态错误,乃至崩溃 25 | */ 26 | private static Properties prop = new Properties(); 27 | 28 | /** 29 | * 静态代码块 30 | * 31 | * Java中,每一个类第一次使用的时候,就会被Java虚拟机(JVM)中的类加载器,去从磁盘上的.class文件中 32 | * 加载出来,然后为每个类都会构建一个Class对象,就代表了这个类 33 | * 34 | * 每个类在第一次加载的时候,都会进行自身的初始化,那么类初始化的时候,会执行哪些操作的呢? 35 | * 就由每个类内部的static {}构成的静态代码块决定,我们自己可以在类中开发静态代码块 36 | * 类第一次使用的时候,就会加载,加载的时候,就会初始化类,初始化类的时候就会执行类的静态代码块 37 | * 38 | * 因此,对于我们的配置管理组件,就在静态代码块中,编写读取配置文件的代码 39 | * 这样的话,第一次外界代码调用这个ConfigurationManager类的静态方法的时候,就会加载配置文件中的数据 40 | * 41 | * 而且,放在静态代码块中,还有一个好处,就是类的初始化在整个JVM生命周期内,有且仅有一次,也就是说 42 | * 配置文件只会加载一次,然后以后就是重复使用,效率比较高;不用反复加载多次 43 | */ 44 | static { 45 | try { 46 | // 通过一个“类名.class”的方式,就可以获取到这个类在JVM中对应的Class对象 47 | // 然后再通过这个Class对象的getClassLoader()方法,就可以获取到当初加载这个类的JVM 48 | // 中的类加载器(ClassLoader),然后调用ClassLoader的getResourceAsStream()这个方法 49 | // 就可以用类加载器,去加载类加载路径中的指定的文件 50 | // 最终可以获取到一个,针对指定文件的输入流(InputStream) 51 | InputStream in = ConfigurationManager.class 52 | .getClassLoader().getResourceAsStream("my.properties"); 53 | 54 | // 调用Properties的load()方法,给它传入一个文件的InputStream输入流 55 | // 即可将文件中的符合“key=value”格式的配置项,都加载到Properties对象中 56 | // 加载过后,此时,Properties对象中就有了配置文件中所有的key-value对了 57 | // 然后外界其实就可以通过Properties对象获取指定key对应的value 58 | prop.load(in); 59 | } catch (Exception e) { 60 | e.printStackTrace(); 61 | } 62 | } 63 | 64 | /** 65 | * 获取指定key对应的value 66 | * 67 | * 第一次外界代码,调用ConfigurationManager类的getProperty静态方法时,JVM内部会发现 68 | * ConfigurationManager类还不在JVM的内存中 69 | * 70 | * 此时JVM,就会使用自己的ClassLoader(类加载器),去对应的类所在的磁盘文件(.class文件)中 71 | * 去加载ConfigurationManager类,到JVM内存中来,并根据类内部的信息,去创建一个Class对象 72 | * Class对象中,就包含了类的元信息,包括类有哪些field(Properties prop);有哪些方法(getProperty) 73 | * 74 | * 加载ConfigurationManager类的时候,还会初始化这个类,那么此时就执行类的static静态代码块 75 | * 此时咱们自己编写的静态代码块中的代码,就会加载my.properites文件的内容,到Properties对象中来 76 | * 77 | * 下一次外界代码,再调用ConfigurationManager的getProperty()方法时,就不会再次加载类,不会再次初始化 78 | * 类,和执行静态代码块了,所以也印证了,我们上面所说的,类只会加载一次,配置文件也仅仅会加载一次 79 | * 80 | * @param key 81 | * @return value 82 | */ 83 | public static String getProperty(String key) { 84 | return prop.getProperty(key); 85 | } 86 | 87 | /** 88 | * 获取整数类型的配置项 89 | * @param key 90 | * @return value 91 | */ 92 | public static Integer getInteger(String key) { 93 | String value = getProperty(key); 94 | try { 95 | return Integer.valueOf(value); 96 | } catch (Exception e) { 97 | e.printStackTrace(); 98 | } 99 | return 0; 100 | } 101 | 102 | /** 103 | * 获取布尔类型的配置项 104 | * @param key 105 | * @return value 106 | */ 107 | public static Boolean getBoolean(String key) { 108 | String value = getProperty(key); 109 | try { 110 | return Boolean.valueOf(value); 111 | } catch (Exception e) { 112 | e.printStackTrace(); 113 | } 114 | return false; 115 | } 116 | 117 | /** 118 | * 获取Long类型的配置项 119 | * @param key 120 | * @return 121 | */ 122 | public static Long getLong(String key) { 123 | String value = getProperty(key); 124 | try { 125 | return Long.valueOf(value); 126 | } catch (Exception e) { 127 | e.printStackTrace(); 128 | } 129 | return 0L; 130 | } 131 | 132 | } 133 | 
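
A minimal usage sketch (not part of the original sources): assuming `src/main/resources/my.properties` defines the keys referenced by `Constants` (e.g. `jdbc.url`, `jdbc.datasource.size`, `spark.local`), other components can read configuration through the static accessors of `ConfigurationManager` shown above, for example:

```java
import com.traffic.spark.conf.ConfigurationManager;
import com.traffic.spark.constant.Constants;

public class ConfigUsageExample {
    public static void main(String[] args) {
        // Keys are assumed to exist in my.properties; getInteger/getBoolean fall back to 0/false on parse errors
        String jdbcUrl   = ConfigurationManager.getProperty(Constants.JDBC_URL);            // "jdbc.url"
        Integer poolSize = ConfigurationManager.getInteger(Constants.JDBC_DATASOURCE_SIZE); // "jdbc.datasource.size"
        Boolean runLocal = ConfigurationManager.getBoolean(Constants.SPARK_LOCAL);          // "spark.local"
        System.out.println(jdbcUrl + ", pool=" + poolSize + ", local=" + runLocal);
    }
}
```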
-------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/constant/Constants.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.constant; 2 | 3 | /** 4 | * 常量接口 5 | *
6 | * 接口中声明的成员变量默认都是 public static final 的,必须显示的初始化。因而在常量声明时可以省略这些修饰符。 7 | * 8 | * @author Administrator 9 | */ 10 | public interface Constants { 11 | /** 12 | * 1. 项目配置相关的常量 13 | */ 14 | public static final String JDBC_DRIVER = "jdbc.driver"; 15 | String JDBC_DATASOURCE_SIZE = "jdbc.datasource.size"; 16 | String JDBC_URL = "jdbc.url"; 17 | String JDBC_USER = "jdbc.user"; 18 | String JDBC_PASSWORD = "jdbc.password"; 19 | String JDBC_URL_PROD = "jdbc.url.prod"; 20 | String JDBC_USER_PROD = "jdbc.user.prod"; 21 | String JDBC_PASSWORD_PROD = "jdbc.password.prod"; 22 | String SPARK_LOCAL = "spark.local"; 23 | String SPARK_LOCAL_TASKID_MONITOR = "spark.local.taskId.monitorFlow"; 24 | String SPARK_LOCAL_TASKID_EXTRACT_CAR = "spark.local.taskId.extractCar"; 25 | String SPARK_LOCAL_WITH_THE_CAR = "spark.local.taskId.withTheCar"; 26 | String SPARK_LOCAL_TASKID_TOPN_MONITOR_FLOW = "spark.local.taskid.tpn.road.flow"; 27 | String SPARK_LOCAL_TASKID_MONITOR_ONE_STEP_CONVERT = "spark.local.taskid.road.one.step.convert"; 28 | String KAFKA_METADATA_BROKER_LIST = "kafka.metadata.broker.list"; 29 | String KAFKA_TOPICS = "kafka.topics"; 30 | 31 | /** 32 | * 2. Spark作业相关的常量 33 | */ 34 | String SPARK_APP_NAME = "MonitorFlowAnalyze"; 35 | /** 36 | * 摄像头总数 37 | */ 38 | String FIELD_CAMERA_COUNT = "cameraCount"; 39 | /** 40 | * 摄像头 41 | */ 42 | String FIELD_CAMERA_IDS = "cameraIds"; 43 | /** 44 | * 车辆总数 45 | */ 46 | String FIELD_CAR_COUNT = "carCount"; 47 | String FIELD_NORMAL_MONITOR_COUNT = "normalMonitorCount"; 48 | String FIELD_NORMAL_CAMERA_COUNT = "normalCameraCount"; 49 | String FIELD_ABNORMAL_MONITOR_COUNT = "abnormalMonitorCount"; 50 | String FIELD_ABNORMAL_CAMERA_COUNT = "abnormalCameraCount"; 51 | String FIELD_ABNORMAL_MONITOR_CAMERA_INFOS = "abnormalMonitorCameraInfos"; 52 | String FIELD_TOP_NUM = "topNum"; 53 | String FIELD_DATE_HOUR = "dateHour"; 54 | String FIELD_CAR_TRACK = "carTrack"; 55 | String FIELD_DATE = "dateHour"; 56 | String FIELD_CAR = "car"; 57 | String FIELD_CARS = "cars"; 58 | String FIELD_MONITOR = "monitor"; 59 | String FIELD_MONITOR_ID = "monitorId"; 60 | String FIELD_ACTION_TIME = "actionTime"; 61 | String FIELD_EXTRACT_NUM = "extractNum"; 62 | /** 63 | * 低速行驶 64 | */ 65 | String FIELD_SPEED_0_60 = "0_60"; 66 | /** 67 | * 正常行驶 68 | */ 69 | String FIELD_SPEED_60_90 = "60_90"; 70 | /** 71 | * 中速行驶 72 | */ 73 | String FIELD_SPEED_90_120 = "90_120"; 74 | /** 75 | * 高速行驶 76 | */ 77 | String FIELD_SPEED_120_MAX = "120_max"; 78 | String FIELD_AREA_ID = "areaId"; 79 | String FIELD_AREA_NAME = "areaName"; 80 | 81 | 82 | /** 83 | * 3. 
任务相关的常量 84 | */ 85 | String PARAM_START_DATE = "startDate"; 86 | String PARAM_END_DATE = "endDate"; 87 | String PARAM_MONITOR_FLOW = "roadFlow"; 88 | 89 | 90 | } 91 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/dao/IAreaDao.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.dao; 2 | 3 | import com.traffic.spark.domain.Area; 4 | 5 | import java.util.List; 6 | 7 | 8 | public interface IAreaDao { 9 | List findAreaInfo(); 10 | } 11 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/dao/ICarTrackDAO.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.dao; 2 | 3 | import com.traffic.spark.domain.CarTrack; 4 | 5 | import java.util.List; 6 | 7 | 8 | public interface ICarTrackDAO { 9 | 10 | /** 11 | * 批量插入车辆轨迹信息 12 | * @param carTracks 13 | */ 14 | void insertBatchCarTrack(List carTracks); 15 | } 16 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/dao/IMonitorDAO.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.dao; 2 | 3 | import com.traffic.spark.domain.MonitorState; 4 | import com.traffic.spark.domain.TopNMonitor2CarCount; 5 | import com.traffic.spark.domain.TopNMonitorDetailInfo; 6 | 7 | import java.util.List; 8 | 9 | /** 10 | * 卡口流量监控管理DAO接口 11 | * @author root 12 | * 13 | */ 14 | public interface IMonitorDAO { 15 | /** 16 | * 卡口流量topN批量插入到数据库 17 | * @param topNMonitor2CarCounts 18 | */ 19 | void insertBatchTopN(List topNMonitor2CarCounts); 20 | 21 | /** 22 | * 卡口下车辆具体信息插入到数据库 23 | * @param monitorDetailInfos 24 | */ 25 | void insertBatchMonitorDetails(List monitorDetailInfos); 26 | 27 | 28 | /** 29 | * 卡口状态信息插入到数据库 30 | * @param monitorState 31 | */ 32 | void insertMonitorState(MonitorState monitorState); 33 | 34 | void insertBatchTop10Details(List topNMonitorDetailInfos); 35 | } 36 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/dao/IRandomExtractDAO.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.dao; 2 | 3 | import com.traffic.spark.domain.RandomExtractCar; 4 | import com.traffic.spark.domain.RandomExtractMonitorDetail; 5 | 6 | import java.util.List; 7 | 8 | /** 9 | * 随机抽取car信息管理DAO类 10 | * @author root 11 | * 12 | */ 13 | public interface IRandomExtractDAO { 14 | void insertBatchRandomExtractCar(List carRandomExtracts); 15 | 16 | void insertBatchRandomExtractDetails(List r); 17 | 18 | } 19 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/dao/ITaskDAO.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.dao; 2 | 3 | 4 | import com.traffic.spark.domain.Task; 5 | 6 | /** 7 | * 任务管理DAO接口 8 | * 9 | * @author root 10 | */ 11 | public interface ITaskDAO { 12 | 13 | /** 14 | * 根据task的主键查询指定的任务 15 | * 16 | * @param taskId 17 | * @return 18 | */ 19 | Task findTaskById(long taskId); 20 | } 21 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/dao/IWithTheCarDAO.java: -------------------------------------------------------------------------------- 1 | package 
com.traffic.spark.dao; 2 | 3 | public interface IWithTheCarDAO { 4 | void updateTestData(String param); 5 | } 6 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/dao/factory/DAOFactory.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.dao.factory; 2 | 3 | 4 | import com.traffic.spark.dao.*; 5 | import com.traffic.spark.dao.impl.*; 6 | 7 | /** 8 | * DAO工厂类 9 | * @author root 10 | * 11 | */ 12 | public class DAOFactory { 13 | 14 | 15 | public static ITaskDAO getTaskDAO(){ 16 | return new TaskDAOImpl(); 17 | } 18 | 19 | public static IMonitorDAO getMonitorDAO(){ 20 | return new MonitorDAOImpl(); 21 | } 22 | 23 | public static IRandomExtractDAO getRandomExtractDAO(){ 24 | return new RandomExtractDAOImpl(); 25 | } 26 | 27 | public static ICarTrackDAO getCarTrackDAO(){ 28 | return new CarTrackDAOImpl(); 29 | } 30 | 31 | public static IWithTheCarDAO getWithTheCarDAO(){ 32 | return new WithTheCarDAOImpl(); 33 | } 34 | 35 | public static IAreaDao getAreaDao() { 36 | return new AreaDaoImpl(); 37 | 38 | } 39 | } 40 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/dao/impl/AreaDaoImpl.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.dao.impl; 2 | 3 | import com.traffic.spark.dao.IAreaDao; 4 | import com.traffic.spark.domain.Area; 5 | import com.traffic.spark.jdbc.JDBCHelper; 6 | 7 | import java.sql.ResultSet; 8 | import java.util.ArrayList; 9 | import java.util.List; 10 | 11 | public class AreaDaoImpl implements IAreaDao { 12 | 13 | public List findAreaInfo() { 14 | final List areas = new ArrayList(); 15 | 16 | String sql = "SELECT * FROM area_info"; 17 | 18 | JDBCHelper jdbcHelper = JDBCHelper.getInstance(); 19 | jdbcHelper.executeQuery(sql, null, new JDBCHelper.QueryCallback() { 20 | 21 | public void process(ResultSet rs) throws Exception { 22 | if (rs.next()) { 23 | String areaId = rs.getString(1); 24 | String areaName = rs.getString(2); 25 | areas.add(new Area(areaId, areaName)); 26 | } 27 | } 28 | }); 29 | return areas; 30 | } 31 | 32 | } 33 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/dao/impl/CarTrackDAOImpl.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.dao.impl; 2 | 3 | import com.traffic.spark.dao.ICarTrackDAO; 4 | import com.traffic.spark.domain.CarTrack; 5 | import com.traffic.spark.jdbc.JDBCHelper; 6 | 7 | import java.util.ArrayList; 8 | import java.util.List; 9 | 10 | public class CarTrackDAOImpl implements ICarTrackDAO { 11 | @Override 12 | public void insertBatchCarTrack(List carTracks) { 13 | JDBCHelper jdbcHelper = JDBCHelper.getInstance(); 14 | String sql = "INSERT INTO car_track VALUES(?,?,?,?)"; 15 | List params = new ArrayList<>(); 16 | for (CarTrack c : carTracks) { 17 | /** 18 | * 添加到车辆轨迹表中 19 | */ 20 | params.add(new Object[]{c.getTaskId(), c.getDate(), c.getCar(), c.getTrack()}); 21 | 22 | /** 23 | * 添加到段时间内的车辆信息表中 24 | */ 25 | // long taskId = c.getTaskId(); 26 | // String car = c.getCar(); 27 | // String track = c.getTrack(); 28 | // Map timeAndMonitor = StringUtils.getKeyValuesFromConcatString(track, "\\|"); 29 | // List insertList = new ArrayList<>(); 30 | // List updateList = new ArrayList<>(); 31 | // for (Entry entry : timeAndMonitor.entrySet()) { 32 | // String 
monitorId = entry.getKey(); 33 | // String dateTime = entry.getValue(); 34 | // String timeRange = DateUtils.getRangeTime(dateTime); 35 | // 36 | // String sqlText = "SELECT * FROM monitor_range_time_car WHERE task_id = ? AND monitor_id = ? AND range_time = ?"; 37 | // Object[] selarams = new Object[]{taskId ,monitorId,timeRange}; 38 | // final CarInfoPer5M carInfoPer5M = new CarInfoPer5M(); 39 | // jdbcHelper.executeQuery(sqlText, selarams, new QueryCallback() { 40 | // @Override 41 | // public void process(ResultSet rs) throws Exception { 42 | // if(rs.next()){ 43 | // carInfoPer5M.setCars(rs.getString(4)); 44 | // } 45 | // } 46 | // }); 47 | // carInfoPer5M.setTaskId(taskId); 48 | // carInfoPer5M.setMonitorId(monitorId); 49 | // carInfoPer5M.setRangeTime(timeRange); 50 | // if(carInfoPer5M.getCars() != null){ 51 | // String cars = carInfoPer5M.getCars(); 52 | // cars += "|"+car+"="+dateTime; 53 | // carInfoPer5M.setCars(cars); 54 | // updateList.add(carInfoPer5M); 55 | // }else{ 56 | // carInfoPer5M.setCars(car+"="+dateTime); 57 | // insertList.add(carInfoPer5M); 58 | // } 59 | // } 60 | // 61 | // String insertSQL = "INSERT INTO monitor_range_time_car VALUES(?,?,?,?)"; 62 | // List insertParams = new ArrayList<>(); 63 | // for (CarInfoPer5M carInfoPer5M : insertList) { 64 | // insertParams.add(new Object[]{carInfoPer5M.getTaskId(),carInfoPer5M.getMonitorId(),carInfoPer5M.getRangeTime(),carInfoPer5M.getCars()}); 65 | // } 66 | // jdbcHelper.executeBatch(insertSQL, insertParams); 67 | // 68 | // String updateSQL = "UPDATE monitor_range_time_car SET cars = ? WHERE task_id = ? AND monitor_id = ? AND range_time = ?"; 69 | // for (CarInfoPer5M carInfoPer5M : updateList) { 70 | // Object[] updateParam = new Object[]{carInfoPer5M.getCars(),carInfoPer5M.getTaskId(),carInfoPer5M.getMonitorId(),carInfoPer5M.getRangeTime()}; 71 | // jdbcHelper.executeUpdate(updateSQL, updateParam); 72 | // } 73 | } 74 | jdbcHelper.executeBatch(sql, params); 75 | } 76 | 77 | } 78 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/dao/impl/MonitorDAOImpl.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.dao.impl; 2 | 3 | import com.traffic.spark.dao.IMonitorDAO; 4 | import com.traffic.spark.domain.MonitorState; 5 | import com.traffic.spark.domain.TopNMonitor2CarCount; 6 | import com.traffic.spark.domain.TopNMonitorDetailInfo; 7 | import com.traffic.spark.jdbc.JDBCHelper; 8 | 9 | import java.util.ArrayList; 10 | import java.util.List; 11 | 12 | /** 13 | * 卡口流量监控管理DAO实现类 14 | * @author root 15 | * 16 | */ 17 | 18 | public class MonitorDAOImpl implements IMonitorDAO { 19 | 20 | @Override 21 | //向数据库表 topn_monitor_car_count 中插入车流量最多的TopN数据 22 | public void insertBatchTopN(List topNMonitor2CarCounts) { 23 | JDBCHelper jdbcHelper = JDBCHelper.getInstance(); 24 | String sql = "INSERT INTO topn_monitor_car_count VALUES(?,?,?)"; 25 | List params = new ArrayList<>(); 26 | for (TopNMonitor2CarCount topNMonitor2CarCount : topNMonitor2CarCounts) { 27 | params.add(new Object[]{topNMonitor2CarCount.getTaskId(),topNMonitor2CarCount.getMonitorId(),topNMonitor2CarCount.getCarCount()}); 28 | } 29 | jdbcHelper.executeBatch(sql , params); 30 | } 31 | 32 | @Override 33 | //将topN的卡扣车流量明细数据 存入topn_monitor_detail_info 表中 34 | public void insertBatchMonitorDetails(List monitorDetailInfos) { 35 | JDBCHelper jdbcHelper = JDBCHelper.getInstance(); 36 | String sql = "INSERT INTO topn_monitor_detail_info 
VALUES(?,?,?,?,?,?,?,?)"; 37 | List params = new ArrayList<>(); 38 | for(TopNMonitorDetailInfo m : monitorDetailInfos){ 39 | params.add(new Object[]{m.getTaskId(),m.getDate(),m.getMonitorId(),m.getCameraId(),m.getCar(),m.getActionTime(),m.getSpeed(),m.getRoadId()}); 40 | } 41 | jdbcHelper.executeBatch(sql, params); 42 | } 43 | 44 | @Override 45 | //向数据库表monitor_state中添加累加器累计的各个值 46 | public void insertMonitorState(MonitorState monitorState) { 47 | JDBCHelper jdbcHelper = JDBCHelper.getInstance(); 48 | String sql = "INSERT INTO monitor_state VALUES(?,?,?,?,?,?)"; 49 | Object[] param = new Object[]{ 50 | monitorState.getTaskId(), 51 | monitorState.getNormalMonitorCount(), 52 | monitorState.getNormalCameraCount(), 53 | monitorState.getAbnormalMonitorCount(), 54 | monitorState.getAbnormalCameraCount(), 55 | monitorState.getAbnormalMonitorCameraInfos()}; 56 | List params = new ArrayList<>(); 57 | params.add(param); 58 | jdbcHelper.executeBatch(sql, params); 59 | } 60 | 61 | @Override 62 | public void insertBatchTop10Details(List topNMonitorDetailInfos) { 63 | JDBCHelper jdbcHelper = JDBCHelper.getInstance(); 64 | String sql = "INSERT INTO top10_speed_detail VALUES(?,?,?,?,?,?,?,?)"; 65 | List params = new ArrayList<>(); 66 | for(TopNMonitorDetailInfo m : topNMonitorDetailInfos){ 67 | params.add(new Object[]{m.getTaskId(),m.getDate(),m.getMonitorId(),m.getCameraId(),m.getCar(),m.getActionTime(),m.getSpeed(),m.getRoadId()}); 68 | } 69 | jdbcHelper.executeBatch(sql, params); 70 | } 71 | 72 | 73 | } 74 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/dao/impl/RandomExtractDAOImpl.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.dao.impl; 2 | 3 | import com.traffic.spark.dao.IRandomExtractDAO; 4 | import com.traffic.spark.domain.RandomExtractCar; 5 | import com.traffic.spark.domain.RandomExtractMonitorDetail; 6 | import com.traffic.spark.jdbc.JDBCHelper; 7 | 8 | import java.util.ArrayList; 9 | import java.util.List; 10 | 11 | /** 12 | * 随机抽取car信息管理DAO实现类 13 | * @author root 14 | * 15 | */ 16 | public class RandomExtractDAOImpl implements IRandomExtractDAO { 17 | 18 | @Override 19 | public void insertBatchRandomExtractCar(List carRandomExtracts) { 20 | JDBCHelper jdbcHelper = JDBCHelper.getInstance(); 21 | String sql = "INSERT INTO random_extract_car VALUES(?,?,?,?)"; 22 | List params = new ArrayList<>(); 23 | for (RandomExtractCar carRandomExtract : carRandomExtracts) { 24 | params.add(new Object[]{carRandomExtract.getTaskId(),carRandomExtract.getCar(),carRandomExtract.getDate(),carRandomExtract.getDateHour()}); 25 | } 26 | jdbcHelper.executeBatch(sql , params); 27 | } 28 | 29 | @Override 30 | public void insertBatchRandomExtractDetails(List randomExtractMonitorDetails) { 31 | JDBCHelper jdbcHelper = JDBCHelper.getInstance(); 32 | String sql = "INSERT INTO random_extract_car_detail_info VALUES(?,?,?,?,?,?,?,?)"; 33 | List params = new ArrayList<>(); 34 | for(RandomExtractMonitorDetail r : randomExtractMonitorDetails){ 35 | params.add(new Object[]{r.getTaskId(),r.getDate(),r.getMonitorId(),r.getCameraId(),r.getCar(),r.getActionTime(),r.getSpeed(),r.getRoadId()}); 36 | } 37 | jdbcHelper.executeBatch(sql, params); 38 | } 39 | 40 | } 41 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/dao/impl/TaskDAOImpl.java: -------------------------------------------------------------------------------- 1 | package 
com.traffic.spark.dao.impl; 2 | 3 | import com.traffic.spark.dao.ITaskDAO; 4 | import com.traffic.spark.domain.Task; 5 | import com.traffic.spark.jdbc.JDBCHelper; 6 | 7 | import java.sql.ResultSet; 8 | 9 | /** 10 | * 任务管理DAO实现类 11 | * 12 | * @author root 13 | */ 14 | public class TaskDAOImpl implements ITaskDAO { 15 | @Override 16 | public Task findTaskById(long taskId) { 17 | final Task task = new Task(); 18 | 19 | String sql = "SELECT * FROM task WHERE task_id = ?"; 20 | 21 | Object[] params = new Object[]{taskId}; 22 | 23 | JDBCHelper jdbcHelper = JDBCHelper.getInstance(); 24 | jdbcHelper.executeQuery(sql, params, new JDBCHelper.QueryCallback() { 25 | 26 | @Override 27 | public void process(ResultSet rs) throws Exception { 28 | if (rs.next()) { 29 | long taskid = rs.getLong(1); 30 | String taskName = rs.getString(2); 31 | String createTime = rs.getString(3); 32 | String startTime = rs.getString(4); 33 | String finishTime = rs.getString(5); 34 | String taskType = rs.getString(6); 35 | String taskStatus = rs.getString(7); 36 | String taskParam = rs.getString(8); 37 | 38 | task.setTaskId(taskid); 39 | task.setTaskName(taskName); 40 | task.setCreateTime(createTime); 41 | task.setStartTime(startTime); 42 | task.setFinishTime(finishTime); 43 | task.setTaskType(taskType); 44 | task.setTaskStatus(taskStatus); 45 | task.setTaskParams(taskParam); 46 | } 47 | } 48 | }); 49 | return task; 50 | } 51 | 52 | } 53 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/dao/impl/WithTheCarDAOImpl.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.dao.impl; 2 | 3 | 4 | import com.traffic.spark.constant.Constants; 5 | import com.traffic.spark.dao.IWithTheCarDAO; 6 | import com.traffic.spark.jdbc.JDBCHelper; 7 | import com.traffic.spark.util.DateUtils; 8 | 9 | public class WithTheCarDAOImpl implements IWithTheCarDAO { 10 | 11 | @Override 12 | public void updateTestData(String cars) { 13 | JDBCHelper jdbcHelper = JDBCHelper.getInstance(); 14 | String sql = "UPDATE task set task_param = ? 
WHERE task_id = 3"; 15 | Object[] params = new Object[]{"{\"startDate\":[\"" + DateUtils.getTodayDate() + "\"],\"endDate\":[\"" + DateUtils.getTodayDate() + "\"],\"" + Constants.FIELD_CARS + "\":[\"" + cars + "\"]}"}; 16 | jdbcHelper.executeUpdate(sql, params); 17 | } 18 | 19 | } 20 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/domain/Area.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.domain; 2 | 3 | public class Area { 4 | private String areaId; 5 | private String areaName; 6 | public String getAreaId() { 7 | return areaId; 8 | } 9 | public void setAreaId(String areaId) { 10 | this.areaId = areaId; 11 | } 12 | public String getAreaName() { 13 | return areaName; 14 | } 15 | public void setAreaName(String areaName) { 16 | this.areaName = areaName; 17 | } 18 | 19 | public Area(String areaId, String areaName) { 20 | super(); 21 | this.areaId = areaId; 22 | this.areaName = areaName; 23 | } 24 | public Area() { 25 | super(); 26 | } 27 | } 28 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/domain/CarInfoPer5M.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.domain; 2 | 3 | public class CarInfoPer5M { 4 | private long taskId; 5 | private String monitorId; 6 | private String rangeTime; 7 | private String cars; 8 | public long getTaskId() { 9 | return taskId; 10 | } 11 | public void setTaskId(long taskId) { 12 | this.taskId = taskId; 13 | } 14 | public String getMonitorId() { 15 | return monitorId; 16 | } 17 | public void setMonitorId(String monitorId) { 18 | this.monitorId = monitorId; 19 | } 20 | public String getRangeTime() { 21 | return rangeTime; 22 | } 23 | public void setRangeTime(String rangeTime) { 24 | this.rangeTime = rangeTime; 25 | } 26 | public String getCars() { 27 | return cars; 28 | } 29 | public void setCars(String cars) { 30 | this.cars = cars; 31 | } 32 | public CarInfoPer5M(long taskId, String monitorId, String rangeTime, String cars) { 33 | super(); 34 | this.taskId = taskId; 35 | this.monitorId = monitorId; 36 | this.rangeTime = rangeTime; 37 | this.cars = cars; 38 | } 39 | public CarInfoPer5M() { 40 | super(); 41 | } 42 | } 43 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/domain/CarTrack.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.domain; 2 | 3 | /** 4 | * 保存车辆轨迹信息 domain 5 | * @author root 6 | * 7 | */ 8 | public class CarTrack { 9 | private long taskId; 10 | private String date; 11 | private String car; 12 | private String track; 13 | 14 | public String getDate() { 15 | return date; 16 | } 17 | public void setDate(String date) { 18 | this.date = date; 19 | } 20 | 21 | public long getTaskId() { 22 | return taskId; 23 | } 24 | public void setTaskId(long taskId) { 25 | this.taskId = taskId; 26 | } 27 | public String getCar() { 28 | return car; 29 | } 30 | public void setCar(String car) { 31 | this.car = car; 32 | } 33 | public String getTrack() { 34 | return track; 35 | } 36 | public void setTrack(String track) { 37 | this.track = track; 38 | } 39 | public CarTrack(long taskId, String date, String car, String track) { 40 | super(); 41 | this.taskId = taskId; 42 | this.date = date; 43 | this.car = car; 44 | this.track = track; 45 | } 46 | public CarTrack() { 47 | 
super(); 48 | } 49 | } 50 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/domain/MonitorState.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.domain; 2 | 3 | /** 4 | * 卡口状态 5 | * @author root 6 | * 7 | */ 8 | public class MonitorState { 9 | 10 | private long taskId; 11 | private String normalMonitorCount;//正常的卡扣个数 12 | private String normalCameraCount;//正常的摄像头个数 13 | private String abnormalMonitorCount;//不正常的卡扣个数 14 | private String abnormalCameraCount;//不正常的摄像头个数 15 | private String abnormalMonitorCameraInfos;//不正常的摄像头详细信息 16 | 17 | public long getTaskId() { 18 | return taskId; 19 | } 20 | public void setTaskId(long taskId) { 21 | this.taskId = taskId; 22 | } 23 | public String getNormalMonitorCount() { 24 | return normalMonitorCount; 25 | } 26 | public void setNormalMonitorCount(String normalMonitorCount) { 27 | this.normalMonitorCount = normalMonitorCount; 28 | } 29 | public String getNormalCameraCount() { 30 | return normalCameraCount; 31 | } 32 | public void setNormalCameraCount(String normalCameraCount) { 33 | this.normalCameraCount = normalCameraCount; 34 | } 35 | public String getAbnormalMonitorCount() { 36 | return abnormalMonitorCount; 37 | } 38 | public void setAbnormalMonitorCount(String abnormalMonitorCount) { 39 | this.abnormalMonitorCount = abnormalMonitorCount; 40 | } 41 | public String getAbnormalCameraCount() { 42 | return abnormalCameraCount; 43 | } 44 | public void setAbnormalCameraCount(String abnormalCameraCount) { 45 | this.abnormalCameraCount = abnormalCameraCount; 46 | } 47 | 48 | public MonitorState(long taskId, String normalMonitorCount, String normalCameraCount, String abnormalMonitorCount, String abnormalCameraCount, String abnormalMonitorCameraInfos) { 49 | super(); 50 | this.taskId = taskId; 51 | this.normalMonitorCount = normalMonitorCount; 52 | this.normalCameraCount = normalCameraCount; 53 | this.abnormalMonitorCount = abnormalMonitorCount; 54 | this.abnormalCameraCount = abnormalCameraCount; 55 | this.abnormalMonitorCameraInfos = abnormalMonitorCameraInfos; 56 | } 57 | public String getAbnormalMonitorCameraInfos() { 58 | return abnormalMonitorCameraInfos; 59 | } 60 | public void setAbnormalMonitorCameraInfos(String abnormalMonitorCameraInfos) { 61 | this.abnormalMonitorCameraInfos = abnormalMonitorCameraInfos; 62 | } 63 | public MonitorState() { 64 | super(); 65 | } 66 | } 67 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/domain/RandomExtractCar.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.domain; 2 | 3 | /** 4 | * 随机抽取出来的car信息 domain 5 | * @author root 6 | * 7 | */ 8 | public class RandomExtractCar { 9 | private long taskId; 10 | private String car; 11 | private String date; 12 | private String dateHour; 13 | 14 | public String getCar() { 15 | return car; 16 | } 17 | public void setCar(String car) { 18 | this.car = car; 19 | } 20 | 21 | public String getDateHour() { 22 | return dateHour; 23 | } 24 | public void setDateHour(String dateHour) { 25 | this.dateHour = dateHour; 26 | } 27 | public long getTaskId() { 28 | return taskId; 29 | } 30 | public void setTaskId(long taskId) { 31 | this.taskId = taskId; 32 | } 33 | 34 | public RandomExtractCar(long taskId, String car, String date, String dateHour) { 35 | super(); 36 | this.taskId = taskId; 37 | this.car = car; 38 | this.date = 
date; 39 | this.dateHour = dateHour; 40 | } 41 | public String getDate() { 42 | return date; 43 | } 44 | public void setDate(String date) { 45 | this.date = date; 46 | } 47 | public RandomExtractCar() { 48 | super(); 49 | } 50 | } 51 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/domain/RandomExtractMonitorDetail.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.domain; 2 | 3 | public class RandomExtractMonitorDetail { 4 | private long taskId; 5 | private String date; 6 | private String monitorId; 7 | private String cameraId; 8 | private String car; 9 | private String actionTime; 10 | private String speed; 11 | private String roadId; 12 | public long getTaskId() { 13 | return taskId; 14 | } 15 | public void setTaskId(long taskId) { 16 | this.taskId = taskId; 17 | } 18 | public String getDate() { 19 | return date; 20 | } 21 | public void setDate(String date) { 22 | this.date = date; 23 | } 24 | public String getMonitorId() { 25 | return monitorId; 26 | } 27 | public void setMonitorId(String monitorId) { 28 | this.monitorId = monitorId; 29 | } 30 | public String getCameraId() { 31 | return cameraId; 32 | } 33 | public void setCameraId(String cameraId) { 34 | this.cameraId = cameraId; 35 | } 36 | public String getCar() { 37 | return car; 38 | } 39 | public void setCar(String car) { 40 | this.car = car; 41 | } 42 | public String getActionTime() { 43 | return actionTime; 44 | } 45 | public void setActionTime(String actionTime) { 46 | this.actionTime = actionTime; 47 | } 48 | public String getSpeed() { 49 | return speed; 50 | } 51 | public void setSpeed(String speed) { 52 | this.speed = speed; 53 | } 54 | public String getRoadId() { 55 | return roadId; 56 | } 57 | public void setRoadId(String roadId) { 58 | this.roadId = roadId; 59 | } 60 | public RandomExtractMonitorDetail(long taskId, String date, String monitorId, String cameraId, String car, String actionTime, String speed, String roadId) { 61 | super(); 62 | this.taskId = taskId; 63 | this.date = date; 64 | this.monitorId = monitorId; 65 | this.cameraId = cameraId; 66 | this.car = car; 67 | this.actionTime = actionTime; 68 | this.speed = speed; 69 | this.roadId = roadId; 70 | } 71 | public RandomExtractMonitorDetail() { 72 | super(); 73 | } 74 | } 75 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/domain/Task.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.domain; 2 | 3 | import java.io.Serializable; 4 | 5 | /** 6 | * Taskr任务domain 7 | * @author root 8 | * 9 | */ 10 | public class Task implements Serializable { 11 | 12 | /** 13 | * 14 | */ 15 | private static final long serialVersionUID = 1L; 16 | 17 | private long taskId; 18 | private String taskName; 19 | private String createTime; 20 | private String startTime; 21 | private String finishTime; 22 | private String taskType; 23 | private String taskStatus; 24 | private String taskParams; 25 | 26 | public Task() { 27 | 28 | } 29 | public Task(long taskId, String taskName, String createTime, String startTime, String finishTime, String taskType, String taskStatus, String taskParams) { 30 | super(); 31 | this.taskId = taskId; 32 | this.taskName = taskName; 33 | this.createTime = createTime; 34 | this.startTime = startTime; 35 | this.finishTime = finishTime; 36 | this.taskType = taskType; 37 | this.taskStatus = taskStatus; 38 | 
this.taskParams = taskParams; 39 | } 40 | public long getTaskId() { 41 | return taskId; 42 | } 43 | public void setTaskId(long taskId) { 44 | this.taskId = taskId; 45 | } 46 | public String getTaskName() { 47 | return taskName; 48 | } 49 | public void setTaskName(String taskName) { 50 | this.taskName = taskName; 51 | } 52 | public String getCreateTime() { 53 | return createTime; 54 | } 55 | public void setCreateTime(String createTime) { 56 | this.createTime = createTime; 57 | } 58 | public String getStartTime() { 59 | return startTime; 60 | } 61 | public void setStartTime(String startTime) { 62 | this.startTime = startTime; 63 | } 64 | public String getFinishTime() { 65 | return finishTime; 66 | } 67 | public void setFinishTime(String finishTime) { 68 | this.finishTime = finishTime; 69 | } 70 | public String getTaskType() { 71 | return taskType; 72 | } 73 | public void setTaskType(String taskType) { 74 | this.taskType = taskType; 75 | } 76 | public String getTaskStatus() { 77 | return taskStatus; 78 | } 79 | public void setTaskStatus(String taskStatus) { 80 | this.taskStatus = taskStatus; 81 | } 82 | public String getTaskParams() { 83 | return taskParams; 84 | } 85 | public void setTaskParams(String taskParams) { 86 | this.taskParams = taskParams; 87 | } 88 | public static long getSerialversionuid() { 89 | return serialVersionUID; 90 | } 91 | } 92 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/domain/Top10SpeedPerMonitor.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.domain; 2 | 3 | public class Top10SpeedPerMonitor { 4 | 5 | } 6 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/domain/TopNMonitor2CarCount.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.domain; 2 | 3 | /** 4 | * 获取车流量排名前N的卡口 5 | * @author root 6 | * 7 | */ 8 | public class TopNMonitor2CarCount { 9 | private long taskId; 10 | private String monitorId; 11 | private int carCount; 12 | 13 | public String getMonitorId() { 14 | return monitorId; 15 | } 16 | public void setMonitorId(String monitorId) { 17 | this.monitorId = monitorId; 18 | } 19 | 20 | public TopNMonitor2CarCount(long taskId, String monitorId, int carCount) { 21 | super(); 22 | this.taskId = taskId; 23 | this.monitorId = monitorId; 24 | this.carCount = carCount; 25 | } 26 | public int getCarCount() { 27 | return carCount; 28 | } 29 | public void setCarCount(int carCount) { 30 | this.carCount = carCount; 31 | } 32 | public long getTaskId() { 33 | return taskId; 34 | } 35 | public void setTaskId(long taskId) { 36 | this.taskId = taskId; 37 | } 38 | public TopNMonitor2CarCount() { 39 | 40 | } 41 | } 42 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/domain/TopNMonitorDetailInfo.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.domain; 2 | 3 | 4 | /** 5 | * 卡口的明细数据domain 6 | * @author root 7 | * 8 | */ 9 | public class TopNMonitorDetailInfo { 10 | private long taskId; 11 | private String date; 12 | private String monitorId; 13 | private String cameraId; 14 | private String car; 15 | private String actionTime; 16 | private String speed; 17 | private String roadId; 18 | public long getTaskId() { 19 | return taskId; 20 | } 21 | public void setTaskId(long taskId) { 22 | 
this.taskId = taskId; 23 | } 24 | public String getDate() { 25 | return date; 26 | } 27 | public void setDate(String date) { 28 | this.date = date; 29 | } 30 | public String getMonitorId() { 31 | return monitorId; 32 | } 33 | public void setMonitorId(String monitorId) { 34 | this.monitorId = monitorId; 35 | } 36 | public String getCameraId() { 37 | return cameraId; 38 | } 39 | public void setCameraId(String cameraId) { 40 | this.cameraId = cameraId; 41 | } 42 | public String getCar() { 43 | return car; 44 | } 45 | public void setCar(String car) { 46 | this.car = car; 47 | } 48 | public String getActionTime() { 49 | return actionTime; 50 | } 51 | public void setActionTime(String actionTime) { 52 | this.actionTime = actionTime; 53 | } 54 | public String getSpeed() { 55 | return speed; 56 | } 57 | public void setSpeed(String speed) { 58 | this.speed = speed; 59 | } 60 | public String getRoadId() { 61 | return roadId; 62 | } 63 | public void setRoadId(String roadId) { 64 | this.roadId = roadId; 65 | } 66 | public TopNMonitorDetailInfo(long taskId, String date, String monitorId, String cameraId, String car, String actionTime, String speed, String roadId) { 67 | super(); 68 | this.taskId = taskId; 69 | this.date = date; 70 | this.monitorId = monitorId; 71 | this.cameraId = cameraId; 72 | this.car = car; 73 | this.actionTime = actionTime; 74 | this.speed = speed; 75 | this.roadId = roadId; 76 | } 77 | public TopNMonitorDetailInfo() { 78 | super(); 79 | } 80 | } 81 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/jdbc/JDBCHelper.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.jdbc; 2 | 3 | 4 | import com.traffic.spark.conf.ConfigurationManager; 5 | import com.traffic.spark.constant.Constants; 6 | 7 | import java.sql.Connection; 8 | import java.sql.DriverManager; 9 | import java.sql.PreparedStatement; 10 | import java.sql.ResultSet; 11 | import java.util.LinkedList; 12 | import java.util.List; 13 | 14 | 15 | /** 16 | * JDBC辅助组件 17 | * 18 | * 在正式的项目的代码编写过程中,是完全严格按照大公司的coding标准来的 19 | * 也就是说,在代码中,是不能出现任何hard code(硬编码)的字符 20 | * 比如“张三”、“com.mysql.jdbc.Driver” 21 | * 所有这些东西,都需要通过常量来封装和使用 22 | * 23 | * @author Administrator 24 | * 25 | */ 26 | public class JDBCHelper { 27 | 28 | // 第一步:在静态代码块中,直接加载数据库的驱动 29 | // 加载驱动,不是直接简单的,使用com.mysql.jdbc.Driver就可以了 30 | // 之所以说,不要硬编码,他的原因就在于这里 31 | // 32 | // com.mysql.jdbc.Driver只代表了MySQL数据库的驱动 33 | // 那么,如果有一天,我们的项目底层的数据库要进行迁移,比如迁移到Oracle 34 | // 或者是DB2、SQLServer 35 | // 那么,就必须很费劲的在代码中,找,找到硬编码了com.mysql.jdbc.Driver的地方,然后改成其他数据库的驱动类的类名 36 | // 所以正规项目,是不允许硬编码的,那样维护成本很高 37 | // 38 | // 通常,我们都是用一个常量接口中的某个常量,来代表一个值 39 | // 然后在这个值改变的时候,只要改变常量接口中的常量对应的值就可以了 40 | // 41 | // 项目,要尽量做成可配置的 42 | // 就是说,我们的这个数据库驱动,更进一步,也不只是放在常量接口中就可以了 43 | // 最好的方式,是放在外部的配置文件中,跟代码彻底分离 44 | // 常量接口中,只是包含了这个值对应的key的名字 45 | static { 46 | try { 47 | String driver = ConfigurationManager.getProperty(Constants.JDBC_DRIVER); 48 | Class.forName(driver); 49 | } catch (Exception e) { 50 | e.printStackTrace(); 51 | } 52 | } 53 | 54 | // 第二步,实现JDBCHelper的单例化 55 | // 为什么要实现单例化呢?因为它的内部要封装一个简单的内部的数据库连接池 56 | // 为了保证数据库连接池有且仅有一份,所以就通过单例的方式 57 | // 保证JDBCHelper只有一个实例,实例中只有一份数据库连接池 58 | private static JDBCHelper instance = null; 59 | 60 | /** 61 | * 获取单例 62 | * @return 单例 63 | */ 64 | public static JDBCHelper getInstance() { 65 | if(instance == null) { 66 | synchronized(JDBCHelper.class) { 67 | if(instance == null) { 68 | instance = new JDBCHelper(); 69 | } 70 | } 71 | } 72 | 
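// A descriptive note on the block above: this is the classic double-checked locking pattern.
// For it to be strictly safe under the Java memory model, the static 'instance' field would
// also need to be declared volatile, e.g.
//     private static volatile JDBCHelper instance = null;
// otherwise another thread could, in theory, observe a partially constructed JDBCHelper.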
return instance; 73 | } 74 | 75 | // 数据库连接池 76 | private LinkedList datasource = new LinkedList(); 77 | 78 | /** 79 | * 80 | * 第三步:实现单例的过程中,创建唯一的数据库连接池 81 | * 82 | * 私有化构造方法 83 | * 84 | * JDBCHelper在整个程序运行声明周期中,只会创建一次实例 85 | * 在这一次创建实例的过程中,就会调用JDBCHelper()构造方法 86 | * 此时,就可以在构造方法中,去创建自己唯一的一个数据库连接池 87 | * 88 | */ 89 | private JDBCHelper() { 90 | // 首先第一步,获取数据库连接池的大小,就是说,数据库连接池中要放多少个数据库连接 91 | // 这个,可以通过在配置文件中配置的方式,来灵活的设定 92 | int datasourceSize = ConfigurationManager.getInteger( 93 | Constants.JDBC_DATASOURCE_SIZE); 94 | 95 | // 然后创建指定数量的数据库连接,并放入数据库连接池中 96 | for(int i = 0; i < datasourceSize; i++) { 97 | boolean local = ConfigurationManager.getBoolean(Constants.SPARK_LOCAL); 98 | String url = null; 99 | String user = null; 100 | String password = null; 101 | 102 | if(local) { 103 | url = ConfigurationManager.getProperty(Constants.JDBC_URL); 104 | user = ConfigurationManager.getProperty(Constants.JDBC_USER); 105 | password = ConfigurationManager.getProperty(Constants.JDBC_PASSWORD); 106 | } else { 107 | url = ConfigurationManager.getProperty(Constants.JDBC_URL_PROD); 108 | user = ConfigurationManager.getProperty(Constants.JDBC_USER_PROD); 109 | password = ConfigurationManager.getProperty(Constants.JDBC_PASSWORD_PROD); 110 | } 111 | 112 | try { 113 | Connection conn = DriverManager.getConnection(url, user, password); 114 | //向链表头中放入一个元素 115 | datasource.push(conn); 116 | } catch (Exception e) { 117 | e.printStackTrace(); 118 | } 119 | } 120 | } 121 | 122 | /** 123 | * 第四步,提供获取数据库连接的方法 124 | * 有可能,你去获取的时候,这个时候,连接都被用光了,你暂时获取不到数据库连接 125 | * 所以我们要自己编码实现一个简单的等待机制,去等待获取到数据库连接 126 | * 127 | */ 128 | public synchronized Connection getConnection() { 129 | while(datasource.size() == 0) { 130 | try { 131 | Thread.sleep(10); 132 | } catch (InterruptedException e) { 133 | e.printStackTrace(); 134 | } 135 | } 136 | //检索并移除此列表的头元素(第一个元素) 137 | return datasource.poll(); 138 | } 139 | 140 | /** 141 | * 第五步:开发增删改查的方法 142 | * 1、执行增删改SQL语句的方法 143 | * 2、执行查询SQL语句的方法 144 | * 3、批量执行SQL语句的方法 145 | */ 146 | 147 | /** 148 | * 执行增删改SQL语句,返回影响的行数 149 | * @param sql 150 | * @param params 151 | * @return 影响的行数 152 | */ 153 | public int executeUpdate(String sql, Object[] params) { 154 | int rtn = 0; 155 | Connection conn = null; 156 | PreparedStatement pstmt = null; 157 | 158 | try { 159 | conn = getConnection(); 160 | conn.setAutoCommit(false); 161 | 162 | pstmt = conn.prepareStatement(sql); 163 | 164 | if(params != null && params.length > 0) { 165 | for(int i = 0; i < params.length; i++) { 166 | pstmt.setObject(i + 1, params[i]); 167 | } 168 | } 169 | 170 | rtn = pstmt.executeUpdate(); 171 | 172 | conn.commit(); 173 | } catch (Exception e) { 174 | e.printStackTrace(); 175 | } finally { 176 | if(conn != null) { 177 | datasource.push(conn); 178 | } 179 | } 180 | 181 | return rtn; 182 | } 183 | 184 | /** 185 | * 执行查询SQL语句 186 | * @param sql 187 | * @param params 188 | * @param callback 189 | */ 190 | public void executeQuery(String sql, Object[] params, 191 | QueryCallback callback) { 192 | Connection conn = null; 193 | PreparedStatement pstmt = null; 194 | ResultSet rs = null; 195 | 196 | try { 197 | conn = getConnection(); 198 | pstmt = conn.prepareStatement(sql); 199 | 200 | if(params != null && params.length > 0) { 201 | for(int i = 0; i < params.length; i++) { 202 | pstmt.setObject(i + 1, params[i]); 203 | } 204 | } 205 | 206 | rs = pstmt.executeQuery(); 207 | 208 | callback.process(rs); 209 | } catch (Exception e) { 210 | e.printStackTrace(); 211 | } finally { 212 | if(conn != null) { 213 | datasource.push(conn); 214 
| } 215 | } 216 | } 217 | 218 | /** 219 | * 批量执行SQL语句 220 | * 221 | * 批量执行SQL语句,是JDBC中的一个高级功能 222 | * 默认情况下,每次执行一条SQL语句,就会通过网络连接,向MySQL发送一次请求 223 | * 224 | * 但是,如果在短时间内要执行多条结构完全一模一样的SQL,只是参数不同 225 | * 虽然使用PreparedStatement这种方式,可以只编译一次SQL,提高性能,但是,还是对于每次SQL 226 | * 都要向MySQL发送一次网络请求 227 | * 228 | * 可以通过批量执行SQL语句的功能优化这个性能 229 | * 一次性通过PreparedStatement发送多条SQL语句,比如100条、1000条,甚至上万条 230 | * 执行的时候,也仅仅编译一次就可以 231 | * 这种批量执行SQL语句的方式,可以大大提升性能 232 | * 233 | * @param sql 234 | * @param paramsList 235 | * @return 每条SQL语句影响的行数 236 | */ 237 | public int[] executeBatch(String sql, List paramsList) { 238 | int[] rtn = null; 239 | Connection conn = null; 240 | PreparedStatement pstmt = null; 241 | 242 | try { 243 | conn = getConnection(); 244 | 245 | // 第一步:使用Connection对象,取消自动提交 246 | conn.setAutoCommit(false); 247 | 248 | pstmt = conn.prepareStatement(sql); 249 | 250 | // 第二步:使用PreparedStatement.addBatch()方法加入批量的SQL参数 251 | if(paramsList != null && paramsList.size() > 0) { 252 | for(Object[] params : paramsList) { 253 | for(int i = 0; i < params.length; i++) { 254 | pstmt.setObject(i + 1, params[i]); 255 | } 256 | pstmt.addBatch(); 257 | } 258 | } 259 | 260 | // 第三步:使用PreparedStatement.executeBatch()方法,执行批量的SQL语句 261 | rtn = pstmt.executeBatch(); 262 | 263 | // 最后一步:使用Connection对象,提交批量的SQL语句 264 | conn.commit(); 265 | } catch (Exception e) { 266 | e.printStackTrace(); 267 | } finally { 268 | if(conn != null) { 269 | datasource.push(conn); 270 | } 271 | } 272 | 273 | return rtn; 274 | } 275 | 276 | /** 277 | * 静态内部类:查询回调接口 278 | * @author Administrator 279 | * 280 | */ 281 | public static interface QueryCallback { 282 | 283 | /** 284 | * 处理查询结果 285 | * @param rs 286 | * @throws Exception 287 | */ 288 | void process(ResultSet rs) throws Exception; 289 | 290 | } 291 | 292 | } 293 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/skynet/MonitorAndCameraStateAccumulator.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.skynet; 2 | 3 | import com.traffic.spark.constant.Constants; 4 | import com.traffic.spark.util.StringUtils; 5 | import org.apache.spark.AccumulatorParam; 6 | 7 | /** 8 | * 自定义累加器要实现AccumulatorParam接口 9 | * 10 | * @author root 11 | */ 12 | public class MonitorAndCameraStateAccumulator implements AccumulatorParam { 13 | 14 | /** 15 | * 16 | */ 17 | private static final long serialVersionUID = 1L; 18 | 19 | 20 | /** 21 | * 初始化RDD每个分区的值 22 | * init:调用SparkContext.accululator时传递的initialValue,就是"" 23 | * return返回累加器每个分区中的初始值。 24 | */ 25 | @Override 26 | public String zero(String init) { 27 | /** 28 | * "normalMonitorCount=0|normalCameraCount=0|abnormalMonitorCount=0|abnormalCameraCount=0|abnormalMonitorCameraInfos=' '" 29 | */ 30 | return Constants.FIELD_NORMAL_MONITOR_COUNT + "=0|" 31 | + Constants.FIELD_NORMAL_CAMERA_COUNT + "=0|" 32 | + Constants.FIELD_ABNORMAL_MONITOR_COUNT + "=0|" 33 | + Constants.FIELD_ABNORMAL_CAMERA_COUNT + "=0|" 34 | + Constants.FIELD_ABNORMAL_MONITOR_CAMERA_INFOS + "= " + init; 35 | } 36 | 37 | 38 | /** 39 | * v1就是上次累加后的结果,第一次调用的时候就是zero方法return的值,v2是传进来的字符串 40 | *
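 * (English gloss of the note above: v1 is the running value for the partition — on the
 * first call it is the string returned by zero(); v2 is the new value being folded in.)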

41 | * v1:normalMonitorCount=0|normalCameraCount=0|abnormalMonitorCount=0|abnormalCameraCount=0|abnormalMonitorCameraInfos=' ' 42 | * v2:abnormalMonitorCount=1|abnormalCameraCount=3|abnormalMonitorCameraInfos="0002":07553,07554,07556 43 | **/ 44 | @Override 45 | public String addAccumulator(String v1, String v2) { 46 | return myAdd(v1, v2); 47 | } 48 | 49 | /** 50 | * addAccumulator方法之后,最后会执行这个方法,将每个分区最后的value加到初始化的值。 51 | * 这里的initValue就是我们初始化的值那个“”。v2是已经经过addAccumulator这个方法累加后每个分区处理的值。 52 | */ 53 | @Override 54 | public String addInPlace(String initValue, String v2) { 55 | // System.out.println("initValue ="+initValue); 56 | // System.out.println("v2 ="+v2); 57 | return myAdd(initValue, v2); 58 | } 59 | 60 | /** 61 | * @param v1 连接串,上次累加后的结果 62 | * @param v2 本次累加传入的值 63 | * @return 更新以后的连接串 64 | *
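 * In short: every field except abnormalMonitorCameraInfos is parsed as an int and the two
 * values are summed; abnormalMonitorCameraInfos is the one string-valued field, so its two
 * values are concatenated with "~" as the separator, as in the example below.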

65 | * v1:normalMonitorCount=0|normalCameraCount=0|abnormalMonitorCount=1|abnormalCameraCount=3| 66 | * abnormalMonitorCameraInfos= ~"0002":07553,07554,07556~"0008":11111,22222~"0004":07553,07554,07556~"0000":12891,13024 67 | * v2:normalMonitorCount=8|normalCameraCount=3|abnormalMonitorCount=2|abnormalCameraCount=4| 68 | * abnormalMonitorCameraInfos= 69 | */ 70 | private String myAdd(String v1, String v2) { 71 | if (StringUtils.isEmpty(v1)) { 72 | return v2; 73 | } 74 | String[] valArr = v2.split("\\|"); 75 | for (String string : valArr) { 76 | String[] fieldAndValArr = string.split("="); 77 | String field = fieldAndValArr[0]; 78 | String value = fieldAndValArr[1]; 79 | String oldVal = StringUtils.getFieldFromConcatString(v1, "\\|", field); 80 | if (oldVal != null) { 81 | //只有这个字段是string,所以单独拿出来 82 | if (Constants.FIELD_ABNORMAL_MONITOR_CAMERA_INFOS.equals(field)) { 83 | if (value.startsWith(" ~")) { 84 | value = value.substring(2); 85 | } 86 | v1 = StringUtils.setFieldInConcatString(v1, "\\|", field, oldVal + "~" + value); 87 | } else { 88 | //其余都是int类型,直接加减就可以 89 | int newVal = Integer.parseInt(oldVal) + Integer.parseInt(value); 90 | v1 = StringUtils.setFieldInConcatString(v1, "\\|", field, String.valueOf(newVal)); 91 | } 92 | } 93 | } 94 | return v1; 95 | } 96 | } 97 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/skynet/MonitorAndCameraStateAccumulator2.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.skynet; 2 | 3 | import org.apache.spark.util.AccumulatorV2; 4 | 5 | /** 6 | * Spark2 自定义累加器 7 | * @author tianhao 8 | */ 9 | public class MonitorAndCameraStateAccumulator2 extends AccumulatorV2 { 10 | /** 11 | * 判断是否为初始值 12 | * 13 | * @return 14 | */ 15 | @Override 16 | public boolean isZero() { 17 | return false; 18 | } 19 | 20 | 21 | /** 22 | * 拷贝累加器 23 | * 24 | * @return 25 | */ 26 | @Override 27 | public AccumulatorV2 copy() { 28 | return null; 29 | } 30 | 31 | /** 32 | * 重置累加器中的值 33 | */ 34 | @Override 35 | public void reset() { 36 | 37 | } 38 | 39 | /** 40 | * 获取累加器中的值 41 | * 42 | * @return 43 | */ 44 | @Override 45 | public String value() { 46 | return null; 47 | } 48 | 49 | /** 50 | * 对各个task的累加器进行合并 51 | * 52 | * @param other 53 | */ 54 | @Override 55 | public void merge(AccumulatorV2 other) { 56 | 57 | } 58 | 59 | @Override 60 | public void add(String v) { 61 | 62 | } 63 | } 64 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/skynet/MonitorCarTrack.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.skynet; 2 | 3 | import com.alibaba.fastjson.JSONObject; 4 | import com.traffic.load.data.MockData; 5 | import com.traffic.spark.conf.ConfigurationManager; 6 | import com.traffic.spark.constant.Constants; 7 | import com.traffic.spark.dao.ITaskDAO; 8 | import com.traffic.spark.dao.factory.DAOFactory; 9 | import com.traffic.spark.domain.Task; 10 | import com.traffic.spark.util.DateUtils; 11 | import com.traffic.spark.util.ParamUtils; 12 | import com.traffic.spark.util.SparkUtils; 13 | import org.apache.spark.SparkConf; 14 | import org.apache.spark.api.java.JavaPairRDD; 15 | import org.apache.spark.api.java.JavaRDD; 16 | import org.apache.spark.api.java.JavaSparkContext; 17 | import org.apache.spark.api.java.function.Function; 18 | import org.apache.spark.api.java.function.PairFunction; 19 | import 
org.apache.spark.api.java.function.VoidFunction; 20 | import org.apache.spark.broadcast.Broadcast; 21 | import org.apache.spark.sql.Row; 22 | import org.apache.spark.sql.SparkSession; 23 | import scala.Tuple2; 24 | 25 | import java.util.*; 26 | 27 | /* 28 | 京F88622 : 0006-->0003-->0008-->0006-->0001-->0008-->0007-->0000-->0007-->0001-->0002-->0004-->0006-->0005-->0005-->0003-->0007-->0000-->0004-->0001-->0001-->0004 29 | 深E66902 : 0007-->0007-->0004-->0005-->0005-->0001-->0004-->0003-->0007 30 | 京W11471 : 0007-->0007-->0001-->0006-->0001-->0005-->0003-->0007-->0001-->0003-->0006-->0008-->0006-->0005-->0003-->0002-->0000-->0006-->0006-->0003-->0000-->0000-->0004-->0000-->0005-->0003-->0001-->0000-->0000-->0000-->0000-->0007-->0000-->0000-->0008 31 | 京C49161 : 0001-->0007-->0004-->0003-->0001-->0008-->0007-->0001-->0007-->0004-->0002-->0004-->0005-->0002-->0005-->0006-->0004-->0003-->0001-->0000-->0000-->0002-->0008-->0005-->0007-->0007-->0000 32 | 33 | */ 34 | public class MonitorCarTrack { 35 | 36 | public static void main(String[] args) { 37 | 38 | /** 39 | * 判断应用程序是否在本地执行 40 | */ 41 | JavaSparkContext sc = null; 42 | SparkSession spark = null; 43 | Boolean onLocal = ConfigurationManager.getBoolean(Constants.SPARK_LOCAL); 44 | 45 | if(onLocal){ 46 | // 构建Spark运行时的环境参数 47 | SparkConf conf = new SparkConf() 48 | .setAppName(Constants.SPARK_APP_NAME) 49 | // .set("spark.sql.shuffle.partitions", "300") 50 | // .set("spark.default.parallelism", "100") 51 | // .set("spark.storage.memoryFraction", "0.5") 52 | // .set("spark.shuffle.consolidateFiles", "true") 53 | // .set("spark.shuffle.file.buffer", "64") 54 | // .set("spark.shuffle.memoryFraction", "0.3") 55 | // .set("spark.reducer.maxSizeInFlight", "96") 56 | // .set("spark.shuffle.io.maxRetries", "60") 57 | // .set("spark.shuffle.io.retryWait", "60") 58 | // .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") 59 | // .registerKryoClasses(new Class[]{SpeedSortKey.class}) 60 | ; 61 | /** 62 | * 设置spark运行时的master 根据配置文件来决定的 63 | */ 64 | conf.setMaster("local"); 65 | sc = new JavaSparkContext(conf); 66 | 67 | spark = SparkSession.builder().getOrCreate(); 68 | /** 69 | * 基于本地测试生成模拟测试数据,如果在集群中运行的话,直接操作Hive中的表就可以 70 | * 本地模拟数据注册成一张临时表 71 | * monitor_flow_action 数据表:监控车流量所有数据 72 | * monitor_camera_info 标准表:卡扣对应摄像头标准表 73 | */ 74 | MockData.mock(sc, spark); 75 | }else{ 76 | System.out.println("++++++++++++++++++++++++++++++++++++++开启hive的支持"); 77 | spark = SparkSession.builder().enableHiveSupport().getOrCreate(); 78 | spark.sql("use traffic"); 79 | } 80 | 81 | // SparkConf sparkConf = new SparkConf().setAppName(Constants.SPARK_APP_NAME); 82 | // SparkUtils.setMaster(sparkConf); 83 | // 84 | // JavaSparkContext sc = new JavaSparkContext(sparkConf); 85 | // SparkSession spark = SparkUtils.getSQLContext(sc); 86 | // 87 | // if (ConfigurationManager.getBoolean(Constants.SPARK_LOCAL)) { 88 | // MockData.mock(sc, spark); 89 | // } else { 90 | // spark.sql("use traffic"); 91 | // } 92 | 93 | 94 | 95 | 96 | long taskId = ParamUtils.getTaskIdFromArgs(args, Constants.SPARK_LOCAL_TASKID_MONITOR); 97 | if (taskId == 0L) { 98 | System.out.println("args is null"); 99 | System.exit(-1); 100 | } 101 | 102 | ITaskDAO taskDAO = DAOFactory.getTaskDAO(); 103 | Task task = taskDAO.findTaskById(taskId); 104 | 105 | if (task == null) { 106 | System.exit(-1); 107 | } 108 | // 处理从task表中获得的参数 109 | JSONObject taskParamsJsonObject = JSONObject.parseObject(task.getTaskParams()); 110 | 111 | // 获得row类型的RDD : 2018-01-23 0001 91631 京U16332 2018-01-23 00:27:58 202 
5 06 112 | JavaRDD cameraRDD = SparkUtils.getCameraRDDByDateRange(spark, taskParamsJsonObject); 113 | cameraRDD.cache(); 114 | 115 | // 获取map类型RDD: monitorID -> car 0001 : 京U16332 116 | JavaPairRDD monitor2CarRDD = getMonitor2CarRDD(cameraRDD); 117 | 118 | // 过滤掉monitorId不等于0001的数据 0001 : 京U16332 119 | JavaPairRDD filteredMonitor2CarRDD = getFilteredMonitor2CarRDD(monitor2CarRDD); 120 | 121 | // 去掉monitorId, 只保留car 京U16332 122 | JavaRDD carRDD = getCarRDD(filteredMonitor2CarRDD); 123 | 124 | // 去重,然后获得car到集合中 [京U16332,京U16332...] 125 | List carList = carRDD.distinct().collect(); 126 | 127 | // 使得executor端能够获得carList 128 | final Broadcast> carListBroadcast = sc.broadcast(carList); 129 | 130 | // print the reuslt 131 | // carList.forEach(System.out::println); 132 | 133 | // 开启另一条支线 134 | // 获得map类型RDD: car -> row 京U16332 : 2018-01-23 0001 91631 京U16332 2018-01-23 00:27:58 202 5 06 135 | JavaPairRDD car2RowRDD = getCar2RowRDD(cameraRDD); 136 | 137 | // 保留 0001卡扣下通过车辆的 信息 138 | JavaPairRDD filterCar2RowRDD = getFilteredCar2RowRDD(car2RowRDD, carListBroadcast); 139 | 140 | // car相同的放在一组中 141 | JavaPairRDD> car2RowsRDD = filterCar2RowRDD.groupByKey(); 142 | 143 | // 按时间排序,获得car -> monitor_ids 京U53611 : 0002-->0003-->0007-->0001-->0008-->0005-->0003-->0004-->0003 144 | JavaPairRDD car2MonitorsRDD = getCar2MonitorsRDD(car2RowsRDD); 145 | 146 | // 打印结果 147 | car2MonitorsRDD.foreach(new VoidFunction>() { 148 | /** 149 | * 150 | */ 151 | private static final long serialVersionUID = 1L; 152 | 153 | @Override 154 | public void call(Tuple2 stringStringTuple2) throws Exception { 155 | String carId = stringStringTuple2._1; 156 | String track = stringStringTuple2._2; 157 | System.out.println(carId + "\t: " + track); 158 | } 159 | }); 160 | } 161 | 162 | private static JavaPairRDD getCar2MonitorsRDD(JavaPairRDD> car2RowsRDD) { 163 | return car2RowsRDD.mapToPair(new PairFunction>, String, String>() { 164 | private static final long serialVersionUID = 1L; 165 | @Override 166 | public Tuple2 call(Tuple2> stringIterableTuple2) throws Exception { 167 | String carId = stringIterableTuple2._1; 168 | Iterator iter = stringIterableTuple2._2.iterator(); 169 | List rows = new ArrayList<>(); 170 | while (iter.hasNext()) { 171 | rows.add(iter.next()); 172 | } 173 | 174 | Collections.sort(rows, new Comparator() { 175 | @Override 176 | public int compare(Row o1, Row o2) { 177 | if (DateUtils.before(o1.getAs("action_time")+"", o2.getAs("action_time")+"")) { 178 | return -1; 179 | } else 180 | return 1; 181 | } 182 | }); 183 | StringBuilder stringBuilder = new StringBuilder(); 184 | for (Row row : rows) { 185 | stringBuilder.append((String)row.getAs("monitor_id")); 186 | stringBuilder.append("-->"); 187 | } 188 | return new Tuple2<>(carId, stringBuilder.substring(0, stringBuilder.length() - 3)); 189 | } 190 | }); 191 | } 192 | 193 | private static JavaPairRDD getFilteredCar2RowRDD(JavaPairRDD car2RowRDD, final Broadcast> carListBroadcast) { 194 | return car2RowRDD.filter(new Function, Boolean>() { 195 | private static final long serialVersionUID = 1L; 196 | @Override 197 | public Boolean call(Tuple2 v1) throws Exception { 198 | String carId = v1._1; 199 | List carList = carListBroadcast.value(); 200 | return carList.contains(carId); 201 | } 202 | }); 203 | } 204 | 205 | private static JavaPairRDD getCar2RowRDD(JavaRDD cameraRDD) { 206 | return cameraRDD.mapToPair(new PairFunction() { 207 | private static final long serialVersionUID = 1L; 208 | @Override 209 | public Tuple2 call(Row row) throws Exception { 210 | return new 
Tuple2<>((String)row.getAs("car"), row); 211 | } 212 | }); 213 | } 214 | 215 | private static JavaRDD getCarRDD(JavaPairRDD monitor2CarRDD) { 216 | return monitor2CarRDD.map(new Function, String>() { 217 | private static final long serialVersionUID = 1L; 218 | @Override 219 | public String call(Tuple2 v1) throws Exception { 220 | return v1._2; 221 | } 222 | }); 223 | } 224 | 225 | private static JavaPairRDD getFilteredMonitor2CarRDD(JavaPairRDD monitor2CarRDD) { 226 | return monitor2CarRDD.filter(new Function, Boolean>() { 227 | private static final long serialVersionUID = 1L; 228 | @Override 229 | public Boolean call(Tuple2 v1) throws Exception { 230 | return v1._1.equals("0001"); 231 | } 232 | }); 233 | } 234 | 235 | private static JavaPairRDD getMonitor2CarRDD(JavaRDD cameraRDD) { 236 | JavaPairRDD monitor2CarRDD = cameraRDD.mapToPair(new PairFunction() { 237 | private static final long serialVersionUID = 1L; 238 | @Override 239 | public Tuple2 call(Row row) throws Exception { 240 | return new Tuple2<>((String)row.getAs("monitor_id"), (String)row.getAs("car")); 241 | } 242 | }); 243 | return monitor2CarRDD; 244 | } 245 | } 246 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/skynet/SelfDefineAccumulator.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.skynet; 2 | 3 | import com.traffic.spark.constant.Constants; 4 | import com.traffic.spark.util.StringUtils; 5 | import org.apache.spark.util.AccumulatorV2; 6 | 7 | /** 8 | * Spark累加器 9 | */ 10 | public class SelfDefineAccumulator extends AccumulatorV2 { 11 | String returnResult = ""; 12 | 13 | /** 14 | * 与reset() 方法中保持一致,返回true。 15 | * 16 | * @return 17 | */ 18 | @Override 19 | public boolean isZero() { 20 | //normalMonitorCount=0|normalCameraCount=0|abnormalMonitorCount=0|abnormalCameraCount=0|abnormalMonitorCameraInfos= 21 | return "normalMonitorCount=0|normalCameraCount=0|abnormalMonitorCount=0|abnormalCameraCount=0|abnormalMonitorCameraInfos= ".equals(returnResult); 22 | } 23 | 24 | @Override 25 | public AccumulatorV2 copy() { 26 | SelfDefineAccumulator acc = new SelfDefineAccumulator(); 27 | acc.returnResult = this.returnResult; 28 | return acc; 29 | } 30 | 31 | /** 32 | * 每个分区初始值 33 | */ 34 | @Override 35 | public void reset() { 36 | //normalMonitorCount=0|normalCameraCount=0|abnormalMonitorCount=0|abnormalCameraCount=0|abnormalMonitorCameraInfos= 37 | returnResult = Constants.FIELD_NORMAL_MONITOR_COUNT + "=0|" 38 | + Constants.FIELD_NORMAL_CAMERA_COUNT + "=0|" 39 | + Constants.FIELD_ABNORMAL_MONITOR_COUNT + "=0|" 40 | + Constants.FIELD_ABNORMAL_CAMERA_COUNT + "=0|" 41 | + Constants.FIELD_ABNORMAL_MONITOR_CAMERA_INFOS + "= "; 42 | } 43 | 44 | /** 45 | * 每个分区会拿着 reset 初始化的值 ,在各自的分区内相加 46 | * 47 | * @param v 48 | */ 49 | @Override 50 | public void add(String v) { 51 | // System.out.println("add returnResult ="+returnResult+", v="+v); 52 | returnResult = myAdd(returnResult, v); 53 | } 54 | 55 | /** 56 | * 每个分区最终的结果和初始值 returnResult="" 做累加 57 | * 58 | * @param other 59 | */ 60 | @Override 61 | public void merge(AccumulatorV2 other) { 62 | //这里初始值就是 "" ,每个分区之后都是一个大的字符串 63 | 64 | SelfDefineAccumulator accumulator = (SelfDefineAccumulator) other; 65 | // System.out.println("merge returnResult="+returnResult+" , accumulator.returnResult="+accumulator.returnResult); 66 | returnResult = myAdd(returnResult, accumulator.returnResult); 67 | } 68 | 69 | @Override 70 | public String value() { 71 | return 
returnResult; 72 | } 73 | 74 | 75 | private String myAdd(String v1, String v2) { 76 | // System.out.println("myAdd v1="+v1); 77 | // System.out.println("myAdd v2="+v2); 78 | if (StringUtils.isEmpty(v1)) { 79 | return v2; 80 | } 81 | String[] valArr = v2.split("\\|"); 82 | for (String string : valArr) { 83 | String[] fieldAndValArr = string.split("="); 84 | String field = fieldAndValArr[0]; 85 | String value = fieldAndValArr[1]; 86 | String oldVal = StringUtils.getFieldFromConcatString(v1, "\\|", field); 87 | if (oldVal != null) { 88 | //只有这个字段是string,所以单独拿出来 89 | if (Constants.FIELD_ABNORMAL_MONITOR_CAMERA_INFOS.equals(field)) { 90 | if (value.startsWith(" ~")) { 91 | value = value.substring(2); 92 | } 93 | v1 = StringUtils.setFieldInConcatString(v1, "\\|", field, oldVal + "~" + value); 94 | } else { 95 | //其余都是int类型,直接加减就可以 96 | int newVal = Integer.parseInt(oldVal) + Integer.parseInt(value); 97 | v1 = StringUtils.setFieldInConcatString(v1, "\\|", field, String.valueOf(newVal)); 98 | } 99 | } 100 | } 101 | return v1; 102 | } 103 | } 104 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/skynet/SpeedSortKey.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.skynet; 2 | 3 | import java.io.Serializable; 4 | 5 | public class SpeedSortKey implements Comparable,Serializable { 6 | /** 7 | * 8 | */ 9 | private static final long serialVersionUID = 1L; 10 | private long lowSpeed; 11 | private long normalSpeed; 12 | private long mediumSpeed; 13 | private long highSpeed; 14 | 15 | public SpeedSortKey() { 16 | super(); 17 | } 18 | 19 | public SpeedSortKey(long lowSpeed, long normalSpeed, long mediumSpeed, long highSpeed) { 20 | super(); 21 | this.lowSpeed = lowSpeed; 22 | this.normalSpeed = normalSpeed; 23 | this.mediumSpeed = mediumSpeed; 24 | this.highSpeed = highSpeed; 25 | } 26 | 27 | @Override 28 | public int compareTo(SpeedSortKey other) { 29 | if(highSpeed - other.getHighSpeed() != 0){ 30 | return (int)(highSpeed - other.getHighSpeed()); 31 | }else if (mediumSpeed - other.getMediumSpeed() != 0) { 32 | return (int)(mediumSpeed - other.getMediumSpeed()); 33 | }else if (normalSpeed - other.getNormalSpeed() != 0) { 34 | return (int)(normalSpeed - other.getNormalSpeed()); 35 | }else if (lowSpeed - other.getLowSpeed() != 0) { 36 | return (int)(lowSpeed - other.getLowSpeed()); 37 | } 38 | return 0; 39 | } 40 | 41 | public long getLowSpeed() { 42 | return lowSpeed; 43 | } 44 | 45 | public void setLowSpeed(long lowSpeed) { 46 | this.lowSpeed = lowSpeed; 47 | } 48 | 49 | public long getNormalSpeed() { 50 | return normalSpeed; 51 | } 52 | 53 | public void setNormalSpeed(long normalSpeed) { 54 | this.normalSpeed = normalSpeed; 55 | } 56 | 57 | public long getMediumSpeed() { 58 | return mediumSpeed; 59 | } 60 | 61 | public void setMediumSpeed(long mediumSpeed) { 62 | this.mediumSpeed = mediumSpeed; 63 | } 64 | 65 | public long getHighSpeed() { 66 | return highSpeed; 67 | } 68 | 69 | public void setHighSpeed(long highSpeed) { 70 | this.highSpeed = highSpeed; 71 | } 72 | 73 | @Override 74 | public String toString() { 75 | return "SpeedSortKey [lowSpeed=" + lowSpeed + ", normalSpeed=" + normalSpeed + ", mediumSpeed=" + mediumSpeed + ", highSpeed=" + highSpeed + "]"; 76 | } 77 | } 78 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/skynet/WithTheCarAnalyze.java: 
-------------------------------------------------------------------------------- 1 | package com.traffic.spark.skynet; 2 | 3 | import com.alibaba.fastjson.JSONObject; 4 | import com.traffic.load.data.MockData; 5 | import com.traffic.spark.conf.ConfigurationManager; 6 | import com.traffic.spark.constant.Constants; 7 | import com.traffic.spark.dao.ICarTrackDAO; 8 | import com.traffic.spark.dao.ITaskDAO; 9 | import com.traffic.spark.dao.factory.DAOFactory; 10 | import com.traffic.spark.domain.CarTrack; 11 | import com.traffic.spark.domain.Task; 12 | import com.traffic.spark.util.DateUtils; 13 | import com.traffic.spark.util.ParamUtils; 14 | import com.traffic.spark.util.SparkUtils; 15 | import org.apache.spark.SparkConf; 16 | import org.apache.spark.api.java.JavaPairRDD; 17 | import org.apache.spark.api.java.JavaRDD; 18 | import org.apache.spark.api.java.JavaSparkContext; 19 | import org.apache.spark.api.java.function.PairFunction; 20 | import org.apache.spark.api.java.function.VoidFunction; 21 | import org.apache.spark.sql.Row; 22 | import org.apache.spark.sql.SparkSession; 23 | import scala.Tuple2; 24 | 25 | import java.util.*; 26 | 27 | public class WithTheCarAnalyze { 28 | public static void main(String[] args) { 29 | 30 | /** 31 | * 判断应用程序是否在本地执行 32 | */ 33 | JavaSparkContext sc = null; 34 | SparkSession spark = null; 35 | Boolean onLocal = ConfigurationManager.getBoolean(Constants.SPARK_LOCAL); 36 | 37 | if (onLocal) { 38 | // 构建Spark运行时的环境参数 39 | SparkConf conf = new SparkConf() 40 | .setAppName(Constants.SPARK_APP_NAME) 41 | // .set("spark.sql.shuffle.partitions", "300") 42 | // .set("spark.default.parallelism", "100") 43 | // .set("spark.storage.memoryFraction", "0.5") 44 | // .set("spark.shuffle.consolidateFiles", "true") 45 | // .set("spark.shuffle.file.buffer", "64") 46 | // .set("spark.shuffle.memoryFraction", "0.3") 47 | // .set("spark.reducer.maxSizeInFlight", "96") 48 | // .set("spark.shuffle.io.maxRetries", "60") 49 | // .set("spark.shuffle.io.retryWait", "60") 50 | // .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") 51 | // .registerKryoClasses(new Class[]{SpeedSortKey.class}) 52 | ; 53 | /** 54 | * 设置spark运行时的master 根据配置文件来决定的 55 | */ 56 | conf.setMaster("local"); 57 | sc = new JavaSparkContext(conf); 58 | 59 | spark = SparkSession.builder().getOrCreate(); 60 | /** 61 | * 基于本地测试生成模拟测试数据,如果在集群中运行的话,直接操作Hive中的表就可以 62 | * 本地模拟数据注册成一张临时表 63 | * monitor_flow_action 数据表:监控车流量所有数据 64 | * monitor_camera_info 标准表:卡扣对应摄像头标准表 65 | */ 66 | MockData.mock(sc, spark); 67 | } else { 68 | System.out.println("++++++++++++++++++++++++++++++++++++++开启hive的支持"); 69 | /** 70 | * "SELECT * FROM table1 join table2 ON (连接条件)" 如果某一个表小于20G 他会自动广播出去 71 | * 会将小于spark.sql.autoBroadcastJoinThreshold值(默认为10M)的表广播到executor节点,不走shuffle过程,更加高效。 72 | * 73 | * config("spark.sql.autoBroadcastJoinThreshold", "1048576000"); //单位:字节 74 | */ 75 | spark = SparkSession.builder().config("spark.sql.autoBroadcastJoinThreshold", "1048576000").enableHiveSupport().getOrCreate(); 76 | spark.sql("use traffic"); 77 | } 78 | 79 | // /** 80 | // * 现在要计算的是所有的车的跟踪信息 81 | // * 82 | // *标准是:两个车的时间差在5分钟内就是有跟踪嫌疑 83 | // * table1:car track 1:时间段(精确到5分钟) 2 3 4 5 84 | // * table2:monitor_id 12:00-12:05 cars A B C 85 | // * 12:06-12:10 cars 86 | // */ 87 | // // 构建Spark上下文 88 | // SparkConf conf = new SparkConf() 89 | // .setAppName(Constants.SPARK_APP_NAME) 90 | // .set("spark.sql.shuffle.partitions", "10") 91 | // .set("spark.default.parallelism", "100") 92 | //// .set("spark.storage.memoryFraction", "0.5") 93 | 
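// Note on the tuning knobs in this commented-out block: they date from Spark 1.x.
// Under the unified memory manager (Spark 1.6+ / 2.x, as used here via SparkSession),
// spark.storage.memoryFraction and spark.shuffle.memoryFraction are ignored unless
// spark.memory.useLegacyMode is enabled, and spark.shuffle.consolidateFiles no longer
// applies because the hash-based shuffle manager was removed in Spark 2.0.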
//// .set("spark.shuffle.consolidateFiles", "true") 94 | //// .set("spark.shuffle.file.buffer", "64") 95 | //// .set("spark.shuffle.memoryFraction", "0.3") 96 | //// .set("spark.reducer.maxSizeInFlight", "24") 97 | //// .set("spark.shuffle.io.maxRetries", "60") 98 | //// .set("spark.shuffle.io.retryWait", "60") 99 | //// .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") 100 | // ; 101 | // SparkUtils.setMaster(conf); 102 | // 103 | // JavaSparkContext sc = new JavaSparkContext(conf); 104 | // 105 | // SparkSession spark = SparkUtils.getSQLContext(sc); 106 | 107 | /** 108 | * 基于本地测试生成模拟测试数据,如果在集群中运行的话,直接操作Hive中的临时表就可以 109 | * 本地模拟数据注册成一张临时表 110 | * monitor_flow_action 111 | */ 112 | SparkUtils.mockData(sc, spark); 113 | 114 | 115 | //从配置文件中查询出来指定的任务ID 116 | long taskId = ParamUtils.getTaskIdFromArgs(args, Constants.SPARK_LOCAL_WITH_THE_CAR); 117 | 118 | /** 119 | * 通过taskId从数据库中查询相应的参数 120 | * 1、通过DAOFactory工厂类创建出TaskDAO组件 121 | * 2、查询task 122 | */ 123 | ITaskDAO taskDAO = DAOFactory.getTaskDAO(); 124 | Task task = taskDAO.findTaskById(taskId); 125 | 126 | if (task == null) { 127 | return; 128 | } 129 | 130 | /** 131 | * task对象已经获取到,因为params是一个json,所以需要创建一个解析json的对象 132 | */ 133 | JSONObject taskParamsJsonObject = JSONObject.parseObject(task.getTaskParams()); 134 | 135 | /** 136 | * 统计出指定时间内的车辆信息 137 | */ 138 | JavaRDD cameraRDD = SparkUtils.getCameraRDDByDateRange(spark, taskParamsJsonObject); 139 | 140 | withTheCarAnalyze(taskId, sc, cameraRDD); 141 | sc.close(); 142 | } 143 | 144 | private static void withTheCarAnalyze(final long taskId, JavaSparkContext sc, JavaRDD cameraRDD) { 145 | 146 | /** 147 | * trackWithActionTimeRDD 148 | * k: car 149 | * v:monitor_id+"="+action_time 150 | */ 151 | JavaPairRDD trackWithActionTimeRDD = getCarTrack(cameraRDD); 152 | /** 153 | * 所有车辆轨迹存储在MySQL中,测试只是放入到MySQL 实际情况是在Redis中 154 | * 155 | * 156 | * 157 | * car monitor:actionTime 158 | * 159 | * monitor_id 时间段(actionTime) 160 | */ 161 | trackWithActionTimeRDD.foreachPartition(new VoidFunction>>() { 162 | /** 163 | * 164 | */ 165 | private static final long serialVersionUID = 1L; 166 | 167 | @Override 168 | public void call(Iterator> iterator) throws Exception { 169 | 170 | List carTracks = new ArrayList<>(); 171 | 172 | while (iterator.hasNext()) { 173 | Tuple2 tuple = iterator.next(); 174 | String car = tuple._1; 175 | String timeAndTack = tuple._2; 176 | //carTrackWithTime.append("|" + row.getString(1)+"="+row.getString(4)) 177 | //car , monitor_id,actionTime 178 | carTracks.add(new CarTrack(taskId, DateUtils.getTodayDate(), car, timeAndTack)); 179 | } 180 | ICarTrackDAO carTrackDAO = DAOFactory.getCarTrackDAO(); 181 | /** 182 | * 将数据插入到表car_track中 183 | */ 184 | carTrackDAO.insertBatchCarTrack(carTracks); 185 | } 186 | }); 187 | 188 | /*List> car2Track = trackWithActionTimeRDD.collect(); 189 | final Broadcast>> car2TrackBroadcast = sc.broadcast(car2Track); 190 | 191 | trackWithActionTimeRDD.mapToPair(new PairFunction, String,String>() { 192 | *//** 193 | * 194 | *//* 195 | private static final long serialVersionUID = 1L; 196 | 197 | @Override 198 | public Tuple2 call(Tuple2 tuple) throws Exception { 199 | List> car2Tracks = car2TrackBroadcast.value(); 200 | return null; 201 | } 202 | });*/ 203 | 204 | /** 205 | * 卡口号 时间段(粒度至5分钟) 车牌集合 206 | * 具体查看每一个卡口在每一个时间段内车辆的数量。 207 | * 实现思路: 208 | * 按照卡口进行聚合 209 | *//* 210 | 211 | cameraRDD.mapToPair(new PairFunction() { 212 | *//** 213 | * 214 | *//* 215 | private static final long serialVersionUID = 1L; 216 | 217 | @Override 218 | public 
Tuple2 call(Row row) throws Exception { 219 | return new Tuple2(row.getString(1), row); 220 | } 221 | }).groupByKey().foreach(new VoidFunction>>() { 222 | 223 | *//** 224 | * 225 | *//* 226 | private static final long serialVersionUID = 1L; 227 | 228 | @Override 229 | public void call(Tuple2> tuple) throws Exception { 230 | String monitor = tuple._1; 231 | Iterator rowIterator = tuple._2.iterator(); 232 | List rows = new ArrayList<>(); 233 | while (rowIterator.hasNext()) { 234 | Row row = rowIterator.next(); 235 | rows.add(row); 236 | } 237 | 238 | 239 | 240 | } 241 | }); 242 | */ 243 | 244 | 245 | } 246 | 247 | private static JavaPairRDD getCarTrack(JavaRDD cameraRDD) { 248 | return cameraRDD.mapToPair(new PairFunction() { 249 | 250 | /** 251 | * 252 | */ 253 | private static final long serialVersionUID = 1L; 254 | 255 | @Override 256 | public Tuple2 call(Row row) throws Exception { 257 | return new Tuple2(row.getString(3), row); 258 | } 259 | }).groupByKey().mapToPair(new PairFunction>, String, String>() { 260 | 261 | /** 262 | * 263 | */ 264 | private static final long serialVersionUID = 1L; 265 | 266 | @Override 267 | public Tuple2 call(Tuple2> t) throws Exception { 268 | String car = t._1; 269 | Iterator iterator = t._2.iterator(); 270 | List rows = new ArrayList<>(); 271 | while (iterator.hasNext()) { 272 | Row row = iterator.next(); 273 | rows.add(row); 274 | } 275 | Collections.sort(rows, new Comparator() { 276 | 277 | 278 | @Override 279 | public int compare(Row r1, Row r2) { 280 | String actionTime1 = r1.getString(4); 281 | String actionTime2 = r2.getString(4); 282 | 283 | if (DateUtils.before(actionTime1, actionTime2)) { 284 | return -1; 285 | } 286 | return 1; 287 | } 288 | }); 289 | 290 | StringBuilder carTrackWithTime = new StringBuilder(); 291 | for (Row row : rows) { 292 | carTrackWithTime.append("|" + row.getString(1) + "=" + row.getString(4)); 293 | } 294 | String trackWithTime = ""; 295 | if (!"".equals(carTrackWithTime.toString())) { 296 | trackWithTime = carTrackWithTime.toString().substring(1); 297 | } 298 | 299 | return new Tuple2(car, trackWithTime); 300 | } 301 | }); 302 | } 303 | 304 | 305 | } 306 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/util/DateUtils.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.util; 2 | 3 | import java.text.ParseException; 4 | import java.text.SimpleDateFormat; 5 | import java.util.Calendar; 6 | import java.util.Date; 7 | 8 | /** 9 | * 日期时间工具类 10 | * 11 | * @author Administrator 12 | */ 13 | public class DateUtils { 14 | 15 | public static final SimpleDateFormat TIME_FORMAT = 16 | new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); 17 | public static final SimpleDateFormat DATE_FORMAT = 18 | new SimpleDateFormat("yyyy-MM-dd"); 19 | public static final SimpleDateFormat DATEKEY_FORMAT = 20 | new SimpleDateFormat("yyyyMMdd"); 21 | 22 | /** 23 | * 判断一个时间是否在另一个时间之前 24 | * 25 | * @param time1 第一个时间 26 | * @param time2 第二个时间 27 | * @return 判断结果 28 | */ 29 | public static boolean before(String time1, String time2) { 30 | try { 31 | Date dateTime1 = TIME_FORMAT.parse(time1); 32 | Date dateTime2 = TIME_FORMAT.parse(time2); 33 | 34 | if (dateTime1.before(dateTime2)) { 35 | return true; 36 | } 37 | } catch (Exception e) { 38 | e.printStackTrace(); 39 | } 40 | return false; 41 | } 42 | 43 | /** 44 | * 判断一个时间是否在另一个时间之后 45 | * 46 | * @param time1 第一个时间 47 | * @param time2 第二个时间 48 | * @return 判断结果 49 | */ 50 | public 
static boolean after(String time1, String time2) { 51 | try { 52 | Date dateTime1 = TIME_FORMAT.parse(time1); 53 | Date dateTime2 = TIME_FORMAT.parse(time2); 54 | 55 | if (dateTime1.after(dateTime2)) { 56 | return true; 57 | } 58 | } catch (Exception e) { 59 | e.printStackTrace(); 60 | } 61 | return false; 62 | } 63 | 64 | /** 65 | * 计算时间差值(单位为秒) 66 | * 67 | * @param time1 时间1 68 | * @param time2 时间2 69 | * @return 差值 70 | */ 71 | public static int minus(String time1, String time2) { 72 | try { 73 | Date datetime1 = TIME_FORMAT.parse(time1); 74 | Date datetime2 = TIME_FORMAT.parse(time2); 75 | 76 | long millisecond = datetime1.getTime() - datetime2.getTime(); 77 | 78 | return Integer.valueOf(String.valueOf(millisecond / 1000)); 79 | } catch (Exception e) { 80 | e.printStackTrace(); 81 | } 82 | return 0; 83 | } 84 | 85 | /** 86 | * 获取年月日和小时 87 | * 88 | * @param datetime 时间(yyyy-MM-dd HH:mm:ss) 89 | * @return 结果(yyyy-MM-dd_HH) 90 | */ 91 | public static String getDateHour(String datetime) { 92 | String date = datetime.split(" ")[0]; 93 | String hourMinuteSecond = datetime.split(" ")[1]; 94 | String hour = hourMinuteSecond.split(":")[0]; 95 | return date + "_" + hour; 96 | } 97 | 98 | /** 99 | * 获取当天日期(yyyy-MM-dd) 100 | * 101 | * @return 当天日期 102 | */ 103 | public static String getTodayDate() { 104 | return DATE_FORMAT.format(new Date()); 105 | } 106 | 107 | /** 108 | * 获取昨天的日期(yyyy-MM-dd) 109 | * 110 | * @return 昨天的日期 111 | */ 112 | public static String getYesterdayDate() { 113 | Calendar cal = Calendar.getInstance(); 114 | cal.setTime(new Date()); 115 | cal.add(Calendar.DAY_OF_YEAR, -1); 116 | 117 | Date date = cal.getTime(); 118 | 119 | return DATE_FORMAT.format(date); 120 | } 121 | 122 | /** 123 | * 格式化日期(yyyy-MM-dd) 124 | * 125 | * @param date Date对象 126 | * @return 格式化后的日期 127 | */ 128 | public static String formatDate(Date date) { 129 | return DATE_FORMAT.format(date); 130 | } 131 | 132 | /** 133 | * 格式化时间(yyyy-MM-dd HH:mm:ss) 134 | * 135 | * @param date Date对象 136 | * @return 格式化后的时间 137 | */ 138 | public static String formatTime(Date date) { 139 | return TIME_FORMAT.format(date); 140 | } 141 | 142 | /** 143 | * 解析时间字符串 144 | * 145 | * @param time 时间字符串 146 | * @return Date 147 | */ 148 | public static Date parseTime(String time) { 149 | try { 150 | return TIME_FORMAT.parse(time); 151 | } catch (ParseException e) { 152 | e.printStackTrace(); 153 | } 154 | return null; 155 | } 156 | 157 | /** 158 | * 格式化日期key 159 | * 160 | * @param date 161 | * @return yyyyMMdd 162 | */ 163 | public static String formatDateKey(Date date) { 164 | return DATEKEY_FORMAT.format(date); 165 | } 166 | 167 | /** 168 | * 格式化日期key (yyyyMMdd) 169 | * 170 | * @param datekey 171 | * @return 172 | */ 173 | public static Date parseDateKey(String datekey) { 174 | try { 175 | return DATEKEY_FORMAT.parse(datekey); 176 | } catch (ParseException e) { 177 | e.printStackTrace(); 178 | } 179 | return null; 180 | } 181 | 182 | /** 183 | * 格式化时间,保留到分钟级别 184 | * 185 | * @param date 186 | * @return yyyyMMddHHmm --201701012301 187 | */ 188 | public static String formatTimeMinute(Date date) { 189 | SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmm"); 190 | return sdf.format(date); 191 | } 192 | 193 | /** 194 | * @param dateTime yyyy-MM-dd HH:mm:ss 195 | * @return 196 | */ 197 | public static String getRangeTime(String dateTime) { 198 | String date = dateTime.split(" ")[0]; 199 | String hour = dateTime.split(" ")[1].split(":")[0]; 200 | int minute = StringUtils.convertStringtoInt(dateTime.split(" ")[1].split(":")[1]); 201 | 
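        // Worked example (values illustrative, not from the repository): for
        // dateTime = "2019-05-05 08:43:17", minute = 43, so the method returns
        // "2019-05-05 08:40~2019-05-05 08:45"; for "2019-05-05 08:58:17" the upper
        // bound reaches 60 and the range rolls into the next hour:
        // "2019-05-05 08:55~2019-05-05 09:00".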
// String second = dateTime.split(" ")[1].split(":")[2]; 202 | if (minute + (5 - minute % 5) == 60) { 203 | return date + " " + hour + ":" + StringUtils.fulFuill((minute - (minute % 5)) + "") + "~" + date + " " + StringUtils.fulFuill((Integer.parseInt(hour) + 1) + "") + ":00"; 204 | } 205 | return date + " " + hour + ":" + StringUtils.fulFuill((minute - (minute % 5)) + "") + "~" + date + " " + hour + ":" + StringUtils.fulFuill((minute + (5 - minute % 5)) + ""); 206 | } 207 | } 208 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/util/NumberUtils.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.util; 2 | 3 | import java.math.BigDecimal; 4 | 5 | /** 6 | * 数字格工具类 7 | * @author Administrator 8 | * 9 | */ 10 | public class NumberUtils { 11 | 12 | /** 13 | * 格式化小数 14 | * @param num 字符串? 15 | * @param scale 四舍五入的位数 16 | * @return 格式化小数 17 | */ 18 | public static double formatDouble(double num, int scale) { 19 | BigDecimal bd = new BigDecimal(num); 20 | return bd.setScale(scale, BigDecimal.ROUND_HALF_UP).doubleValue(); 21 | } 22 | 23 | } 24 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/util/ParamUtils.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.util; 2 | 3 | import com.alibaba.fastjson.JSONArray; 4 | import com.alibaba.fastjson.JSONObject; 5 | import com.traffic.spark.conf.ConfigurationManager; 6 | import com.traffic.spark.constant.Constants; 7 | 8 | /** 9 | * 参数工具类 10 | * 11 | * @author Administrator 12 | */ 13 | public class ParamUtils { 14 | 15 | /** 16 | * 从命令行参数中提取任务id 17 | * 18 | * @param args 命令行参数 19 | * @param taskType 参数类型(任务id对应的值是Long类型才可以),对应my.properties中的key 20 | * @return 任务id 21 | * spark.local.taskId.monitorFlow 22 | */ 23 | public static Long getTaskIdFromArgs(String[] args, String taskType) { 24 | boolean local = ConfigurationManager.getBoolean(Constants.SPARK_LOCAL); 25 | 26 | if (local) { 27 | return ConfigurationManager.getLong(taskType); 28 | } else { 29 | try { 30 | if (args != null && args.length > 0) { 31 | return Long.valueOf(args[0]); 32 | } else { 33 | System.out.println("集群提交任务,需要参数"); 34 | } 35 | } catch (Exception e) { 36 | e.printStackTrace(); 37 | } 38 | } 39 | return 0L; 40 | } 41 | 42 | /** 43 | * 从JSON对象中提取参数 44 | * 45 | * @param jsonObject JSON对象 46 | * @return 参数 47 | * {"name":"zhangsan","age":"18"} 48 | */ 49 | public static String getParam(JSONObject jsonObject, String field) { 50 | JSONArray jsonArray = jsonObject.getJSONArray(field); 51 | if (jsonArray != null && jsonArray.size() > 0) { 52 | return jsonArray.getString(0); 53 | } 54 | return null; 55 | } 56 | 57 | } 58 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/util/SparkUtils.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.util; 2 | 3 | import com.alibaba.fastjson.JSONObject; 4 | import com.traffic.load.data.MockData; 5 | import com.traffic.spark.conf.ConfigurationManager; 6 | import com.traffic.spark.constant.Constants; 7 | import org.apache.spark.api.java.JavaRDD; 8 | import org.apache.spark.api.java.JavaSparkContext; 9 | import org.apache.spark.sql.Dataset; 10 | import org.apache.spark.sql.Row; 11 | import org.apache.spark.sql.SparkSession; 12 | 13 | 14 | /** 15 | * Spark工具类 16 | */ 17 | 
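/*
 * Illustrative note (assumed JSON shape and field names, not confirmed by the repository):
 * the taskParams JSON handed to the helpers below usually wraps each value in a one-element
 * array, for example {"startDate":["2019-06-01"],"endDate":["2019-06-30"],"cars":["京A12345,鲁B00001"]},
 * which is why ParamUtils.getParam reads element 0 of getJSONArray(field) and
 * getCameraRDDByDateRangeAndCars splits that single element on ",".
 */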
public class SparkUtils { 18 | 19 | /** 20 | * 根据当前是否本地测试的配置,决定 如何设置SparkConf的master 21 | */ 22 | // public static void setMaster(SparkConf conf) { 23 | // boolean local = ConfigurationManager.getBoolean(Constants.SPARK_LOCAL); 24 | // if(local) { 25 | // conf.setMaster("local"); 26 | // } 27 | // } 28 | 29 | /** 30 | * 获取SQLContext 31 | * 如果spark.local设置为true,那么就创建SQLContext;否则,创建HiveContext 32 | */ 33 | // public static SparkSession getSQLContext(JavaSparkContext sc) { 34 | // boolean local = ConfigurationManager.getBoolean(Constants.SPARK_LOCAL); 35 | // if(local) { 36 | // //如果SparkContext已经存在,SparkSession就会重用它,如果不存在,Spark就会创建一个新的SparkContext 37 | // SparkSession.builder().getOrCreate(); 38 | // return SparkSession.builder().getOrCreate(); 39 | // } else { 40 | // System.out.println("++++++++++++++++++++++++++++++++++++++开启hive的支持"); 41 | // /** 42 | // * "SELECT * FROM table1 join table2 ON (连接条件)" 如果某一个表小于20G 他会自动广播出去 43 | // * 会将小于spark.sql.autoBroadcastJoinThreshold值(默认为10M)的表广播到executor节点,不走shuffle过程,更加高效。 44 | // * 45 | // * config("spark.sql.autoBroadcastJoinThreshold", "1048576000"); //单位:字节 46 | // */ 47 | // return SparkSession.builder().config("spark.sql.autoBroadcastJoinThreshold", "1048576000").enableHiveSupport().getOrCreate(); 48 | // } 49 | // } 50 | 51 | /** 52 | * 生成模拟数据 53 | * 如果spark.local配置设置为true,则生成模拟数据;否则不生成 54 | */ 55 | public static void mockData(JavaSparkContext sc, SparkSession spark) { 56 | boolean local = ConfigurationManager.getBoolean(Constants.SPARK_LOCAL); 57 | /** 58 | * 如何local为true 说明在本地测试 应该生产模拟数据 RDD-》DataFrame-->注册成临时表0 59 | * false HiveContext 直接可以操作hive表 60 | */ 61 | if (local) { 62 | MockData.mock(sc, spark); 63 | } 64 | } 65 | 66 | /** 67 | * 获取指定日期范围内的卡口信息 68 | */ 69 | public static JavaRDD getCameraRDDByDateRange(SparkSession spark, JSONObject taskParamsJsonObject) { 70 | String startDate = ParamUtils.getParam(taskParamsJsonObject, Constants.PARAM_START_DATE); 71 | String endDate = ParamUtils.getParam(taskParamsJsonObject, Constants.PARAM_END_DATE); 72 | String sql = 73 | "SELECT * " 74 | + "FROM monitor_flow_action " 75 | + "WHERE date>='" + startDate + "' " 76 | + "AND date<='" + endDate + "'"; 77 | 78 | Dataset monitorDF = spark.sql(sql); 79 | /** 80 | * repartition可以提高stage的并行度 81 | */ 82 | // return actionDF.javaRDD().repartition(1000); 83 | return monitorDF.javaRDD(); 84 | } 85 | 86 | /** 87 | * 获取指定日期内出现指定车辆的卡扣信息 88 | */ 89 | public static JavaRDD getCameraRDDByDateRangeAndCars(SparkSession spark, JSONObject taskParamsJsonObject) { 90 | String startDate = ParamUtils.getParam(taskParamsJsonObject, Constants.PARAM_START_DATE); 91 | String endDate = ParamUtils.getParam(taskParamsJsonObject, Constants.PARAM_END_DATE); 92 | String cars = ParamUtils.getParam(taskParamsJsonObject, Constants.FIELD_CARS); 93 | String[] carArr = cars.split(","); 94 | String sql = 95 | "SELECT * " 96 | + "FROM monitor_flow_action " 97 | + "WHERE date>='" + startDate + "' " 98 | + "AND date<='" + endDate + "' " 99 | + "AND car IN ("; 100 | 101 | for (int i = 0; i < carArr.length; i++) { 102 | sql += "'" + carArr[i] + "'"; 103 | if (i < carArr.length - 1) { 104 | sql += ","; 105 | } 106 | } 107 | sql += ")"; 108 | 109 | System.out.println("sql:" + sql); 110 | Dataset monitorDF = spark.sql(sql); 111 | 112 | /** 113 | * repartition可以提高stage的并行度 114 | */ 115 | // return actionDF.javaRDD().repartition(1000); 116 | 117 | return monitorDF.javaRDD(); 118 | } 119 | 120 | /*****************************************************/ 121 | /** 122 | * 获取指定日期范围和指定区域范围内的卡口信息 123 | 
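 * (Illustrative, parameter values assumed rather than taken from the repository) with
 * startDate = 2019-06-01, endDate = 2019-06-30 and a = "01", the method issues roughly:
 *   SELECT * FROM monitor_flow_action
 *   WHERE date>='2019-06-01' AND date<='2019-06-30' AND area_id in ('01')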
*/ 124 | public static JavaRDD getCameraRDDByDateRangeAndArea(SparkSession spark, JSONObject taskParamsJsonObject, String a) { 125 | String startDate = ParamUtils.getParam(taskParamsJsonObject, Constants.PARAM_START_DATE); 126 | String endDate = ParamUtils.getParam(taskParamsJsonObject, Constants.PARAM_END_DATE); 127 | 128 | String sql = 129 | "SELECT * " 130 | + "FROM monitor_flow_action " 131 | + "WHERE date>='" + startDate + "' " 132 | + "AND date<='" + endDate + "'" 133 | + "AND area_id in ('" + a + "')"; 134 | Dataset monitorDF = spark.sql(sql); 135 | monitorDF.show(); 136 | /** 137 | * repartition可以提高stage的并行度 138 | */ 139 | // return actionDF.javaRDD().repartition(1000); 140 | return monitorDF.javaRDD(); 141 | } 142 | 143 | } 144 | -------------------------------------------------------------------------------- /src/main/java/com/traffic/spark/util/StringUtils.java: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.util; 2 | 3 | import java.util.HashMap; 4 | import java.util.Map; 5 | 6 | /** 7 | * 字符串工具类 8 | * 9 | * @author Administrator 10 | */ 11 | public class StringUtils { 12 | 13 | /** 14 | * 判断字符串是否为空 15 | * 16 | * @param str 字符串 17 | * @return 是否为空 18 | */ 19 | public static boolean isEmpty(String str) { 20 | return str == null || "".equals(str); 21 | } 22 | 23 | /** 24 | * 判断字符串是否不为空 25 | * 26 | * @param str 字符串 27 | * @return 是否不为空 28 | */ 29 | public static boolean isNotEmpty(String str) { 30 | return str != null && !"".equals(str); 31 | } 32 | 33 | /** 34 | * 截断字符串两侧的逗号 35 | * 36 | * @param str 字符串 37 | * @return 字符串 38 | */ 39 | public static String trimComma(String str) { 40 | if (str.startsWith(",")) { 41 | str = str.substring(1); 42 | } 43 | if (str.endsWith(",")) { 44 | str = str.substring(0, str.length() - 1); 45 | } 46 | return str; 47 | } 48 | 49 | /** 50 | * 补全两位数字 51 | * 52 | * @param str 53 | * @return 54 | */ 55 | public static String fulFuill(String str) { 56 | if (str.length() == 1) 57 | return "0" + str; 58 | return str; 59 | } 60 | 61 | 62 | /** 63 | * 补全num位数字 64 | * 将给定的字符串前面补0,使字符串的长度为num位。 65 | * 66 | * @param str 67 | * @return 68 | */ 69 | public static String fulFuill(int num, String str) { 70 | if (str.length() == num) { 71 | return str; 72 | } else { 73 | int fulNum = num - str.length(); 74 | String tmpStr = ""; 75 | for (int i = 0; i < fulNum; i++) { 76 | tmpStr += "0"; 77 | } 78 | return tmpStr + str; 79 | } 80 | } 81 | 82 | 83 | /** 84 | * 从拼接的字符串中提取字段 85 | * 86 | * @param str 字符串 87 | * @param delimiter 分隔符 88 | * @param field 字段 89 | * @return 字段值 90 | * @example name=zhangsan|age=18 91 | */ 92 | public static String getFieldFromConcatString(String str, String delimiter, String field) { 93 | try { 94 | String[] fields = str.split(delimiter); 95 | for (String concatField : fields) { 96 | // searchKeywords=|clickCategoryIds=1,2,3 97 | if (concatField.split("=").length == 2) { 98 | String fieldName = concatField.split("=")[0]; 99 | String fieldValue = concatField.split("=")[1]; 100 | if (fieldName.equals(field)) { 101 | return fieldValue; 102 | } 103 | } 104 | } 105 | } catch (Exception e) { 106 | e.printStackTrace(); 107 | } 108 | return null; 109 | } 110 | 111 | public static void main(String[] args) { 112 | // System.out.println(getFieldFromConcatString("name=zhangsan|age=18","\\|","age")); 113 | System.out.println(setFieldInConcatString("name=zhangsan|age=12", "\\|", "name", "lisi")); 114 | // Map keyValuesFromConcatString = getKeyValuesFromConcatString("name=lisi","\\|"); 115 | // 
Set> entrySet = keyValuesFromConcatString.entrySet(); 116 | // for(Entry entry : entrySet) { 117 | // System.out.println("key = "+entry.getKey()+",value = "+entry.getValue()); 118 | // } 119 | } 120 | 121 | /** 122 | * 从拼接的字符串中给字段设置值 123 | * 124 | * @param str 字符串 125 | * @param delimiter 分隔符 126 | * @param field 字段名 127 | * @param newFieldValue 新的field值 128 | * @return 字段值 129 | * name=zhangsan|age=12 130 | */ 131 | public static String setFieldInConcatString(String str, 132 | String delimiter, 133 | String field, 134 | String newFieldValue) { 135 | 136 | // searchKeywords=iphone7|clickCategoryIds=1,2,3 137 | 138 | String[] fields = str.split(delimiter); 139 | 140 | for (int i = 0; i < fields.length; i++) { 141 | String fieldName = fields[i].split("=")[0]; 142 | if (fieldName.equals(field)) { 143 | String concatField = fieldName + "=" + newFieldValue; 144 | fields[i] = concatField; 145 | break; 146 | } 147 | } 148 | 149 | StringBuffer buffer = new StringBuffer(""); 150 | for (int i = 0; i < fields.length; i++) { 151 | buffer.append(fields[i]); 152 | if (i < fields.length - 1) { 153 | buffer.append("|"); 154 | } 155 | } 156 | 157 | return buffer.toString(); 158 | } 159 | 160 | /** 161 | * 给定字符串和分隔符,返回一个K,V map 162 | * name=zhangsan|age=18 163 | * 164 | * @param str 165 | * @param delimiter 166 | * @return map 167 | */ 168 | public static Map getKeyValuesFromConcatString(String str, String delimiter) { 169 | Map map = new HashMap<>(); 170 | try { 171 | String[] fields = str.split(delimiter); 172 | for (String concatField : fields) { 173 | // searchKeywords=|clickCategoryIds=1,2,3 174 | if (concatField.split("=").length == 2) { 175 | String fieldName = concatField.split("=")[0]; 176 | String fieldValue = concatField.split("=")[1]; 177 | map.put(fieldName, fieldValue); 178 | } 179 | } 180 | return map; 181 | } catch (Exception e) { 182 | e.printStackTrace(); 183 | } 184 | return null; 185 | } 186 | 187 | /** 188 | * String 字符串转Integer数字 189 | * 190 | * @param str 191 | * @return 192 | */ 193 | public static Integer convertStringtoInt(String str) { 194 | try { 195 | return Integer.parseInt(str); 196 | } catch (Exception e) { 197 | e.printStackTrace(); 198 | } 199 | return null; 200 | 201 | } 202 | 203 | } 204 | -------------------------------------------------------------------------------- /src/main/resources/my.properties: -------------------------------------------------------------------------------- 1 | jdbc.driver=com.mysql.jdbc.Driver 2 | jdbc.datasource.size=10 3 | 4 | jdbc.url=jdbc:mysql://node02:3306/traffic 5 | jdbc.user=root 6 | jdbc.password=123456 7 | 8 | jdbc.url.prod=jdbc:mysql://node02:3306/traffic 9 | jdbc.user.prod=root 10 | jdbc.password.prod=123456 11 | 12 | spark.local=false 13 | spark.local.taskId.monitorFlow=1 14 | spark.local.taskId.extractCar=2 15 | spark.local.taskId.withTheCar=3 16 | spark.local.taskid.tpn.road.flow=4 17 | spark.local.taskid.road.one.step.convert=5 18 | 19 | kafka.metadata.broker.list=node01:9092,node02:9092,node03:9092 20 | kafka.topics=RoadRealTimeLog 21 | -------------------------------------------------------------------------------- /src/main/scala/com/traffic/spark/areaRoadFlow/AreaTop3RoadFlowAnalyzeScala1.scala: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.areaRoadFlow 2 | 3 | import java.lang 4 | 5 | import com.alibaba.fastjson.{JSON, JSONObject} 6 | import com.traffic.spark.conf.ConfigurationManager 7 | import com.traffic.spark.constant.Constants 8 | import 
com.traffic.spark.dao.ITaskDAO 9 | import com.traffic.spark.dao.factory.DAOFactory 10 | import com.traffic.spark.domain.Task 11 | import com.traffic.spark.util.ParamUtils 12 | import com.com.bjsxt.spark.util.SparkUtilsScala 13 | import com.spark.test.MockDataByMysql 14 | import org.apache.spark.SparkContext 15 | import org.apache.spark.broadcast.Broadcast 16 | import org.apache.spark.rdd.RDD 17 | import org.apache.spark.sql.{DataFrame, Row, SparkSession} 18 | 19 | import scala.collection.mutable.ListBuffer 20 | 21 | /** 22 | * !!!!!!!!!!!!!!!MYSQL数据库首先要有数据!!!!!!!!!!!!!!!!!!!!! 23 | * !!!!!次方法生成视图的!!!!!!!!!!MockDataByMysql.MockData(sc, ssc) 24 | * 采用传统rdd方式 25 | * 26 | * 计算出每一个区域top3的道路流量 27 | * 每一个区域车流量最多的3条道路 每条道路有多个卡扣 28 | * 29 | * @author root 30 | * ./spark-submit 31 | * --master spark://node1:7077 32 | * --jars ../lib/mysql-connector-java-5.1.6.jar,../lib/fastjson-1.2.11.jar 33 | * --driver-class-path ../lib/mysql-connector-java-5.1.6.jar:../lib/fastjson-1.2.11.jar 34 | * ../lib/Test.jar 4 35 | * 36 | * 这是一个分组取topN SparkSQL分组取topN 37 | * 区域,道路流量排序 按照区域和道路进行分组 38 | * 39 | */ 40 | object AreaTop3RoadFlowAnalyzeScala1 { 41 | def main(args: Array[String]): Unit = { 42 | var sc: SparkContext = null 43 | var ssc: SparkSession = null 44 | /** 45 | * 判断应用程序是否在本地执行 46 | */ 47 | val onLocal: Boolean = ConfigurationManager.getBoolean(Constants.SPARK_LOCAL) 48 | 49 | /** 50 | * 本地Mysql生成视图或者从Hive查询 51 | */ 52 | if (onLocal) { 53 | //这里不会真正的创建SparkSession,而是根据前面这个SparkContext来获取封装SparkSession,因为不会创建存在两个SparkContext的。 54 | ssc = SparkSession.builder().master("local").appName("test").getOrCreate() 55 | // val ssss = ssc 56 | // import ssss.implicits._ 57 | sc = ssc.sparkContext 58 | 59 | /** 60 | * !!!!!!!!!!!基于本地测试采用Mysql数据!!!!!!!!!!!!!! 61 | * 如果在集群中运行的话,直接操作Hive中的表就可以 62 | * 本地模拟数据注册成一张临时表 63 | * monitor_flow_action 数据表:监控车流量所有数据 64 | * monitor_camera_info 标准表:卡扣对应摄像头标准表 65 | * area_info 区域表:区域对应的区域名称 66 | */ 67 | MockDataByMysql.MockData(sc, ssc) 68 | } else { 69 | println("++++++++++++开启hive的支持+++++++++++++++++") 70 | ssc = SparkSession.builder().appName(Constants.SPARK_APP_NAME).enableHiveSupport().getOrCreate() 71 | sc = ssc.sparkContext 72 | ssc.sql("usr traffic") 73 | } 74 | sc.setLogLevel("ERROR") 75 | 76 | /** 77 | * 采用的是task5 的时间日期,反正task4只要时间就行 78 | */ 79 | val taskId: lang.Long = ParamUtils.getTaskIdFromArgs(args, Constants.SPARK_LOCAL_TASKID_MONITOR_ONE_STEP_CONVERT) 80 | val taskDAO: ITaskDAO = DAOFactory.getTaskDAO 81 | val task: Task = taskDAO.findTaskById(taskId) 82 | if (task == null) { 83 | println("没有当前taskID对应的对象") 84 | System.exit(1) 85 | } 86 | val parms: JSONObject = JSON.parseObject(task.getTaskParams()) 87 | 88 | /** 89 | * 通过params(json字符串)查询monitor_flow_action 90 | * 获取指定日期内检测的monitor_flow_action中车流量数据,返回RDD 91 | */ 92 | val cameraDF: DataFrame = SparkUtilsScala.getCameraRDDByDateRange(ssc, parms) 93 | //转换为RDD 94 | val cameraRdd: RDD[Row] = cameraDF.rdd 95 | cameraRdd.cache() 96 | 97 | /** 98 | * 查询area_action 99 | * 获取指定日期内检测的area_info中的数据,返回RDD 100 | */ 101 | val areaDF: DataFrame = SparkUtilsScala.getAreaRDDByDateRange(ssc) 102 | val areaRdd: RDD[Row] = areaDF.rdd 103 | areaRdd.cache() 104 | 105 | /** 106 | * 将areaRdd 转换为 (area_id,area_name)格式的 rdd广播出去 107 | */ 108 | val map: collection.Map[String, String] = areaRdd.map(row => { 109 | (row.getAs[String]("area_id"), row.getAs[String]("area_name")) 110 | }).collectAsMap() 111 | val areaIdNameMap: Broadcast[collection.Map[String, String]] = sc.broadcast(map) 112 | 113 | /** 114 | * 
------------------------统计每个区域经过车辆数最多的top3道路及通过的车辆数 115 | */ 116 | 117 | /** 118 | * 首先转把row换成 (区域-道路,1)类型的rdd 119 | */ 120 | val areaRoadRdd: RDD[(String, Int)] = cameraRdd.map(row => { 121 | val areaId: String = row.getAs[String]("area_id") 122 | val roadId: String = row.getAs[String]("road_id") 123 | (areaId + "-" + roadId, 1) 124 | }) 125 | /** 126 | * 通过区域和道路排序 127 | */ 128 | val areaRoadSortRdd: RDD[(String, Int)] = areaRoadRdd.reduceByKey((_ + _)).sortBy(_._2, false) 129 | 130 | /** 131 | * 转换成 (区域,(道路,车辆数)) 类型的rdd 132 | */ 133 | val tempRdd: RDD[(String, (String, Int))] = areaRoadSortRdd.map(tp => { 134 | val areaRoad: String = tp._1 135 | val area: String = areaRoad.split("-")(0) 136 | val road: String = areaRoad.split("-")(1) 137 | (area, (road, tp._2)) 138 | }) 139 | 140 | /** 141 | * 统计每个区域经过车辆数最多的top3道路及通过的车辆数 取出前三即可 142 | */ 143 | val tempResult: RDD[(String, Int)] = tempRdd.groupByKey().flatMap(tp => { 144 | var top = 0 145 | //取出拍好序的城区和道路 146 | val list: List[(String, Int)] = tp._2.iterator.toList 147 | //创建返回对象 148 | val returnList: ListBuffer[(String, Int)] = ListBuffer[(String, Int)]() 149 | for (value <- list) { 150 | if (top < 3) { 151 | val key: String = tp._1.toString + "-" + value._1 152 | returnList.append((key, value._2)) 153 | top += 1 154 | } 155 | } 156 | returnList.iterator 157 | }) 158 | /** 159 | * 将 前三的 (areaId-roadId,count) 广播出去 160 | */ 161 | val broadcastResult: collection.Map[String, Int] = tempResult.collectAsMap() 162 | val broadcastResultMap: Broadcast[collection.Map[String, Int]] = sc.broadcast(broadcastResult) 163 | /** 164 | * 分组后转换成最终结果,带上区域名字 165 | */ 166 | val result: RDD[(String, Int)] = tempResult.map(tp => { 167 | val areaIdName: collection.Map[String, String] = areaIdNameMap.value 168 | var tempArea: String = tp._1.split("-")(0) 169 | val tempRoad: String = tp._1.split("-")(1) 170 | if (areaIdName.contains(tempArea)) { 171 | tempArea = areaIdName.get(tempArea).get 172 | } 173 | (tempArea + "-" + tempRoad, tp._2) 174 | }) 175 | 176 | /** 177 | * 统计每个区域经过车辆数最多的top3道路及通过的车辆数 取出前三即可 178 | */ 179 | println("------------------------统计每个区域经过车辆数最多的top3道路及通过的车辆数") 180 | result.foreach(println) 181 | 182 | 183 | /** 184 | * ----------------------------统计每个区域经过车辆数最多的top3道路及通过的车辆数、当前道路下每个卡扣经过的车辆数 185 | */ 186 | 187 | /** 188 | * 将row中的值转换为(areaId-roadId-monitorId,row) 189 | */ 190 | val areaRoadMonitorid: RDD[(String, Int)] = cameraRdd.map(row => { 191 | val areaId: String = row.getAs[String]("area_id") 192 | val roadId: String = row.getAs[String]("road_id") 193 | val monitorId: String = row.getAs[String]("monitor_id") 194 | (areaId + "-" + roadId + "-" + monitorId, 1) 195 | }) 196 | /** 197 | * 过滤掉非前top3的数据 198 | */ 199 | val areaRoadMonitoridfilter: RDD[(String, Int)] = areaRoadMonitorid.filter(tp => { 200 | val areaIdRoadIdMonitorId: String = tp._1 201 | val split: Array[String] = areaIdRoadIdMonitorId.split("-") 202 | val areaIdRoadId: String = split(0) + "-" + split(1) 203 | val broadcastResult: collection.Map[String, Int] = broadcastResultMap.value 204 | broadcastResult.contains(areaIdRoadId) 205 | }) 206 | /** 207 | * 将areaRoadMonitorid中的值合并value并排序 208 | */ 209 | val areaRoadMonitoridSort: RDD[(String, Int)] = areaRoadMonitoridfilter.reduceByKey(_ + _).sortBy(_._2, false) 210 | 211 | /** 212 | * 转换成(areaname,roadId-monitorid-value) 213 | * 214 | */ 215 | val tempResult1: RDD[(String, Iterable[String])] = areaRoadMonitoridSort.map(tp => { 216 | val areaIdRoadIdMonitorId: String = tp._1 217 | val split: Array[String] = 
areaIdRoadIdMonitorId.split("-") 218 | (split(0) + "-" + split(1), split(2) + "-" + (tp._2)) 219 | }).groupByKey() 220 | 221 | 222 | /** 223 | * 分组后转换成最终结果,带上区域名字 224 | */ 225 | val result1: RDD[(String, Iterable[String])] = tempResult1.map(tp => { 226 | val areaIdName: collection.Map[String, String] = areaIdNameMap.value 227 | var tempArea: String = tp._1.split("-")(0) 228 | val tempRoad: String = tp._1.split("-")(1) 229 | if (areaIdName.contains(tempArea)) { 230 | tempArea = areaIdName.get(tempArea).get 231 | } 232 | (tempArea + "-" + tempRoad, tp._2) 233 | }) 234 | println("----------------------------统计每个区域经过车辆数最多的top3道路及通过的车辆数、当前道路下每个卡扣经过的车辆数") 235 | result1.sortByKey(false).foreach(println) 236 | } 237 | } 238 | -------------------------------------------------------------------------------- /src/main/scala/com/traffic/spark/areaRoadFlow/AreaTop3RoadFlowAnalyzeScala2.scala: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.areaRoadFlow 2 | 3 | import com.spark.test.MockDataByMysql 4 | import com.traffic.spark.conf.ConfigurationManager 5 | import com.traffic.spark.constant.Constants 6 | import org.apache.spark.SparkContext 7 | import org.apache.spark.sql.{DataFrame, SparkSession} 8 | 9 | /** 10 | * !!!!!!!!!!!!!!!MYSQL数据库首先要有数据!!!!!!!!!!!!!!!!!!!!! 11 | * * !!!!!次方法生成视图的!!!!!!!!!!MockDataByMysql.MockData(sc, ssc) 12 | * !!!!!!!!!!!!!!!!!!!!!!!!采用开窗函数!!!!!!!!!!!!!!!!!!!!!!!!!!! 13 | * 计算出每一个区域top3的道路流量 14 | * 每一个区域车流量最多的3条道路 每条道路有多个卡扣 15 | * 16 | * @author root 17 | * ./spark-submit 18 | * --master spark://node1:7077 19 | * --jars ../lib/mysql-connector-java-5.1.6.jar,../lib/fastjson-1.2.11.jar 20 | * --driver-class-path ../lib/mysql-connector-java-5.1.6.jar:../lib/fastjson-1.2.11.jar 21 | * ../lib/Test.jar 4 22 | * 23 | * 这是一个分组取topN SparkSQL分组取topN 24 | * 区域,道路流量排序 按照区域和道路进行分组 25 | **/ 26 | object AreaTop3RoadFlowAnalyzeScala2 { 27 | def main(args: Array[String]): Unit = { 28 | var sc: SparkContext = null 29 | var ssc: SparkSession = null 30 | /** 31 | * 判断应用程序是否在本地执行 32 | */ 33 | val onLocal: Boolean = ConfigurationManager.getBoolean(Constants.SPARK_LOCAL) 34 | 35 | /** 36 | * 本地生成视图或者从Hive查询 37 | */ 38 | if (onLocal) { 39 | //这里不会真正的创建SparkSession,而是根据前面这个SparkContext来获取封装SparkSession,因为不会创建存在两个SparkContext的。 40 | ssc = SparkSession.builder().master("local").appName("test").getOrCreate() 41 | sc = ssc.sparkContext 42 | 43 | /** 44 | * !!!!!!!!!!!基于本地测试采用Mysql数据!!!!!!!!!!!!!! 
45 | * 如果在集群中运行的话,直接操作Hive中的表就可以 46 | * 本地模拟数据注册成一张临时表 47 | * monitor_flow_action 数据表:监控车流量所有数据 48 | * monitor_camera_info 标准表:卡扣对应摄像头标准表 49 | * area_info 区域表:区域对应的区域名称 50 | */ 51 | MockDataByMysql.MockData(sc, ssc) 52 | } else { 53 | println("++++++++++++开启hive的支持+++++++++++++++++") 54 | ssc = SparkSession.builder().appName(Constants.SPARK_APP_NAME) 55 | .config("spark.sql.autoBroadcastJoinThreshold", "1048576000").enableHiveSupport().getOrCreate() 56 | sc = ssc.sparkContext 57 | ssc.sql("usr traffic") 58 | } 59 | sc.setLogLevel("ERROR") 60 | 61 | ssc.udf.register("group_concat_distinct", new GroupConcatDistinctUDAFScala()) 62 | val result: DataFrame = ssc.sql( 63 | """ 64 | | select 65 | | temp.area_name,temp.road_id,temp.carCount,temp.monitor_infos,row_number() over(partition by area_name order by carCount desc) as rank 66 | | from 67 | | (SELECT 68 | | area_name,road_id,COUNT(car) carCount,group_concat_distinct(monitor_id) monitor_infos 69 | | FROM 70 | | monitor_flow_action t1 LEFT JOIN area_info t2 71 | | ON 72 | | t1.area_id=t2.area_id GROUP BY area_name,road_id) temp 73 | | HAVING rank <=3 74 | """.stripMargin) 75 | result.show(24, false) 76 | 77 | 78 | 79 | // val result: DataFrame = ssc.sql( 80 | // """ 81 | // | SELECT area_name,road_id,COUNT(car) carCount,group_concat_distinct(monitor_id) monitor_infos 82 | // | FROM monitor_flow_action t1 LEFT JOIN area_info t2 83 | // | ON t1.area_id=t2.area_id 84 | // | GROUP BY area_name,road_id 85 | // """.stripMargin 86 | // ) 87 | // result.show(1000) 88 | 89 | // val result: DataFrame = ssc.sql("SELECT " + 90 | // "area_name,road_id,COUNT(car) carCount " + 91 | // "FROM monitor_flow_action t1 LEFT JOIN area_info t2 ON t1.area_id=t2.area_id GROUP BY area_name,road_id ") 92 | // result.show(100) 93 | } 94 | } 95 | -------------------------------------------------------------------------------- /src/main/scala/com/traffic/spark/areaRoadFlow/GroupConcatDistinctUDAFScala.scala: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.areaRoadFlow 2 | 3 | import org.apache.spark.sql.Row 4 | import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction} 5 | import org.apache.spark.sql.types.{DataType, DataTypes, IntegerType, MapType, StringType, StructType} 6 | 7 | import scala.collection.mutable 8 | 9 | /** 10 | * 组内拼接去重函数(group_concat_distinct()) 11 | * 12 | * 技术点:自定义UDAF聚合函数 13 | * 14 | * @author LPF 15 | * 16 | */ 17 | class GroupConcatDistinctUDAFScala extends UserDefinedAggregateFunction { 18 | /** 19 | * 输入数据的类型 注意数据类型 20 | */ 21 | override def inputSchema: StructType = { 22 | DataTypes.createStructType(Array(DataTypes.createStructField("xx", StringType, true))) 23 | } 24 | 25 | /** 26 | * 聚合操作时,所处理的数据的类型 27 | */ 28 | override def bufferSchema: StructType = { 29 | DataTypes.createStructType(Array(DataTypes.createStructField("oo", MapType(StringType, IntegerType), true))) 30 | } 31 | 32 | /** 33 | * 最终函数返回值的类型 34 | */ 35 | override def dataType: DataType = { 36 | DataTypes.StringType 37 | } 38 | 39 | /** 40 | * 多次运行 相同的输入总是相同的输出,确保一致性 41 | */ 42 | override def deterministic: Boolean = { 43 | true 44 | } 45 | 46 | /** 47 | * 为每个分组的数据执行初始化值 48 | * 两个部分的初始化: 49 | * 1.在map端每个RDD分区内,在RDD每个分区内 按照group by 的字段分组,每个分组都有个初始化的值 50 | * 2.在reduce 端给每个group by 的分组做初始值 51 | */ 52 | override def initialize(buffer: MutableAggregationBuffer): Unit = { 53 | buffer(0) = mutable.Map[String, Int]() 54 | } 55 | 56 | /** 57 | * 每个组,有新的值进来的时候,进行分组对应的聚合值的计算 58 | */ 59 | 
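  // (Illustrative, sample values assumed) for one group whose monitor_id inputs are
  // "0001", "0002", "0001", update() leaves the buffer map as Map("0001" -> 2, "0002" -> 1),
  // and evaluate() later renders it as a string such as "0002-1|0001-2"
  // (the order of the pieces depends on the map's iteration order).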
override def update(buffer: MutableAggregationBuffer, input: Row): Unit = { 60 | // 取出每个组里面初始化的Map转换成mutableMap经行计算 61 | val map: collection.Map[String, Int] = buffer.getMap(0) 62 | val returnMap: mutable.Map[String, Int] = collection.mutable.Map(map.toSeq: _*) 63 | // 取出输入的MonitorID 64 | val inputMonitorInfo: String = input.getString(0) 65 | if (!returnMap.contains(inputMonitorInfo)) { 66 | // 如果map里没有次MonitorID就放一个初始count为1的进去 67 | returnMap.put(inputMonitorInfo, 1) 68 | } else { 69 | // 如果有了此MonitorID就取出count,count+1放回去 70 | val count: Int = returnMap.get(inputMonitorInfo).get 71 | returnMap.put(inputMonitorInfo, count + 1) 72 | } 73 | buffer.update(0, returnMap) 74 | } 75 | 76 | 77 | /** 78 | * 最后merger的时候,在各个节点上的聚合值,要进行merge,也就是合并 79 | */ 80 | override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = { 81 | // 缓冲的总的map 82 | val allMap: collection.Map[String, Int] = buffer1.getMap(0) 83 | val returnAllMap: mutable.Map[String, Int] = collection.mutable.Map(allMap.toSeq: _*) 84 | // 传进来的map 85 | val inputMap: collection.Map[String, Int] = buffer2.getMap(0) 86 | 87 | // 遍历inputMap 将里面的值添加进returnAllMap中缓存 88 | for (key <- inputMap.keys) { 89 | if (!returnAllMap.contains(key)) { 90 | returnAllMap.put(key, inputMap.get(key).get) 91 | } else { 92 | val allCount: Int = returnAllMap.get(key).get 93 | val someCount: Int = inputMap.get(key).get 94 | returnAllMap.put(key, allCount + someCount) 95 | } 96 | } 97 | buffer1.update(0, returnAllMap) 98 | } 99 | 100 | 101 | /** 102 | * 最后返回一个最终的聚合值要和dataType的类型一一对应 103 | */ 104 | override def evaluate(buffer: Row): Any = { 105 | val map: collection.Map[String, Int] = buffer.getMap(0) 106 | var str: String = "" 107 | for (key <- map.keys) { 108 | str = "|" + key + "-" + map.get(key).get + str 109 | } 110 | str.substring(1) 111 | } 112 | } 113 | -------------------------------------------------------------------------------- /src/main/scala/com/traffic/spark/areaRoadFlow/MonitorOneStepConvertRateAnalyzeScala.scala: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.areaRoadFlow 2 | 3 | import java.lang 4 | 5 | import com.alibaba.fastjson.{JSON, JSONObject} 6 | import com.traffic.spark.conf.ConfigurationManager 7 | import com.traffic.spark.constant.Constants 8 | import com.traffic.spark.dao.ITaskDAO 9 | import com.traffic.spark.dao.factory.DAOFactory 10 | import com.traffic.spark.domain.Task 11 | import com.traffic.spark.util.ParamUtils 12 | import com.com.bjsxt.spark.util.SparkUtilsScala 13 | import com.spark.test.{MockDataByMysql, MockDataScala} 14 | import org.apache.spark.SparkContext 15 | import org.apache.spark.broadcast.Broadcast 16 | import org.apache.spark.rdd.RDD 17 | import org.apache.spark.sql.{DataFrame, Row, SparkSession} 18 | 19 | import scala.collection.mutable.ListBuffer 20 | 21 | /** 22 | * 卡扣流量转换率 23 | */ 24 | object MonitorOneStepConvertRateAnalyzeScala { 25 | def main(args: Array[String]): Unit = { 26 | var sc: SparkContext = null 27 | var ssc: SparkSession = null 28 | /** 29 | * 判断应用程序是否在本地执行 30 | */ 31 | val onLocal: Boolean = ConfigurationManager.getBoolean(Constants.SPARK_LOCAL) 32 | 33 | /** 34 | * 本地生成视图或者从Hive查询 35 | */ 36 | if (onLocal) { 37 | //这里不会真正的创建SparkSession,而是根据前面这个SparkContext来获取封装SparkSession,因为不会创建存在两个SparkContext的。 38 | ssc = SparkSession.builder().master("local").appName("test").getOrCreate() 39 | sc = ssc.sparkContext 40 | 41 | /** 42 | * !!!!!!!!!!!基于本地测试采用Mysql数据!!!!!!!!!!!!!! 
43 | * 如果在集群中运行的话,直接操作Hive中的表就可以 44 | * 本地模拟数据注册成一张临时表 45 | * monitor_flow_action 数据表:监控车流量所有数据 46 | * monitor_camera_info 标准表:卡扣对应摄像头标准表 47 | * area_info 区域表:区域对应的区域名称 48 | */ 49 | MockDataByMysql.MockData(sc, ssc) 50 | } else { 51 | println("++++++++++++开启hive的支持+++++++++++++++++") 52 | ssc = SparkSession.builder().appName(Constants.SPARK_APP_NAME).enableHiveSupport().getOrCreate() 53 | sc = ssc.sparkContext 54 | ssc.sql("usr traffic") 55 | } 56 | sc.setLogLevel("ERROR") 57 | 58 | val taskId: lang.Long = ParamUtils.getTaskIdFromArgs(args, Constants.SPARK_LOCAL_TASKID_MONITOR_ONE_STEP_CONVERT) 59 | val taskDAO: ITaskDAO = DAOFactory.getTaskDAO 60 | val task: Task = taskDAO.findTaskById(taskId) 61 | if (task == null) { 62 | println("没有当前taskID对应的对象") 63 | System.exit(1) 64 | } 65 | val parms: JSONObject = JSON.parseObject(task.getTaskParams()) 66 | 67 | /** 68 | * 从数据库中查找出来我们指定的卡扣流 组装成list并广播 69 | * 0001,0002,0003,0004,0005 70 | */ 71 | val roadFlow: String = ParamUtils.getParam(parms, Constants.PARAM_MONITOR_FLOW) 72 | val split: Array[String] = roadFlow.split(",") 73 | //定义未来广播的list 74 | var listRoad: ListBuffer[String] = ListBuffer[String]() 75 | // 定义变量road 用于临时存放listRoad中的值 76 | var road = "" 77 | for (str <- split) { 78 | road = road + "," + str 79 | listRoad.append(road.substring(1)) 80 | } 81 | val listRoadBroadcast: Broadcast[ListBuffer[String]] = sc.broadcast(listRoad) 82 | 83 | /** 84 | * 通过params(json字符串)查询monitor_flow_action 85 | * 获取指定日期内检测的monitor_flow_action中车流量数据,返回JavaRDD 86 | */ 87 | val cameraDF: DataFrame = SparkUtilsScala.getCameraRDDByDateRange(ssc, parms) 88 | //转换为RDD 89 | val rdd1: RDD[Row] = cameraDF.rdd 90 | 91 | /** 92 | * 将RDD转换为 (car,row)格式 93 | */ 94 | val rdd2: RDD[(String, Row)] = rdd1.map(row => { 95 | (row.getAs[String]("car"), row) 96 | }) 97 | 98 | /** 99 | * 将rdd2 先按照 row里面的时间排序,再按照car来分组 100 | */ 101 | // val rdd3: RDD[(String, Iterable[Row])] = rdd2.sortBy(tp => {tp._2.getAs[String]("action_time")}).groupByKey() 102 | 103 | 104 | val rdd3: RDD[(String, Iterable[Row])] = rdd2.sortBy(_._2.getAs[String]("action_time")).groupByKey() 105 | 106 | /** 107 | * rrd3分好组的(car,(row,row))转换为 (car,str) 108 | * str 为 monitor_id按照时间顺序来拼接的字符串 109 | */ 110 | val rdd4: RDD[String] = rdd3.map(tp => { 111 | val list: List[Row] = tp._2.iterator.toList 112 | var monitorIds = "" 113 | for (row <- list) { 114 | val monitorId: String = row.getAs[String]("monitor_id") 115 | monitorIds = monitorIds + "," + monitorId 116 | } 117 | (monitorIds.substring(1)) 118 | }) 119 | /** 120 | * 计算这一辆车,有多少次匹配到我们指定的卡扣流 121 | * 122 | * 先拿到车辆的轨迹,比如一辆车轨迹:0001,0002,0003,0004,0001,0002,0003,0001,0004 123 | * 返回一个二元组(切分的片段,该片段对应的该车辆轨迹中匹配上的次数) 124 | * ("0001",3) 125 | * ("0001,0002",2) 126 | * ("0001,0002,0003",2) 127 | * ("0001,0002,0003,0004",1) 128 | * ("0001,0002,0003,0004,0005",0) 129 | * ... ... 
130 | * ("0001",13) 131 | * ("0001,0002",12) 132 | * ("0001,0002,0003",11) 133 | * ("0001,0002,0003,0004",11) 134 | * ("0001,0002,0003,0004,0005",10) 135 | */ 136 | val rdd5: RDD[(String, Long)] = rdd4.flatMap(str => { 137 | val returnList: ListBuffer[(String, Long)] = ListBuffer[(String, Long)]() 138 | //获取广播变量的值 139 | val list: ListBuffer[String] = listRoadBroadcast.value 140 | for (s <- list) { 141 | //indexOf 从哪个位置开始查找 142 | var index = 0 143 | //这辆车有多少次匹配到这个卡扣切片的次数 144 | var count = 0L 145 | while (str.indexOf(s, index) != -1) { 146 | index = str.indexOf(s, index) + 1 147 | count += 1 148 | } 149 | returnList.append((s, count)) 150 | } 151 | returnList.iterator 152 | }) 153 | 154 | /** 155 | * 合并并打印数据 156 | */ 157 | val rdd6: RDD[(String, Long)] = rdd5.reduceByKey(_ + _) 158 | rdd6.foreach(println) 159 | // 下面的方式为何不行 foreachPartition是打印迭代器了 160 | 161 | // rdd6.foreachPartition(println) 162 | } 163 | } 164 | -------------------------------------------------------------------------------- /src/main/scala/com/traffic/spark/areaRoadFlow/TestGroupByKey.scala: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.areaRoadFlow 2 | 3 | import org.apache.spark.rdd.RDD 4 | import org.apache.spark.{SparkConf, SparkContext} 5 | 6 | /** 7 | * 无聊测试代码 8 | * 测试groupByKey对分区是否有影响 9 | */ 10 | object TestGroupByKey { 11 | def main(args: Array[String]): Unit = { 12 | // val conf = new SparkConf().setMaster("local[8]").setAppName("test") 13 | val conf = new SparkConf().setAppName("test") 14 | val sc = new SparkContext(conf) 15 | sc.setLogLevel("ERROR") 16 | val lines: RDD[String] = sc.parallelize(List[String]( 17 | "zhangshan,A,100", 18 | "zhangshan,B,99", 19 | "zhangshan,C,98", 20 | "zhangshan,D,97", 21 | "lisi,Z,1", 22 | "lisi,X,2", 23 | "lisi,V,3", 24 | "lisi,O,4" 25 | )) 26 | val nameKemu: RDD[(String, Int)] = lines.map(str => { 27 | val split: Array[String] = str.split(",") 28 | ((split(0) + "-" + split(1)), split(2).toInt) 29 | }) 30 | val nameKemuReduce: RDD[(String, Int)] = nameKemu.reduceByKey(_ + _, 8) 31 | .sortBy(_._2, false, 8) 32 | // nameKemuReduce.mapPartitionsWithIndex((index: Int, iter: Iterator[(String, Int)]) => { 33 | // val list: List[(String, Int)] = iter.toList 34 | // for (oneTp <- list) { 35 | // println(s"分区是 $index 名字班级是 ${oneTp._1} 分数是 ${oneTp._2} ") 36 | // } 37 | // list.iterator 38 | // }).count() 39 | val tempResult: RDD[(String, (String, Int))] = nameKemuReduce.map(tp => { 40 | val nameKecen: String = tp._1 41 | val fenshu: Int = tp._2.toInt 42 | val split: Array[String] = nameKecen.split("-") 43 | (split(0), (split(1), fenshu)) 44 | }) 45 | 46 | /** 47 | * 1、搜集后在excutor里面打印 48 | */ 49 | tempResult.groupByKey(8).foreach(println) 50 | 51 | /** 52 | * 2、搜集后在excutor取出前三个打印 53 | */ 54 | // tempResult.groupByKey().foreach(tp=>{ 55 | // val list: List[(String, Int)] = tp._2.toList 56 | // for(a<-list){ 57 | // println(s"搜集后在excutor取出前三个打印----key = ${tp._1}---value = $a") 58 | // } 59 | // }) 60 | // 61 | // 62 | // val res: RDD[(String, Iterable[(String, Int)])] = tempResult.groupByKey() 63 | // res.foreach(println) 64 | //// res.collect().foreach(println) 65 | // 66 | // 67 | // val rdd: RDD[(String, Iterator[(String, Int)])] = tempResult.groupByKey().map(tp => { 68 | // val list: List[(String, Int)] = tp._2.toList 69 | // val list01: List[(String, Int)] = list.sortBy(_._2) 70 | // (tp._1, list01.iterator) 71 | // }) 72 | // rdd.foreach(tp => { 73 | // val list: List[(String, Int)] = tp._2.toList 74 | // for (a <- list) 
{ 75 | // println(s"key = ${tp._1}----value = $a") 76 | // } 77 | // }) 78 | } 79 | } 80 | -------------------------------------------------------------------------------- /src/main/scala/com/traffic/spark/areaRoadFlow/Test_DF_DS_RDD_Speed.scala: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.areaRoadFlow 2 | 3 | /** 4 | * 测试 RDD FD DS速度 5 | */ 6 | 7 | import java.util.UUID 8 | 9 | import org.apache.spark.rdd.RDD 10 | import org.apache.spark.sql.{Dataset, SparkSession} 11 | 12 | object Test_DF_DS_RDD_Speed { 13 | def main(args: Array[String]): Unit = { 14 | val spark: SparkSession = SparkSession.builder().appName("无聊耍耍").master("local").getOrCreate() 15 | spark.sparkContext.setLogLevel("ERROR") 16 | 17 | val firstRdd: RDD[(String, Int)] = spark.sparkContext.parallelize(0 to 400000).map(num => { 18 | (UUID.randomUUID().toString, num) 19 | }) 20 | //firstRdd 21 | firstRdd.cache() 22 | 23 | val beginTimeRdd: Long = System.currentTimeMillis() 24 | firstRdd.map(tp => { 25 | tp._1 + "-" + tp._2 26 | }).collect() 27 | val endTimeRdd: Long = System.currentTimeMillis() 28 | 29 | import spark.implicits._ 30 | val beginTimeDF: Long = System.currentTimeMillis() 31 | firstRdd.toDF().map(row => { 32 | row.get(0) + "-" + row.get(1) 33 | }).collect() 34 | val endTimeDF: Long = System.currentTimeMillis() 35 | 36 | val beginTimeDS: Long = System.currentTimeMillis() 37 | firstRdd.toDS().map(tp => { 38 | tp._1 + "-" + tp._2 39 | }).collect() 40 | val endTimeDS: Long = System.currentTimeMillis() 41 | 42 | println(s"RDD算子耗时${endTimeRdd - beginTimeRdd}") 43 | println(s"DF算子耗时${endTimeDF - beginTimeDF}") 44 | println(s"DS算子耗时${endTimeDS - beginTimeDS}") 45 | } 46 | } 47 | -------------------------------------------------------------------------------- /src/main/scala/com/traffic/spark/rtmroad/RedisClient.scala: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.rtmroad 2 | 3 | import org.apache.commons.pool2.impl.GenericObjectPoolConfig 4 | import redis.clients.jedis.JedisPool 5 | 6 | object RedisClient { 7 | val redisHost = "node01" 8 | val redisPort = 6379 9 | val redisTimeout = 30000 10 | 11 | val pool = new JedisPool(new GenericObjectPoolConfig(), redisHost, redisPort, redisTimeout) 12 | } 13 | -------------------------------------------------------------------------------- /src/main/scala/com/traffic/spark/rtmroad/RoadRealTimeAnalyzeScala1.scala: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.rtmroad 2 | 3 | import java.text.SimpleDateFormat 4 | import java.util.Calendar 5 | 6 | import com.traffic.spark.conf.ConfigurationManager 7 | import com.traffic.spark.constant.Constants 8 | import org.apache.kafka.clients.consumer.ConsumerRecord 9 | import org.apache.kafka.common.serialization.StringDeserializer 10 | import org.apache.spark.SparkConf 11 | import org.apache.spark.streaming.dstream.{DStream, InputDStream} 12 | import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe 13 | import org.apache.spark.streaming.kafka010.KafkaUtils 14 | import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent 15 | import org.apache.spark.streaming.{Durations, StreamingContext} 16 | 17 | object RoadRealTimeAnalyzeScala1 { 18 | def main(args: Array[String]): Unit = { 19 | val conf = new SparkConf().setAppName("test").setMaster("local") 20 | // 构建Spark Streaming上下文 21 | val ssc = new StreamingContext(conf, 
Durations.seconds(5)) 22 | ssc.sparkContext.setLogLevel("ERROR") 23 | // checkpoint 保留计算信息 24 | ssc.checkpoint("./MyCheckpoint") 25 | val brokers: String = ConfigurationManager.getProperty(Constants.KAFKA_METADATA_BROKER_LIST) 26 | val kafkaParms: Map[String, Object] = Map[String, Object]( 27 | "bootstrap.servers" -> brokers, 28 | "key.deserializer" -> classOf[StringDeserializer], 29 | "value.deserializer" -> classOf[StringDeserializer], 30 | "group.id" -> "MyGroupId-Traffic", 31 | "auto.offset.reset" -> "earliest", 32 | "enable.auto.commit" -> "true" 33 | ) 34 | // 设置topic 35 | val topics: Array[String] = Array[String]("MyMockRealTimeData") 36 | 37 | val stream: InputDStream[ConsumerRecord[String, String]] = KafkaUtils.createDirectStream( 38 | ssc, 39 | PreferConsistent, 40 | Subscribe[String, String](topics, kafkaParms) 41 | ) 42 | /** 43 | * 转换为卡扣和一辆车的速度 44 | */ 45 | val monitorSpeed: DStream[(String, Int)] = stream.map(tp => { 46 | val row: String = tp.value() 47 | val split: Array[String] = row.split("\t") 48 | (split(1), split(5).toInt) 49 | }) 50 | /** 51 | * 转换为卡扣,(速度count,carcount) 52 | */ 53 | val someCount: DStream[(String, (Int, Int))] = monitorSpeed.mapValues((_, 1)) 54 | 55 | /** 56 | * 用优化的方式统计速度,返回的是tuple2(monitorId,(总速度,当前卡口通过的车辆总个数)) 57 | */ 58 | val result: DStream[(String, (Int, Int))] = someCount.reduceByKeyAndWindow( 59 | (v1: Tuple2[Int, Int], v2: Tuple2[Int, Int]) => { 60 | (v1._1 + v2._1, v1._2 + v2._2) 61 | }, 62 | (v1: Tuple2[Int, Int], v2: Tuple2[Int, Int]) => { 63 | (v1._1 - v2._1, v1._2 - v2._2) 64 | }, Durations.minutes(5), Durations.seconds(5)) 65 | 66 | // 打印结果 67 | result.foreachRDD(rdd => { 68 | rdd.foreachPartition(iter => { 69 | while (iter.hasNext) { 70 | val tuple: (String, (Int, Int)) = iter.next() 71 | val monitorId: String = tuple._1 72 | val speedCount: Int = tuple._2._1 73 | val carCount: Int = tuple._2._2 74 | val secondFormate = new SimpleDateFormat("yyyy-MM-dd hh:mm:ss") 75 | println("当前时间:" + secondFormate.format(Calendar.getInstance.getTime) + " 卡扣编号:" + monitorId + " 车辆总数:" + carCount + " 速度总数:" + speedCount + " 平均速度:" + (speedCount / carCount)) 76 | } 77 | }) 78 | }) 79 | ssc.start() 80 | ssc.awaitTermination() 81 | ssc.stop() 82 | } 83 | } 84 | -------------------------------------------------------------------------------- /src/main/scala/com/traffic/spark/rtmroad/RoadRealTimeAnalyzeScala2.scala: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.rtmroad 2 | 3 | import java.text.SimpleDateFormat 4 | import java.util 5 | import java.util.Calendar 6 | 7 | import com.traffic.spark.conf.ConfigurationManager 8 | import com.traffic.spark.constant.Constants 9 | import org.apache.kafka.clients.consumer.ConsumerRecord 10 | import org.apache.kafka.common.TopicPartition 11 | import org.apache.kafka.common.serialization.StringDeserializer 12 | import org.apache.spark.streaming.dstream.{DStream, InputDStream} 13 | import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent 14 | import org.apache.spark.streaming.kafka010.{ConsumerStrategies, HasOffsetRanges, KafkaUtils, OffsetRange} 15 | import org.apache.spark.streaming.{Durations, StreamingContext} 16 | import org.apache.spark.{SparkConf, TaskContext} 17 | import redis.clients.jedis.Jedis 18 | 19 | import scala.collection.mutable 20 | 21 | object RoadRealTimeAnalyzeScala2 { 22 | def main(args: Array[String]): Unit = { 23 | val conf = new SparkConf().setAppName("test").setMaster("local") 24 | // 构建Spark Streaming上下文 25 | 
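    // (Illustrative note, applies to both RoadRealTimeAnalyze versions) the 5-second batch
    // interval chosen below is also the slide interval of the reduceByKeyAndWindow call
    // further down: on every slide the per-monitor (speedSum, carCount) pair is updated
    // incrementally, adding the newest batch and subtracting the batch that just left the
    // 5-minute window, so the average speed is printed without re-reducing the whole window.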
val ssc = new StreamingContext(conf, Durations.seconds(5)) 26 | ssc.sparkContext.setLogLevel("ERROR") 27 | // checkpoint 保留计算信息 28 | ssc.checkpoint("./MyCheckpoint") 29 | 30 | val db: Int = 4; 31 | val topic: String = "MyMockRealTimeData" 32 | /** 33 | * 从Redis 中获取消费者offset 34 | */ 35 | val currentTopicOffset: mutable.Map[String, String] = getOffSetFromRedis(db, topic) 36 | //初始读取到的topic offset: 37 | currentTopicOffset.foreach(x => { 38 | println(s" 初始读取到的offset: $x") 39 | }) 40 | //转换成需要的类型 41 | val fromOffsets: Map[TopicPartition, Long] = currentTopicOffset.map(resultSet => { 42 | new TopicPartition(topic, resultSet._1.toInt) -> resultSet._2.toLong 43 | }).toMap 44 | 45 | val brokers: String = ConfigurationManager.getProperty(Constants.KAFKA_METADATA_BROKER_LIST) 46 | val kafkaParms: Map[String, Object] = Map[String, Object]( 47 | "bootstrap.servers" -> brokers, 48 | "key.deserializer" -> classOf[StringDeserializer], 49 | "value.deserializer" -> classOf[StringDeserializer], 50 | "group.id" -> "MyGroupId-Traffic", 51 | "auto.offset.reset" -> "earliest", 52 | "enable.auto.commit" -> "false" 53 | ) 54 | 55 | //获取每五秒的流 56 | val stream: InputDStream[ConsumerRecord[String, String]] = KafkaUtils.createDirectStream( 57 | ssc, 58 | PreferConsistent, 59 | ConsumerStrategies.Assign[String, String](fromOffsets.keys.toList, kafkaParms, fromOffsets) 60 | ) 61 | 62 | /** 63 | * 转换为卡扣和一辆车的速度 64 | */ 65 | val monitorSpeed: DStream[(String, Int)] = stream.map(tp => { 66 | val row: String = tp.value() 67 | val split: Array[String] = row.split("\t") 68 | (split(1), split(5).toInt) 69 | }) 70 | /** 71 | * 转换为卡扣,(速度count,carcount) 72 | */ 73 | val someCount: DStream[(String, (Int, Int))] = monitorSpeed.mapValues((_, 1)) 74 | 75 | /** 76 | * 用优化的方式统计速度,返回的是tuple2(monitorId,(总速度,当前卡口通过的车辆总个数)) 77 | */ 78 | val result: DStream[(String, (Int, Int))] = someCount.reduceByKeyAndWindow( 79 | (v1: Tuple2[Int, Int], v2: Tuple2[Int, Int]) => { 80 | (v1._1 + v2._1, v1._2 + v2._2) 81 | }, 82 | (v1: Tuple2[Int, Int], v2: Tuple2[Int, Int]) => { 83 | (v1._1 - v2._1, v1._2 - v2._2) 84 | }, Durations.minutes(5), Durations.seconds(5)) 85 | 86 | // 打印结果 87 | result.foreachRDD(rdd => { 88 | rdd.foreachPartition(iter => { 89 | while (iter.hasNext) { 90 | val tuple: (String, (Int, Int)) = iter.next() 91 | val monitorId: String = tuple._1 92 | val speedCount: Int = tuple._2._1 93 | val carCount: Int = tuple._2._2 94 | val secondFormate = new SimpleDateFormat("yyyy-MM-dd hh:mm:ss") 95 | println("当前时间:" + secondFormate.format(Calendar.getInstance.getTime) + " 卡扣编号:" + monitorId + " 车辆总数:" + carCount + " 速度总数:" + speedCount + " 平均速度:" + (speedCount / carCount)) 96 | } 97 | }) 98 | }) 99 | // offset存入redis 100 | stream.foreachRDD(rdd => { 101 | val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges 102 | 103 | //LPF foreachPartition是遍历分区吗 104 | rdd.foreachPartition { iter => 105 | val o: OffsetRange = offsetRanges(TaskContext.get.partitionId) 106 | println(s"topic:${o.topic} partition:${o.partition} fromOffset:${o.fromOffset} untilOffset: ${o.untilOffset}") 107 | } 108 | saveOffsetToRedis(db, offsetRanges) 109 | }) 110 | 111 | ssc.start() 112 | ssc.awaitTermination() 113 | ssc.stop() 114 | } 115 | 116 | def getOffSetFromRedis(db: Int, topic: String) = { 117 | val jedis: Jedis = RedisClient.pool.getResource 118 | jedis.select(db) 119 | val result: util.Map[String, String] = jedis.hgetAll(topic) 120 | RedisClient.pool.returnResource(jedis) 121 | if (result.size() == 0) { 122 | // 每个分区的偏移量 123 | 
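      // (Illustrative note) when Redis holds no saved offsets yet, partition 0 is seeded at
      // offset 0 so that ConsumerStrategies.Assign has a starting position; a topic with
      // more partitions would need one "partition -> offset" entry per partition,
      // e.g. result.put("1", "0") and result.put("2", "0").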
result.put("0", "0") 124 | } 125 | import scala.collection.JavaConversions.mapAsScalaMap 126 | 127 | val offsetMap: scala.collection.mutable.Map[String, String] = result 128 | offsetMap 129 | } 130 | 131 | /** 132 | * 将消费者offset 保存到 Redis中 133 | * 134 | */ 135 | def saveOffsetToRedis(db: Int, offsetRanges: Array[OffsetRange]) = { 136 | val jedis: Jedis = RedisClient.pool.getResource 137 | jedis.select(db) 138 | offsetRanges.foreach(one => { 139 | jedis.hset(one.topic.toString, one.partition.toString, one.untilOffset.toString) 140 | }) 141 | RedisClient.pool.returnResource(jedis) 142 | } 143 | } 144 | -------------------------------------------------------------------------------- /src/main/scala/com/traffic/spark/skynet/SpeedSortKeyScala.scala: -------------------------------------------------------------------------------- 1 | package com.traffic.spark.skynet 2 | 3 | case class SpeedSortKeyScala(lowSpeed: Int, 4 | normalSpeed: Int, 5 | mediumSpeed: Int, 6 | highSpeed: Int) extends Ordered[SpeedSortKeyScala] { 7 | override def compare(that: SpeedSortKeyScala): Int = { 8 | var result: Int = this.highSpeed.compareTo(that.highSpeed) 9 | if (result == 0) { 10 | result = this.mediumSpeed.compareTo(that.mediumSpeed) 11 | if (result == 0) { 12 | result = this.normalSpeed.compareTo(that.normalSpeed) 13 | if (result == 0) { 14 | result = this.lowSpeed.compareTo(that.lowSpeed) 15 | } 16 | } 17 | } 18 | result 19 | } 20 | } 21 | -------------------------------------------------------------------------------- /src/main/scala/com/traffic/test/MockDataByMysql.scala: -------------------------------------------------------------------------------- 1 | package com.spark.test 2 | 3 | import java.util.Properties 4 | 5 | import org.apache.spark.SparkContext 6 | import org.apache.spark.sql.{DataFrame, SparkSession} 7 | 8 | object MockDataByMysql { 9 | def main(args: Array[String]): Unit = { 10 | val ssc: SparkSession = SparkSession.builder().master("local").appName("makeData").getOrCreate() 11 | val sc: SparkContext = ssc.sparkContext 12 | sc.setLogLevel("ERROR") 13 | MockData(sc, ssc) 14 | } 15 | 16 | def MockData(sc: SparkContext, ssc: SparkSession): Unit = { 17 | val properties = new Properties() 18 | properties.setProperty("user", "root") 19 | properties.setProperty("password", "123") 20 | val carInfos: DataFrame = ssc.read.jdbc("jdbc:mysql://node01:3306/traffic", "carTable", properties) 21 | carInfos.createOrReplaceTempView("monitor_flow_action") 22 | carInfos.show() 23 | val jkInfos: DataFrame = ssc.read.jdbc("jdbc:mysql://node01:3306/traffic", "jkTable", properties) 24 | jkInfos.createOrReplaceTempView("monitor_camera_info") 25 | jkInfos.show() 26 | val areaInfos: DataFrame = ssc.read.jdbc("jdbc:mysql://node01:3306/traffic", "area_info", properties) 27 | areaInfos.createOrReplaceTempView("area_info") 28 | areaInfos.show() 29 | } 30 | } 31 | -------------------------------------------------------------------------------- /src/main/scala/com/traffic/test/MockDataScala.scala: -------------------------------------------------------------------------------- 1 | package com.spark.test 2 | 3 | import java.util 4 | import java.util.{Arrays, HashSet, Properties} 5 | 6 | import com.traffic.spark.util.{DateUtils, StringUtils} 7 | import org.apache.spark.rdd.RDD 8 | import org.apache.spark.sql.types.{DataTypes, StringType, StructField, StructType} 9 | import org.apache.spark.sql.{DataFrame, Dataset, Row, RowFactory, SaveMode, SparkSession} 10 | import org.apache.spark.{SparkConf, SparkContext} 11 | 12 | import 
scala.collection.mutable 13 | import scala.collection.mutable.ListBuffer 14 | import scala.util.Random 15 | 16 | /** 17 | * 模拟数据 数据格式如下: 18 | * 19 | * 日期 卡口ID 摄像头编号 车牌号 拍摄时间 车速 道路ID 区域ID 20 | * date monitor_id camera_id car action_time speed road_id area_id 21 | * 22 | * monitor_flow_action 23 | * monitor_camera_info 24 | * 25 | * @author Administrator 26 | */ 27 | object MockDataScala { 28 | 29 | def main(args: Array[String]): Unit = { 30 | val ssc: SparkSession = SparkSession.builder().master("local").appName("makeData").getOrCreate() 31 | val sc: SparkContext = ssc.sparkContext 32 | sc.setLogLevel("ERROR") 33 | MockData(sc, ssc) 34 | } 35 | 36 | def MockData(sc: SparkContext, ssc: SparkSession) = { 37 | val properties = new Properties() 38 | properties.setProperty("user", "root") 39 | properties.setProperty("password", "123") 40 | 41 | 42 | val dataList = new ListBuffer[Row] 43 | val random = new Random() 44 | val locations: Array[String] = Array[String]("鲁", "京", "粤", "浙", "沪", "上", "川", "深", "晋", "湘") 45 | val date: String = DateUtils.getTodayDate() 46 | 47 | /** 48 | * 模拟3000个车辆 49 | */ 50 | for (i <- 1 to 3000) { 51 | //模拟车牌号:如:京A00001 52 | val car: String = locations(random.nextInt(10)) + (65 + random.nextInt(26)).toChar + StringUtils.fulFuill(5, random.nextInt(100000) + "") 53 | 54 | //baseActionTime 模拟24小时 55 | var baseActionTime: String = date + " " + StringUtils.fulFuill(random.nextInt(24) + "") //2019-05-05 08 56 | /** 57 | * 这里的for循环模拟每辆车经过不同的卡扣不同的摄像头 数据。 58 | */ 59 | for (j <- 1 to (random.nextInt(300) + 1)) { 60 | //模拟每个车辆每被30个摄像头拍摄后 时间上累计加1小时。这样做使数据更加真实。 61 | if (j % 30 == 0 && j != 0) { 62 | var addOneHourTime: Integer = baseActionTime.split(" ")(1).toInt + 1 63 | if (addOneHourTime == 24) { 64 | addOneHourTime = 0 65 | baseActionTime = date + " " + StringUtils.fulFuill(addOneHourTime + "") 66 | } 67 | 68 | val cameraId: String = StringUtils.fulFuill(5, random.nextInt(100000) + "") //模拟摄像头id cameraId 69 | 70 | val areaId: String = StringUtils.fulFuill(2, Integer.valueOf(cameraId) % 8 + 1 + "") //模拟areaId 【一共8个区域】 71 | 72 | val roadId: String = Integer.valueOf(cameraId) % 50 + 1 + "" //模拟道路id 【1~50 个道路】 73 | 74 | val monitorId: String = StringUtils.fulFuill(4, Integer.valueOf(cameraId) % 9 + 1 + "") //模拟9个卡扣monitorId,0补全4位 75 | 76 | val actionTime: String = baseActionTime + ":" + StringUtils.fulFuill(random.nextInt(60) + "") + ":" + StringUtils.fulFuill(random.nextInt(60) + "") //模拟经过此卡扣开始时间 ,如:2019-05-05 08:10:20 77 | 78 | val speed: String = (random.nextInt(260) + 1) + "" //模拟速度 79 | 80 | val row: Row = RowFactory.create(date, monitorId, cameraId, car, actionTime, speed, roadId, areaId) 81 | dataList.append(row) 82 | } 83 | } 84 | } 85 | var rowRdd: RDD[Row] = sc.parallelize(dataList) 86 | val structType: StructType = StructType(List[StructField]( 87 | StructField("date", StringType, true), 88 | StructField("monitor_id", StringType, true), 89 | StructField("camera_id", StringType, true), 90 | StructField("car", StringType, true), 91 | StructField("action_time", StringType, true), 92 | StructField("speed", StringType, true), 93 | StructField("road_id", StringType, true), 94 | StructField("area_id", StringType, true))) 95 | val df: DataFrame = ssc.createDataFrame(rowRdd, structType) 96 | //默认打印出来df里面的20行数据 97 | println("----打印 车辆信息数据----") 98 | println("----打印 row总数----" + df.count()) 99 | df.show() 100 | 101 | df.write.mode(SaveMode.Overwrite).jdbc("jdbc:mysql://node01:3306/traffic", "carTable", properties) 102 | 103 | df.createOrReplaceTempView("monitor_flow_action") 104 
| var ds: Dataset[Row] = df 105 | ds.write.mode(SaveMode.Overwrite).json("mmmm") 106 | 107 | /** 108 | * monitorAndCameras 109 | * key:monitor_id 110 | * value:hashSet(camera_id) 111 | * 基于生成的数据,生成对应的卡扣号和摄像头对应基本表 112 | */ 113 | val monitorAndCameras: mutable.Map[String, mutable.Set[String]] = mutable.Map[String, mutable.Set[String]]() 114 | var index = 0 115 | for (row <- dataList) { 116 | var sets: mutable.Set[String] = monitorAndCameras.get(row.getAs[String](1)).getOrElse(null) 117 | if (sets == null) { 118 | sets = mutable.Set[String]() 119 | monitorAndCameras.put(row.getAs[String](1), sets) 120 | } 121 | //这里每隔1000条数据随机插入一条数据,模拟出来标准表中卡扣对应摄像头的数据比模拟数据中多出来的摄像头。 122 | // 这个摄像头的数据不一定会在车辆数据中有。即可以看出卡扣号下有坏的摄像头。 123 | index += 1 124 | if (index % 1000 == 0) { 125 | val str: String = StringUtils.fulFuill(7, random.nextInt(1000000) + "") 126 | println(s"------------------------$str") 127 | sets.add(str); 128 | } 129 | val cameraId: String = row.getAs[String](2) 130 | sets.add(cameraId) 131 | } 132 | dataList.clear() 133 | 134 | val monitor_ids: Iterable[String] = monitorAndCameras.keys 135 | for (monitor_id <- monitor_ids) { 136 | val camera_ids: mutable.Set[String] = monitorAndCameras.get(monitor_id).get 137 | var row: Row = null 138 | for (camera_id <- camera_ids) { 139 | row = RowFactory.create(monitor_id, camera_id) 140 | dataList.append(row) 141 | } 142 | } 143 | val monitorSchema: StructType = StructType(List[StructField]( 144 | StructField("monitor_id", StringType, true), 145 | StructField("camera_id", StringType, true))) 146 | rowRdd = sc.parallelize(dataList) 147 | val monitorDF: DataFrame = ssc.createDataFrame(rowRdd, monitorSchema) 148 | 149 | monitorDF.write.mode(SaveMode.Overwrite).jdbc("jdbc:mysql://node01:3306/traffic", "jkTable", properties) 150 | 151 | monitorDF.write.mode(SaveMode.Overwrite).json("nnnnn") 152 | println("----打印monitorDF总数----" + monitorDF.count()) 153 | monitorDF.createOrReplaceTempView("monitor_camera_info") 154 | println("----打印 卡扣号对应摄像头号 数据----") 155 | monitorDF.show() 156 | } 157 | } 158 | -------------------------------------------------------------------------------- /src/main/scala/com/traffic/test/MyMockRealTimeDataScala.scala: -------------------------------------------------------------------------------- 1 | package com.spark.test 2 | 3 | import java.util.Properties 4 | 5 | import com.traffic.spark.util.{DateUtils, StringUtils} 6 | import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord} 7 | 8 | import scala.util.Random 9 | 10 | class MyMockRealTimeDataScala extends Thread { 11 | private val random = new Random() 12 | private val locations: Array[String] = Array[String]("鲁", "京", "川", "深", "沪", "晋", "京", "西", "重", "湘") 13 | 14 | var producer: KafkaProducer[String, String] = new KafkaProducer[String, String](createProducerConfig()) 15 | 16 | def createProducerConfig() = { 17 | val prop = new Properties() 18 | prop.put("bootstrap.servers", "node01:9092,node02:9092,node03:9092") 19 | prop.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer") 20 | prop.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer") 21 | prop 22 | } 23 | 24 | override def run() = { 25 | println("正在生产数据 ... ... 
") 26 | while (true) { 27 | val date: String = DateUtils.getTodayDate 28 | val baseActionTime: String = date + " " + StringUtils.fulFuill(random.nextInt(24) + "") 29 | val actionTime: String = baseActionTime + ":" + StringUtils.fulFuill(random.nextInt(60) + "") + ":" + StringUtils.fulFuill(random.nextInt(60) + "") 30 | val monitorId: String = StringUtils.fulFuill(4, random.nextInt(9) + "") 31 | val car: String = locations(random.nextInt(10)) + (65 + random.nextInt(26)).toChar + StringUtils.fulFuill(5, random.nextInt(99999) + "") 32 | val speed: String = random.nextInt(260) + "" 33 | 34 | val cameraId: String = StringUtils.fulFuill(5, random.nextInt(9999) + "") 35 | val roadId: String = Integer.valueOf(cameraId) % 50 + 1 + "" 36 | val areaId: String = StringUtils.fulFuill(2, Integer.valueOf(cameraId) % 9 + 1 + "") 37 | producer.send(new ProducerRecord[String, String]("MyMockRealTimeData", date + "\t" + monitorId + "\t" + cameraId + "\t" + car + "\t" + actionTime + "\t" + speed + "\t" + roadId + "\t" + areaId)) 38 | Thread.sleep(50) 39 | } 40 | } 41 | } 42 | 43 | object producer { 44 | def main(args: Array[String]): Unit = { 45 | val mockRealTimeDataScala = new MyMockRealTimeDataScala() 46 | mockRealTimeDataScala.start() 47 | } 48 | } 49 | -------------------------------------------------------------------------------- /src/main/scala/com/traffic/util/SelfDateCarCountScala.scala: -------------------------------------------------------------------------------- 1 | package com.com.bjsxt.spark.util 2 | 3 | import org.apache.spark.util.AccumulatorV2 4 | 5 | import scala.collection.mutable 6 | 7 | /** 8 | * map -> (date,carCount) 9 | */ 10 | case class DateCarCount(dateCarCountMap: mutable.Map[String, Long]) 11 | 12 | class SelfDateCarCountScala extends AccumulatorV2[DateCarCount, DateCarCount] { 13 | /** 14 | * 初始化累计器的值,这个值是最后要在merge合并的时候累加到最终结果内 15 | */ 16 | var returnResult = DateCarCount(mutable.Map[String, Long]()) 17 | 18 | /** 19 | * 与reset() 方法中保持一致,返回true。 20 | */ 21 | override def isZero: Boolean = { 22 | returnResult == DateCarCount(mutable.Map[String, Long]()) 23 | } 24 | 25 | /** 26 | * 复制一个新的累加器,在这里就是如果用到了就会复制一个新的累加器。 27 | */ 28 | override def copy(): AccumulatorV2[DateCarCount, DateCarCount] = { 29 | val acc: SelfDateCarCountScala = new SelfDateCarCountScala() 30 | acc.returnResult = this.returnResult 31 | acc 32 | } 33 | 34 | /** 35 | * 重置AccumulatorV2中的数据,这里初始化的数据是在RDD每个分区内部,每个分区内的初始值。 36 | */ 37 | override def reset(): Unit = { 38 | returnResult = DateCarCount(mutable.Map[String, Long]()) 39 | } 40 | 41 | 42 | /** 43 | * 每个分区累加数据 44 | * 这里是拿着初始的result值和每个分区的数据累加 45 | */ 46 | override def add(v: DateCarCount): Unit = { 47 | returnResult = myAdd(returnResult, v) 48 | } 49 | 50 | /** 51 | * 分区之间总和累加数据 52 | * 这里拿着初始的result值 和每个分区最终的结果累加 53 | * 54 | */ 55 | override def merge(other: AccumulatorV2[DateCarCount, DateCarCount]): Unit = { 56 | val v: SelfDateCarCountScala = other.asInstanceOf[SelfDateCarCountScala] 57 | returnResult = myAdd(returnResult, v.returnResult) 58 | } 59 | 60 | /** 61 | * 累计器对外返回的最终的结果 62 | */ 63 | override def value: DateCarCount = returnResult 64 | 65 | /** 66 | * 67 | * @param returnResult 68 | * @param v 69 | * @return 70 | */ 71 | def myAdd(returnResult: DateCarCount, v: DateCarCount): DateCarCount = { 72 | val map: mutable.Map[String, Long] = v.dateCarCountMap 73 | map.foreach(mp => { 74 | val key: String = mp._1 75 | val value: Long = mp._2 76 | returnResult.dateCarCountMap.put(key, value) 77 | }) 78 | returnResult 79 | } 80 | } 81 | 
-------------------------------------------------------------------------------- /src/main/scala/com/traffic/util/SelfDateCarInfosScala.scala: -------------------------------------------------------------------------------- 1 | package com.com.bjsxt.spark.util 2 | 3 | import org.apache.spark.util.AccumulatorV2 4 | 5 | import scala.collection.mutable 6 | import scala.collection.mutable.ListBuffer 7 | 8 | /** 9 | * map -> (date,List(infos)) 10 | */ 11 | case class DateCarInfos(dateCarInfosMap: mutable.Map[String, ListBuffer[String]]) 12 | 13 | class SelfDateCarInfosScala extends AccumulatorV2[DateCarInfos, DateCarInfos] { 14 | /** 15 | * 初始化累计器的值,这个值是最后要在merge合并的时候累加到最终结果内 16 | */ 17 | var returnResult = DateCarInfos(mutable.Map[String, ListBuffer[String]]()) 18 | 19 | /** 20 | * 与reset() 方法中保持一致,返回true。 21 | */ 22 | override def isZero: Boolean = { 23 | returnResult == DateCarInfos(mutable.Map[String, ListBuffer[String]]()) 24 | } 25 | 26 | /** 27 | * 复制一个新的累加器,在这里就是如果用到了就会复制一个新的累加器。 28 | */ 29 | override def copy(): AccumulatorV2[DateCarInfos, DateCarInfos] = { 30 | val acc: SelfDateCarInfosScala = new SelfDateCarInfosScala() 31 | acc.returnResult = this.returnResult 32 | acc 33 | } 34 | 35 | /** 36 | * 重置AccumulatorV2中的数据,这里初始化的数据是在RDD每个分区内部,每个分区内的初始值。 37 | */ 38 | override def reset(): Unit = { 39 | returnResult = DateCarInfos(mutable.Map[String, ListBuffer[String]]()) 40 | } 41 | 42 | 43 | /** 44 | * 每个分区累加数据 45 | * 这里是拿着初始的result值和每个分区的数据累加 46 | */ 47 | override def add(v: DateCarInfos): Unit = { 48 | returnResult = myAdd(returnResult, v) 49 | } 50 | 51 | /** 52 | * 分区之间总和累加数据 53 | * 这里拿着初始的result值 和每个分区最终的结果累加 54 | * 55 | */ 56 | override def merge(other: AccumulatorV2[DateCarInfos, DateCarInfos]): Unit = { 57 | val v: SelfDateCarInfosScala = other.asInstanceOf[SelfDateCarInfosScala] 58 | returnResult = myAdd(returnResult, v.returnResult) 59 | } 60 | 61 | /** 62 | * 累计器对外返回的最终的结果 63 | */ 64 | override def value: DateCarInfos = returnResult 65 | 66 | /** 67 | * 68 | * @param returnResult 69 | * @param v 70 | * @return 71 | */ 72 | def myAdd(returnResult: DateCarInfos, v: DateCarInfos): DateCarInfos = { 73 | val map: mutable.Map[String, ListBuffer[String]] = v.dateCarInfosMap 74 | map.foreach(mp => { 75 | val key: String = mp._1 76 | val value: ListBuffer[String] = mp._2 77 | if (!returnResult.dateCarInfosMap.contains(key)) { 78 | returnResult.dateCarInfosMap.put(key, value) 79 | } else { 80 | value.foreach(returnResult.dateCarInfosMap.get(key).get.append(_)) 81 | } 82 | 83 | }) 84 | returnResult 85 | } 86 | } 87 | -------------------------------------------------------------------------------- /src/main/scala/com/traffic/util/SelfDefineAccumulatorScala.scala: -------------------------------------------------------------------------------- 1 | package com.com.bjsxt.spark.util 2 | 3 | import org.apache.spark.util.AccumulatorV2 4 | 5 | import scala.collection.mutable.ListBuffer 6 | 7 | /** 8 | * 正常卡扣数,正常摄像头数,异常卡扣数,异常摄像头数,异常摄像头的详细信息 9 | */ 10 | case class MonitorStatus(var noramlMonitorCount: Int, var noramlCameraCount: Int, 11 | var abnoramlMonitorCount: Int, var abnoramlCameraCount: Int, 12 | var abnoramlCameraCountInfo: ListBuffer[String]) {} 13 | 14 | class SelfDefineAccumulatorScala extends AccumulatorV2[MonitorStatus, MonitorStatus] { 15 | /** 16 | * 初始化累计器的值,这个值是最后要在merge合并的时候累加到最终结果内 17 | */ 18 | var returnResult = MonitorStatus(0, 0, 0, 0, ListBuffer[String]()) 19 | 20 | /** 21 | * 与reset() 方法中保持一致,返回true。 22 | */ 23 | override def isZero: Boolean = { 24 | returnResult == 
MonitorStatus(0, 0, 0, 0, ListBuffer[String]()) 25 | } 26 | 27 | /** 28 | * 复制一个新的累加器,在这里就是如果用到了就会复制一个新的累加器。 29 | */ 30 | override def copy(): AccumulatorV2[MonitorStatus, MonitorStatus] = { 31 | val acc: SelfDefineAccumulatorScala = new SelfDefineAccumulatorScala() 32 | acc.returnResult = this.returnResult 33 | acc 34 | } 35 | 36 | /** 37 | * 重置AccumulatorV2中的数据,这里初始化的数据是在RDD每个分区内部,每个分区内的初始值。 38 | */ 39 | override def reset(): Unit = { 40 | returnResult = MonitorStatus(0, 0, 0, 0, ListBuffer[String]()) 41 | // true 42 | } 43 | 44 | /** 45 | * 每个分区累加数据 46 | * 这里是拿着初始的result值和每个分区的数据累加 47 | */ 48 | override def add(v: MonitorStatus): Unit = { 49 | returnResult = myAdd(returnResult, v) 50 | } 51 | 52 | /** 53 | * 分区之间总和累加数据 54 | * 这里拿着初始的result值 和每个分区最终的结果累加 55 | * 56 | */ 57 | override def merge(other: AccumulatorV2[MonitorStatus, MonitorStatus]): Unit = { 58 | val accumulator: SelfDefineAccumulatorScala = other.asInstanceOf[SelfDefineAccumulatorScala] 59 | myAdd(returnResult, accumulator.returnResult) 60 | } 61 | 62 | /** 63 | * 累计器对外返回的最终的结果 64 | */ 65 | override def value: MonitorStatus = returnResult 66 | 67 | /** 68 | * @param returnResult 69 | * @param v 70 | * @return 71 | */ 72 | def myAdd(returnResult: MonitorStatus, v: MonitorStatus): MonitorStatus = { 73 | returnResult.noramlMonitorCount += v.noramlMonitorCount 74 | returnResult.noramlCameraCount += v.noramlCameraCount 75 | returnResult.abnoramlMonitorCount += v.abnoramlMonitorCount 76 | returnResult.abnoramlCameraCount += v.abnoramlCameraCount 77 | returnResult.abnoramlCameraCountInfo.appendAll(v.abnoramlCameraCountInfo) 78 | println(v.abnoramlCameraCountInfo) 79 | returnResult 80 | } 81 | } 82 | -------------------------------------------------------------------------------- /src/main/scala/com/traffic/util/SparkUtilsScala.scala: -------------------------------------------------------------------------------- 1 | package com.com.bjsxt.spark.util 2 | 3 | import com.alibaba.fastjson.JSONObject 4 | import com.traffic.spark.constant.Constants 5 | import com.traffic.spark.util.ParamUtils 6 | import org.apache.spark.api.java.JavaRDD 7 | import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession} 8 | 9 | object SparkUtilsScala { 10 | 11 | def getCameraRDDByDateRange(ssc: SparkSession, parms: JSONObject) = { 12 | 13 | val startDate: String = ParamUtils.getParam(parms, Constants.PARAM_START_DATE) 14 | val endDate: String = ParamUtils.getParam(parms, Constants.PARAM_END_DATE) 15 | val sql: String = "SELECT * FROM monitor_flow_action " + "WHERE date>='" + startDate + "' " + "AND date<='" + endDate + "'" 16 | ssc.sql(sql) 17 | } 18 | 19 | def getMonitorRDDByDateRange(ssc: SparkSession) = { 20 | val sql: String = "SELECT * FROM monitor_camera_info" 21 | ssc.sql(sql) 22 | } 23 | def getAreaRDDByDateRange(ssc: SparkSession) = { 24 | val sql: String = "SELECT * FROM area_info" 25 | ssc.sql(sql) 26 | } 27 | } 28 | -------------------------------------------------------------------------------- /src/test/resources/Spark调优.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/Spark调优.docx -------------------------------------------------------------------------------- /src/test/resources/hive/createHiveTab.sql: -------------------------------------------------------------------------------- 1 | set hive.support.sql11.reserved.keywords=false; 2 | 3 | CREATE TABLE IF NOT EXISTS 
traffic.monitor_flow_action( 4 | date string , 5 | monitor_id string , 6 | camera_id string , 7 | car string , 8 | action_time string , 9 | speed string , 10 | road_id string, 11 | area_id string 12 | ) 13 | ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ; 14 | 15 | load data local inpath '/root/test/monitor_flow_action' into table traffic.monitor_flow_action; 16 | 17 | CREATE TABLE IF NOT EXISTS traffic.monitor_camera_info( 18 | monitor_id string , 19 | camera_id string 20 | ) 21 | ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ; 22 | 23 | load data local inpath '/root/test/monitor_camera_info' into table traffic.monitor_camera_info; 24 | -------------------------------------------------------------------------------- /src/test/resources/hive/提交hive运行的命令.txt: -------------------------------------------------------------------------------- 1 | ./spark-submit 2 | --master spark://node1:7077,node2:7077 3 | --class com.bjsxt.spark.skynet.MonitorFlowAnalyze 4 | --jars ../lib/mysql-connector-java-5.1.6.jar,../lib/fastjson-1.2.11.jar 5 | ../lib/ProduceData2Hive.jar 6 | 1 7 | -------------------------------------------------------------------------------- /src/test/resources/img/卡扣流量转换率.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/img/卡扣流量转换率.jpg -------------------------------------------------------------------------------- /src/test/resources/img/卡扣监控.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/img/卡扣监控.jpg -------------------------------------------------------------------------------- /src/test/resources/img/双重聚合.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/img/双重聚合.jpg -------------------------------------------------------------------------------- /src/test/resources/img/抽取车辆.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/img/抽取车辆.jpg -------------------------------------------------------------------------------- /src/test/resources/img/提高shuffle并行度.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/img/提高shuffle并行度.jpg -------------------------------------------------------------------------------- /src/test/resources/img/数据处理.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/img/数据处理.jpg -------------------------------------------------------------------------------- /src/test/resources/img/数据本地化级别.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/img/数据本地化级别.jpg -------------------------------------------------------------------------------- /src/test/resources/img/数据本地化级别调优.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/img/数据本地化级别调优.jpg -------------------------------------------------------------------------------- /src/test/resources/img/调节堆外内存.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/img/调节堆外内存.jpg -------------------------------------------------------------------------------- /src/test/resources/img/车辆碰撞.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/img/车辆碰撞.jpg -------------------------------------------------------------------------------- /src/test/resources/img/车辆高速通过的卡扣topn.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/img/车辆高速通过的卡扣topn.jpg -------------------------------------------------------------------------------- /src/test/resources/img/过程.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/img/过程.jpg -------------------------------------------------------------------------------- /src/test/resources/img/采样倾斜key并分拆join操作.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/img/采样倾斜key并分拆join操作.jpg -------------------------------------------------------------------------------- /src/test/resources/img/随机抽取车辆.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/img/随机抽取车辆.jpg -------------------------------------------------------------------------------- /src/test/resources/mysql/traffic.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Navicat MySQL Data Transfer 3 | 4 | Source Server : 192.168.179.4 5 | Source Server Version : 50173 6 | Source Host : 192.168.179.4:3306 7 | Source Database : traffic 8 | 9 | Target Server Type : MYSQL 10 | Target Server Version : 50173 11 | File Encoding : 65001 12 | 13 | Date: 2019-07-28 19:38:34 14 | */ 15 | 16 | SET FOREIGN_KEY_CHECKS=0; 17 | 18 | -- ---------------------------- 19 | -- Table structure for area_info 20 | -- ---------------------------- 21 | DROP TABLE IF EXISTS `area_info`; 22 | CREATE TABLE `area_info` ( 23 | `area_id` varchar(255) DEFAULT NULL, 24 | `area_name` varchar(255) DEFAULT NULL 25 | ) ENGINE=MyISAM DEFAULT CHARSET=utf8; 26 | 27 | -- ---------------------------- 28 | -- Records of area_info 29 | -- ---------------------------- 30 | INSERT INTO `area_info` VALUES ('01', '海淀区'); 31 | INSERT INTO `area_info` VALUES ('02', '昌平区'); 32 | INSERT INTO `area_info` VALUES ('03', '朝阳区'); 33 | INSERT INTO `area_info` VALUES ('04', '顺义区'); 34 | INSERT INTO `area_info` VALUES ('05', '西城区'); 35 | INSERT INTO `area_info` VALUES ('06', '东城区'); 36 | INSERT INTO `area_info` VALUES 
('07', '大兴区'); 37 | INSERT INTO `area_info` VALUES ('08', '石景山'); 38 | 39 | -- ---------------------------- 40 | -- Table structure for car_track 41 | -- ---------------------------- 42 | DROP TABLE IF EXISTS `car_track`; 43 | CREATE TABLE `car_track` ( 44 | `task_id` varchar(255) DEFAULT NULL, 45 | `date` varchar(255) DEFAULT NULL, 46 | `car` varchar(255) DEFAULT NULL, 47 | `car_track` text 48 | ) ENGINE=MyISAM DEFAULT CHARSET=utf8; 49 | 50 | -- ---------------------------- 51 | -- Records of car_track 52 | -- ---------------------------- 53 | 54 | -- ---------------------------- 55 | -- Table structure for monitor_range_time_car 56 | -- ---------------------------- 57 | DROP TABLE IF EXISTS `monitor_range_time_car`; 58 | CREATE TABLE `monitor_range_time_car` ( 59 | `task_id` varchar(255) DEFAULT NULL, 60 | `monitor_id` varchar(255) DEFAULT NULL, 61 | `range_time` varchar(255) DEFAULT NULL, 62 | `cars` text 63 | ) ENGINE=MyISAM DEFAULT CHARSET=utf8; 64 | 65 | -- ---------------------------- 66 | -- Records of monitor_range_time_car 67 | -- ---------------------------- 68 | 69 | -- ---------------------------- 70 | -- Table structure for monitor_state 71 | -- ---------------------------- 72 | DROP TABLE IF EXISTS `monitor_state`; 73 | CREATE TABLE `monitor_state` ( 74 | `taskId` varchar(255) DEFAULT NULL, 75 | `noraml_monitor_count` varchar(255) DEFAULT NULL, 76 | `normal_camera_count` varchar(255) DEFAULT NULL, 77 | `abnormal_monitor_count` varchar(255) DEFAULT NULL, 78 | `abnormal_camera_count` varchar(255) DEFAULT NULL, 79 | `abnormal_monitor_camera_infos` text 80 | ) ENGINE=MyISAM DEFAULT CHARSET=utf8; 81 | 82 | -- ---------------------------- 83 | -- Records of monitor_state 84 | -- ---------------------------- 85 | 86 | -- ---------------------------- 87 | -- Table structure for random_extract_car 88 | -- ---------------------------- 89 | DROP TABLE IF EXISTS `random_extract_car`; 90 | CREATE TABLE `random_extract_car` ( 91 | `task_id` varchar(255) DEFAULT NULL, 92 | `car_info` varchar(255) DEFAULT NULL, 93 | `date_d` varchar(255) DEFAULT NULL, 94 | `date_hour` varchar(255) DEFAULT NULL 95 | ) ENGINE=MyISAM DEFAULT CHARSET=utf8; 96 | 97 | -- ---------------------------- 98 | -- Records of random_extract_car 99 | -- ---------------------------- 100 | 101 | -- ---------------------------- 102 | -- Table structure for random_extract_car_detail_info 103 | -- ---------------------------- 104 | DROP TABLE IF EXISTS `random_extract_car_detail_info`; 105 | CREATE TABLE `random_extract_car_detail_info` ( 106 | `task_id` varchar(255) DEFAULT NULL, 107 | `date` varchar(255) DEFAULT NULL, 108 | `monitor_id` varchar(255) DEFAULT NULL, 109 | `camera_id` varchar(255) DEFAULT NULL, 110 | `car` varchar(255) DEFAULT NULL, 111 | `action_time` varchar(255) DEFAULT NULL, 112 | `speed` varchar(255) DEFAULT NULL, 113 | `road_id` varchar(255) DEFAULT NULL 114 | ) ENGINE=MyISAM DEFAULT CHARSET=utf8; 115 | 116 | -- ---------------------------- 117 | -- Records of random_extract_car_detail_info 118 | -- ---------------------------- 119 | 120 | -- ---------------------------- 121 | -- Table structure for task 122 | -- ---------------------------- 123 | DROP TABLE IF EXISTS `task`; 124 | CREATE TABLE `task` ( 125 | `task_id` int(11) NOT NULL AUTO_INCREMENT, 126 | `task_name` varchar(255) DEFAULT NULL COMMENT '任务名称', 127 | `create_time` varchar(255) DEFAULT NULL COMMENT '任务创建时间', 128 | `start_time` varchar(255) DEFAULT NULL COMMENT '任务执行时间', 129 | `finish_time` varchar(255) DEFAULT NULL COMMENT '任务结束时间', 130 | 
`task_type` varchar(255) DEFAULT NULL COMMENT '任务类型 一个模块一个任务类型', 131 | `task_status` varchar(255) DEFAULT NULL COMMENT '任务状态 创建-执行-结束 ', 132 | `task_param` text COMMENT '任务参数 json', 133 | PRIMARY KEY (`task_id`) 134 | ) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8; 135 | 136 | -- ---------------------------- 137 | -- Records of task 138 | -- ---------------------------- 139 | INSERT INTO `task` VALUES ('1', '卡口流量监测', null, '', '', null, null, '{\"startDate\":[\"2019-07-20\"],\"endDate\":[\"2019-07-20\"],\"topNum\":[\"5\"],\"areaName\":[\"海淀区\"]}'); 140 | INSERT INTO `task` VALUES ('2', '随机抽取N个车辆信息', null, null, null, null, null, '{\"startDate\":[\"2019-07-20\"],\"endDate\":[\"2019-07-20\"],\"extractNum\":[\"100\"]}'); 141 | INSERT INTO `task` VALUES ('3', '跟车分析', null, null, null, null, null, '{\"startDate\":[\"2019-07-20\"],\"endDate\":[\"2019-07-20\"],\"cars\":[\"京I42152,京Q18277,京K10100,京R24874,京N63229,京E25462,京W43404,京J13254,鲁G65763,鲁R55733,京L32167,京R54122,京K44557,京W41927,京S90923,京D86196,京W63299,沪N19518,京B47292,京A11951,沪D71306,沪D39243,京G44724,京E05123,京Y03722,京O28098,鲁Y63080,深N55336,京G89927,京Z29402\"]}'); 142 | INSERT INTO `task` VALUES ('4', '各个区域topN的车流量', null, null, null, null, null, '{\"startDate\":[\"2019-07-20\"],\"endDate\":[\"2019-07-20\"]}'); 143 | INSERT INTO `task` VALUES ('5', '道路转化率', null, null, null, null, null, '{\"startDate\":[\"2019-07-20\"],\"endDate\":[\"2019-07-20\"],\"roadFlow\":[\"0001,0002,0003,0004,0005\"]}'); 144 | 145 | -- ---------------------------- 146 | -- Table structure for top10_speed_detail 147 | -- ---------------------------- 148 | DROP TABLE IF EXISTS `top10_speed_detail`; 149 | CREATE TABLE `top10_speed_detail` ( 150 | `task_id` varchar(255) DEFAULT NULL, 151 | `date` varchar(255) DEFAULT NULL, 152 | `monitor_id` varchar(255) DEFAULT NULL, 153 | `camera_id` varchar(255) DEFAULT NULL, 154 | `car` varchar(255) DEFAULT NULL, 155 | `action_time` varchar(255) DEFAULT NULL, 156 | `speed` varchar(255) DEFAULT NULL, 157 | `road_id` varchar(255) DEFAULT NULL 158 | ) ENGINE=MyISAM DEFAULT CHARSET=utf8; 159 | 160 | -- ---------------------------- 161 | -- Records of top10_speed_detail 162 | -- ---------------------------- 163 | 164 | -- ---------------------------- 165 | -- Table structure for topn_monitor_car_count 166 | -- ---------------------------- 167 | DROP TABLE IF EXISTS `topn_monitor_car_count`; 168 | CREATE TABLE `topn_monitor_car_count` ( 169 | `task_id` varchar(11) DEFAULT NULL, 170 | `monitor_id` varchar(11) DEFAULT NULL, 171 | `carCount` int(11) DEFAULT NULL 172 | ) ENGINE=MyISAM DEFAULT CHARSET=utf8; 173 | 174 | -- ---------------------------- 175 | -- Records of topn_monitor_car_count 176 | -- ---------------------------- 177 | 178 | -- ---------------------------- 179 | -- Table structure for topn_monitor_detail_info 180 | -- ---------------------------- 181 | DROP TABLE IF EXISTS `topn_monitor_detail_info`; 182 | CREATE TABLE `topn_monitor_detail_info` ( 183 | `task_id` varchar(255) DEFAULT NULL, 184 | `date` varchar(255) DEFAULT NULL, 185 | `monitor_id` varchar(255) DEFAULT NULL, 186 | `camera_id` varchar(255) DEFAULT NULL, 187 | `car` varchar(255) DEFAULT NULL, 188 | `action_time` varchar(255) DEFAULT NULL, 189 | `speed` varchar(255) DEFAULT NULL, 190 | `road_id` varchar(255) DEFAULT NULL 191 | ) ENGINE=MyISAM DEFAULT CHARSET=utf8; 192 | 193 | -- ---------------------------- 194 | -- Records of topn_monitor_detail_info 195 | -- ---------------------------- 196 | 
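
The `task_param` column in the `task` table above stores each module's query parameters as a JSON string. Below is a minimal sketch (hypothetical, using the fastjson dependency and the SparkUtilsScala helper shown earlier; it assumes ParamUtils.getParam reads the first element of each JSON array and that monitor_flow_action is available as a Hive table or temp view) of turning such a row into the date-range query:

```scala
import com.alibaba.fastjson.{JSON, JSONObject}
import org.apache.spark.sql.{DataFrame, SparkSession}
// package name as declared in SparkUtilsScala.scala above
import com.com.bjsxt.spark.util.SparkUtilsScala

// hypothetical demo object, not present in the repo
object TaskParamDemo {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .master("local")
      .appName("taskParamDemo")
      .enableHiveSupport() // assumes monitor_flow_action exists as a Hive table
      .getOrCreate()

    // task_param exactly as stored for task_id = 1 above
    val taskParam = "{\"startDate\":[\"2019-07-20\"],\"endDate\":[\"2019-07-20\"],\"topNum\":[\"5\"],\"areaName\":[\"海淀区\"]}"
    val params: JSONObject = JSON.parseObject(taskParam)

    // SparkUtilsScala builds: SELECT * FROM monitor_flow_action WHERE date >= startDate AND date <= endDate
    val flowDF: DataFrame = SparkUtilsScala.getCameraRDDByDateRange(spark, params)
    flowDF.show()
  }
}
```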
-------------------------------------------------------------------------------- /src/test/resources/任务提交.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/任务提交.jpg -------------------------------------------------------------------------------- /src/test/resources/卡扣监控.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/卡扣监控.jpg -------------------------------------------------------------------------------- /src/test/resources/大数据综合业务平台.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/大数据综合业务平台.pdf -------------------------------------------------------------------------------- /src/test/resources/数据处理.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/数据处理.jpg -------------------------------------------------------------------------------- /src/test/resources/车流量监控项目.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/车流量监控项目.pdf -------------------------------------------------------------------------------- /src/test/resources/车流量监控项目v1.2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leelovejava/TrafficTeach/e9cc83083f69e59d8f452fe54633f58d7cf1d091/src/test/resources/车流量监控项目v1.2.pdf -------------------------------------------------------------------------------- /src/test/resources/项目.txt: -------------------------------------------------------------------------------- 1 | ====数据表==== 2 | monitor_flow_action 车流量监控表 3 | monitor_camera_info 卡扣摄像头基本关系表 4 | 5 | ====数据来源==== 6 | 1.如果任务在本地执行,数据是每次运行模拟 7 | 2.如果任务在集群中运行,数据来源是Hive表 8 | 9 | ====数据模拟==== 10 | 本地模拟 11 | 数据导入到Hive中 12 | 13 | ====项目业务==== 14 | core 15 | 1.卡扣监控 16 | 正常的卡扣数 7 17 | 异常的卡扣数 2 18 | 正常的摄像头个数 1000 19 | 异常的摄像头个数 5 20 | 异常的摄像头详细信息 0001:33333,44444~00005:12814,87463,99123 21 | 22 | monitor_flow_action: 23 | (0006,11111_22222,33333,44444,55555) 24 | monitor_camera_info: 25 | (0006,11111_22222,33333,44444,55555) 26 | 27 | 2.车流量top5的卡扣 28 | 29 | --------------------------------------------------------------------------------
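
For item 2 in 项目.txt above (the top-5 monitors by car count), here is a minimal RDD counting sketch — hypothetical, not taken from the repo's MonitorFlowAnalyze code; it assumes the tab-separated monitor_flow_action mock data file with monitor_id in field 1:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// hypothetical demo object, not present in the repo
object Top5MonitorDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("top5Monitor"))
    sc.setLogLevel("ERROR")

    // monitor_flow_action lines: date \t monitor_id \t camera_id \t car \t action_time \t speed \t road_id \t area_id
    val lines = sc.textFile("monitor_flow_action")

    // count records per monitor_id and keep the 5 busiest monitors
    val top5 = lines
      .map(line => (line.split("\t")(1), 1L))
      .reduceByKey(_ + _)
      .sortBy(_._2, ascending = false)
      .take(5)

    top5.foreach(println)
    sc.stop()
  }
}
```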