├── README.md ├── docs └── window_watermark.md ├── flink-files ├── pom.xml └── src │ └── main │ └── java │ └── com │ └── jdd │ └── streaming │ └── demos │ └── FilesWordCounter.java ├── flink-kafka-redis ├── pom.xml └── src │ └── main │ ├── java │ └── com │ │ └── jdd │ │ └── streaming │ │ └── demos │ │ ├── KafkaRedisStreamingCount.java │ │ └── sink │ │ ├── RedisCommand.java │ │ ├── RedisConfig.java │ │ ├── RedisPushCommand.java │ │ └── RedisSink.java │ └── resources │ └── log4j.properties ├── flink-kafka ├── pom.xml └── src │ └── main │ ├── java │ └── com │ │ └── jdd │ │ └── streaming │ │ └── demos │ │ ├── KafkaEventSchema.java │ │ └── KafkaStreamingCount.java │ └── resources │ ├── log4j.properties │ └── logback.xml ├── flink-rocketmq ├── pom.xml └── src │ └── main │ └── java │ └── com │ └── jdd │ └── streaming │ └── demo │ └── connectors │ ├── SimpleConsumer.java │ ├── SimpleProducer.java │ ├── SimpleRMQ.java │ ├── common │ ├── RocketMQConfig.java │ └── RocketMQUtils.java │ ├── selector │ ├── DefaultTopicSelector.java │ ├── SimpleTopicSelector.java │ └── TopicSelector.java │ ├── serialization │ ├── KeyValueDeserializationSchema.java │ ├── KeyValueSerializationSchema.java │ ├── SimpleKeyValueDeserializationSchema.java │ └── SimpleKeyValueSerializationSchema.java │ ├── sink │ └── RocketMQSink.java │ └── source │ └── RocketMQSource.java ├── flink-simple-lambda ├── pom.xml └── src │ └── main │ ├── java │ └── com │ │ └── jdd │ │ └── streaming │ │ └── demos │ │ ├── SimpleLambda.java │ │ ├── SimpleWaterMark.java │ │ ├── StringLineEventSource.java │ │ └── UserDefineWaterMark.java │ └── resources │ └── log4j.properties ├── flink-socket ├── pom.xml └── src │ └── main │ └── java │ └── com │ └── jdd │ └── streaming │ └── demos │ └── SocketWindowWordCount.java ├── flink-sql-demos ├── pom.xml └── src │ └── main │ └── java │ └── com │ └── jdd │ └── streaming │ └── demos │ ├── SimpleDataStreamToTable.java │ └── TableDemo.java ├── flink-state-demo ├── pom.xml └── src │ └── main │ └── java │ └── com │ └── jdd │ └── streaming │ └── demos │ ├── ConnectionStreamDemo.java │ ├── SimpleRichFlatMapState.java │ └── SimpleState.java ├── flink-taxi-demos ├── flinktaxidemos.iml ├── pom.xml └── src │ └── main │ ├── java │ └── com │ │ └── jdd │ │ └── streaming │ │ └── demos │ │ ├── DataStreamDemo.java │ │ ├── SimpleESStream.java │ │ ├── SimpleFunctions.java │ │ ├── SimpleSideOutput.java │ │ ├── SimpleTable.java │ │ ├── SimpleWaterMark.java │ │ ├── SocketSideOutput.java │ │ ├── TaxiCheckpointDemo.java │ │ ├── TaxiRideCleansing.java │ │ ├── TaxiRideCount.java │ │ ├── TaxiTableDemo.java │ │ ├── entity │ │ ├── TaxiFare.java │ │ └── TaxiRide.java │ │ ├── sink │ │ ├── RedisCommand.java │ │ ├── RedisConfig.java │ │ ├── RedisPushCommand.java │ │ └── RedisSink.java │ │ ├── source │ │ ├── CheckpointTaxiFareSource.java │ │ ├── CheckpointTaxiRideSource.java │ │ ├── TaxiFareSource.java │ │ └── TaxiRideSource.java │ │ └── utils │ │ └── GeoUtils.java │ └── resources │ └── log4j.properties ├── flink-userdefine-pojo ├── pom.xml └── src │ └── main │ ├── java │ └── com │ │ └── jdd │ │ └── streaming │ │ └── demos │ │ ├── AsyncOperatorDemo.java │ │ ├── FoldDemo.java │ │ ├── KeySelectorDemo.java │ │ ├── TimeCharacteristicDemo.java │ │ └── UserDefinePoJo.java │ └── resources │ └── log4j.properties ├── flink-wikipedia-demo ├── pom.xml └── src │ └── main │ ├── java │ └── com │ │ └── jdd │ │ └── streaming │ │ └── demos │ │ └── WikipediaCount.java │ └── resources │ └── log4j.properties ├── flink-wikis └── flinkwikis.iml ├── pom.xml └── 
window_watermark.md
/README.md:
--------------------------------------------------------------------------------
# Simple Flink demo applications
### I. Installing Kafka with Docker
#### 1. Pull the Docker images (if pulling directly is slow, configure a domestic registry mirror first)
docker pull wurstmeister/zookeeper
docker pull wurstmeister/kafka

#### 2. Run the two images: zookeeper and kafka
2.1 Start zookeeper
docker run -d --name zookeeper --publish 2181:2181 \
--volume /etc/localtime:/etc/localtime \
wurstmeister/zookeeper

2.2 Start kafka
docker run -d --name kafka --publish 9092:9092 \
--link zookeeper \
--env KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 \
--env KAFKA_ADVERTISED_HOST_NAME=localhost \
--env KAFKA_ADVERTISED_PORT=9092 \
--volume /etc/localtime:/etc/localtime \
wurstmeister/kafka

#### 3. Verify that both containers started successfully
3.1 Run docker ps and note kafka's CONTAINER ID.
3.2 Run docker exec -it ${CONTAINER ID} /bin/bash to enter the kafka container.
3.3 Change into Kafka's default directory /opt/kafka_2.11-0.10.1.0 and run
bin/kafka-topics.sh --create --zookeeper zookeeper:2181 --replication-factor 1 --partitions 1 --topic test
to create a topic named test.

Run bin/kafka-topics.sh --list --zookeeper zookeeper:2181 to list the current topics.

Start a console producer on the topic test you just created:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test, then type some test messages.

Start a console consumer on the same topic test:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning; it should receive the messages sent by the producer.

### II. Installing Redis with Docker
#### 1. Pull the Redis image
docker pull registry.docker-cn.com/library/redis
#### 2. Start the Redis container
docker run -d -p 6379:6379 --name myredis registry.docker-cn.com/library/redis
#### 3. Run docker ps to list the running containers
#### 4. Connect and inspect: use the redis image to run redis-cli against the container just started
sudo docker exec -it 6fb1ba029b41 redis-cli
A prompt like 127.0.0.1:6379> should appear.

### III. Test data sets
#### 3.1 Download locations:
wget http://training.ververica.com/trainingData/nycTaxiRides.gz

wget http://training.ververica.com/trainingData/nycTaxiFares.gz
#### 3.2 Data set fields
```
============================= Taxi Ride data set fields =============================
rideId       : Long     // a unique id for each ride
taxiId       : Long     // a unique id for each taxi
driverId     : Long     // a unique id for each driver
isStart      : Boolean  // TRUE for ride start events, FALSE for ride end events
startTime    : DateTime // the start time of a ride
endTime      : DateTime // the end time of a ride,
                        // "1970-01-01 00:00:00" for start events
startLon     : Float    // the longitude of the ride start location
startLat     : Float    // the latitude of the ride start location
endLon       : Float    // the longitude of the ride end location
endLat       : Float    // the latitude of the ride end location
passengerCnt : Short    // number of passengers on the ride
```
```
============================= Taxi Fare data set fields =============================
rideId       : Long     // a unique id for each ride
taxiId       : Long     // a unique id for each taxi
driverId     : Long     // a unique id for each driver
startTime    : DateTime // the start time of a ride
paymentType  : String   // payment method, CSH or CRD
tip          : Float    // tip for this ride
tolls        : Float    // tolls for this ride
totalFare    : Float    // total fare collected
```
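The taxi demos map these records onto plain POJOs (entity/TaxiRide.java and entity/TaxiFare.java in flink-taxi-demos). As a rough sketch of how the ride fields above translate into such a class -- the entity class in the repo is the authoritative version, and DateTime stands for org.joda.time.DateTime as used in the training exercises:
```java
// Sketch only; see com.jdd.streaming.demos.entity.TaxiRide for the real class.
public class TaxiRide {
    public long rideId;        // unique id for each ride
    public long taxiId;        // unique id for each taxi
    public long driverId;      // unique id for each driver
    public boolean isStart;    // true for start events, false for end events
    public DateTime startTime; // ride start time
    public DateTime endTime;   // ride end time ("1970-01-01 00:00:00" for start events)
    public float startLon;     // longitude of the ride start location
    public float startLat;     // latitude of the ride start location
    public float endLon;       // longitude of the ride end location
    public float endLat;       // latitude of the ride end location
    public short passengerCnt; // number of passengers on the ride
}
```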
### IV. A complete example
```
// Program arguments:
// --file-path /home/wmm/go_bench/flink_sources/nycTaxiRides.gz --output-redis 127.0.0.1 --max-delay 60 --serving-speed 600
final ParameterTool params = ParameterTool.fromArgs(args);
String path = params.get("file-path", "/home/wmm/go_bench/flink_sources/nycTaxiRides.gz");
int maxDelay = params.getInt("max-delay", 60);
int servingSpeed = params.getInt("serving-speed", 600);

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
env.getConfig().disableSysoutLogging();

// the TaxiRide source
DataStream<TaxiRide> rides = env.addSource(new TaxiRideSource(path, maxDelay, servingSpeed));

DataStream<Tuple2<Long, Long>> tuples = rides.map(new MapFunction<TaxiRide, Tuple2<Long, Long>>() {
    @Override
    public Tuple2<Long, Long> map(TaxiRide ride) throws Exception {
        return new Tuple2<>(ride.driverId, 1L); // one count per ride event, keyed by driver id
    }
});

KeyedStream<Tuple2<Long, Long>, Tuple> keyByDriverId = tuples.keyBy(0); // partition the data by driver id
DataStream<Tuple2<Long, Long>> rideCounts = keyByDriverId.sum(1); // running count of ride events per driver

RedisConfig redisConfig = new RedisConfig();
redisConfig.setHost(params.get("output-redis", "127.0.0.1"));
redisConfig.setPort(6379);
redisConfig.setPassword(null);

// implement the redis sink directly as an anonymous class
rideCounts.addSink(new RichSinkFunction<Tuple2<Long, Long>>() { // define the sink
    private transient JedisPool jedisPool;
    @Override
    public void open(Configuration parameters) throws Exception { // create the redis pool
        try {
            super.open(parameters);
            JedisPoolConfig config = new JedisPoolConfig();
            config.setMaxIdle(redisConfig.getMaxIdle());
            config.setMinIdle(redisConfig.getMinIdle());
            config.setMaxTotal(redisConfig.getMaxTotal());
            jedisPool = new JedisPool(config, redisConfig.getHost(), redisConfig.getPort(),
                redisConfig.getConnectionTimeout(), redisConfig.getPassword(), redisConfig.getDatabase());
        } catch (Exception e) {
            LOGGER.error("redis sink error {}", e);
        }
    }

    @Override
    public void close() throws Exception { // close the redis connection pool
        try {
            jedisPool.close();
        } catch (Exception e) {
            LOGGER.error("redis sink error {}", e);
        }
    }

    @Override
    public void invoke(Tuple2<Long, Long> val, Context context) throws Exception { // write the record to redis
        Jedis jedis = null;
        try {
            jedis = jedisPool.getResource();
            jedis.set("taxi:ride:" + val.f0, val.f1.toString());
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (jedis != null) { // a single null check suffices here
                try {
                    jedis.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }
});
//rideCounts.print();

JobExecutionResult result = env.execute("Ride Count By DriverID");
```
--------------------------------------------------------------------------------
/docs/window_watermark.md:
--------------------------------------------------------------------------------
## I. Window types
### TumblingWindow: tumbling window
1. Consecutive evaluations never overlap: each element belongs to exactly one window.
### SlidingWindow: sliding window
1. Windows overlap, so a single element can appear in several windows (see the sketch below).
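To make the difference concrete, a sketch against the Flink 1.x DataStream API (`keyedStream` stands for any KeyedStream; the sizes are arbitrary):
```java
// Tumbling: one-minute windows placed back to back, no overlap.
keyedStream.timeWindow(Time.minutes(1));

// Sliding: one-minute windows evaluated every 30 seconds,
// so each element falls into two overlapping windows.
keyedStream.timeWindow(Time.minutes(1), Time.seconds(30));

// Count-based counterpart (see "Count-based windows" below): fire every 100 elements.
keyedStream.countWindow(100);
```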
## II. Window modes
### Time-based windows
* EventTime:
1. The time at which each individual event was produced on its source device;
2. the event's timestamp already exists when the record enters Flink; the user has to supply a timestamp extractor
(by implementing the timestamp & watermark assigner interface);
3. with event time, the data itself determines the progress of time, independent of the system's physical clocks;
4. programs built on event time must specify how timestamps and watermarks are generated, which makes the progress of event processing visible.
* IngestionTime:
1. Records the time at which an event enters Flink; the timestamp is normally taken at the source operator as each record is read, and all subsequent time-based operations use that timestamp;
2. ingestion time sits between event time and processing time: compared to processing time it provides a stable timestamp at slightly higher cost, and every window operation over a record sees the same timestamp, whereas with processing time each window operation reads a fresh clock value.
Compared to event time it cannot handle out-of-order or late records; apart from that the two behave similarly. Because ingestion-time timestamps are generated automatically, no watermark assigner has to be specified.
* ProcessTime:
1. The time at which an event is processed inside Flink, based on the physical clock of the executing machine (so processing time can differ between machines);
2. a window operation covers all records that arrived at the machine during that clock period
(e.g. if the application starts at 09:15 with a window size of 1 hour, the first window is [09:15, 10:00), the second [10:00, 11:00), and so on);
3. processing time is the simplest window time mechanism: it needs no coordination between streams and machines, gives the best throughput, and keeps latency low;
4. in distributed and heterogeneous environments, however, processing time is sensitive to when events happen to reach the system, so results become non-deterministic.
### Count-based windows
## III. Usage
### Class structure
* TimeCharacteristic
* currently offers exactly three time types: ProcessingTime / IngestionTime / EventTime
* Window:
1. A Window groups events into buckets;
2. maxTimestamp() marks the point in time by which all records with timestamp <= maxTimestamp will have arrived in the window;
3. every subclass of the abstract Window class must implement equals() and hashCode() so that logically identical windows receive the same treatment;
4. every window type must provide a Serializer implementation used to serialize the window type.
* TimeWindow:
1. A time-based window covering the interval [start, end);
2. many instances are created over the lifetime of a job.
* maxTimestamp = end - 1;
e.g. a window created at 10:05 with an interval of 5 minutes covers [10:05, 10:10); the effective end point is about 10:09:59.999
* equals: two TimeWindows are equal when their start and end match
* hashCode: start + end, with the long folded to an int
* intersects: tests whether a given window overlaps the current one
* cover: builds a new window spanning both the given window and the current one
* GlobalWindow:
1. The default window, which places all data into a single window whose timestamp only has to stay below Long.MAX_VALUE;
2. only a single GlobalWindow exists at runtime.
* maxTimestamp = Long.MAX_VALUE
* equals: any two instances of the same type are equal
* hashCode: return 0;
* Serializer:
1. Serializes the Window type;
2. implemented by extending the abstract class TypeSerializerSingleton.
* Interface TypeSerializer:
1. Describes the serialization and copying methods the Flink runtime needs for a data type. The methods of this interface are assumed to be stateless, so they are effectively thread-safe
(stateful implementations can cause unpredictable side effects and compromise the stability and correctness of the program).
2. duplicate()
Creates a deep copy of the serializer:
a. if the serializer is stateless, it can simply return this;
b. if it is stateful, it must create a deep copy of itself.
Because a serializer may be used from several threads, stateless serializers are thread-safe, while stateful ones carry a thread-safety risk.
3. snapshotConfiguration()
Creates a snapshot of the serializer's current configuration, stored together with its associated managed state;
the configuration snapshot must cover the serializer's parameter settings, serialization format, and so on;
when a new serializer is registered to serialize the same managed state, the snapshot is used to check the new serializer's compatibility, and state migration may be required.
4. ensureCompatibility()
Establishes compatibility between different serializers:
a. if the snapshot configuration is a ParameterlessTypeSerializerConfig and identifies the same serializer, the serializers are treated as compatible;
b. otherwise state migration is required.

* About TimeWindow.mergeWindows:
Merges the overlapping/intersecting windows of a given set of TimeWindows to reduce the number of windows;
the windows are first sorted by their start field to make merging straightforward.
a. If the currently recorded window covers the iterated window, the current window is kept as the key and the iterated window is added to its set;
b. if it does not cover it, a new (window, set) entry is started.
Pseudo-code for putting this to use:
```java
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// choose the time characteristic (processing time in this example)
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);

DataStream stream = env.addSource(new FlinkKafkaConsumer09(topic, schema, props));

stream
    .keyBy( (event) -> event.getUser() )
    .timeWindow(Time.hours(1)) // window size = 1h, aligned to the natural hour
    .reduce( (a, b) -> a.add(b) )
    .addSink(...);
```
## IV. Watermark
Flink uses watermarks to measure the progress of event time while events are being processed. A Watermark travels as part of the DataStream and carries a timestamp: Watermark(t) declares that event time has reached t in that stream; in other words, no more elements with a timestamp t' <= t should follow it.
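A sketch of what assigning timestamps and watermarks can look like for out-of-order events, using Flink 1.x's bundled `BoundedOutOfOrdernessTimestampExtractor` (the event type `MyEvent` and its `getEventTime()` accessor are placeholders, not classes from this repo):
```java
DataStream<MyEvent> withTimestamps = stream.assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(5)) {
        @Override
        public long extractTimestamp(MyEvent event) {
            return event.getEventTime(); // epoch millis carried by the event
        }
    });
```
With this assigner the watermark trails the largest timestamp seen so far by 5 seconds, so elements up to 5 seconds late still land in their window.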
-------------------------------------------------------------------------------- /flink-files/pom.xml: --------------------------------------------------------------------------------
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>flink-demos</artifactId>
        <groupId>com.jdd.streaming</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <artifactId>flink-files</artifactId>
    <name>flink-files-streaming</name>
    <packaging>jar</packaging>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_${scala.binary.version}</artifactId>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-test-utils-junit</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>com.jdd.streaming.demos.FilesWordCounter</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
-------------------------------------------------------------------------------- /flink-files/src/main/java/com/jdd/streaming/demos/FilesWordCounter.java: --------------------------------------------------------------------------------
package com.jdd.streaming.demos;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.util.Collector;

public class FilesWordCounter {
    public static void main(String[] args) throws Exception {
        final String filePath;

        final ParameterTool params = ParameterTool.fromArgs(args);
        filePath = params.has("path") ? 
params.get("path") : ""; 16 | 17 | final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); 18 | 19 | DataSet text = env.readTextFile(filePath); 20 | 21 | DataSet> counts = 22 | // split up the lines in pairs (2-tuples) containing: (word,1) 23 | text.flatMap(new FlatMapFunction>() { 24 | public void flatMap(String s, Collector> collector) throws Exception { 25 | // normalize and split the line 26 | String[] tokens = s.toLowerCase().split("\\W+"); 27 | 28 | // emit the pairs 29 | for (String token : tokens) { 30 | if (token.length() > 0) { 31 | collector.collect(new Tuple2(token, 1)); 32 | } 33 | } 34 | } 35 | }) 36 | // group by the tuple field "0" and sum up tuple field "1" 37 | .groupBy(0) 38 | .sum(1); 39 | counts.print(); 40 | } 41 | 42 | public static class WordCounter{ 43 | public String word; 44 | public long count; 45 | 46 | public WordCounter(){} 47 | public WordCounter(String word, long count){ 48 | this.count = count; 49 | this.word = word; 50 | } 51 | 52 | public String toString(){ 53 | return word + " : " + count; 54 | } 55 | } 56 | } 57 | -------------------------------------------------------------------------------- /flink-kafka-redis/pom.xml: -------------------------------------------------------------------------------- 1 | 2 | 5 | 6 | flink-demos 7 | com.jdd.streaming 8 | 1.0-SNAPSHOT 9 | 10 | 4.0.0 11 | flink-kafka-redis 12 | flink-kafka-redis 13 | jar 14 | 15 | 16 | 17 | org.slf4j 18 | slf4j-log4j12 19 | runtime 20 | 21 | 22 | org.apache.flink 23 | flink-streaming-java_2.11 24 | 25 | 26 | org.apache.flink 27 | flink-connector-kafka-0.11_2.11 28 | 29 | 30 | org.apache.flink 31 | flink-streaming-scala_2.11 32 | 33 | 34 | redis.clients 35 | jedis 36 | 37 | 38 | redis.clients 39 | jedis 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | org.apache.maven.plugins 48 | maven-jar-plugin 49 | 50 | 51 | 52 | com.jdd.streaming.demos.KafkaRedisStreamingCount 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | -------------------------------------------------------------------------------- /flink-kafka-redis/src/main/java/com/jdd/streaming/demos/KafkaRedisStreamingCount.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import com.jdd.streaming.demos.sink.RedisCommand; 4 | import com.jdd.streaming.demos.sink.RedisConfig; 5 | import com.jdd.streaming.demos.sink.RedisPushCommand; 6 | import com.jdd.streaming.demos.sink.RedisSink; 7 | import org.apache.commons.lang3.StringUtils; 8 | import org.apache.flink.api.common.ExecutionConfig; 9 | import org.apache.flink.api.common.functions.FlatMapFunction; 10 | import org.apache.flink.api.common.functions.MapFunction; 11 | import org.apache.flink.api.common.functions.ReduceFunction; 12 | import org.apache.flink.api.common.restartstrategy.RestartStrategies; 13 | import org.apache.flink.api.common.serialization.SimpleStringSchema; 14 | import org.apache.flink.api.java.utils.ParameterTool; 15 | import org.apache.flink.streaming.api.TimeCharacteristic; 16 | import org.apache.flink.streaming.api.datastream.DataStream; 17 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 18 | import org.apache.flink.streaming.api.functions.sink.SinkFunction; 19 | import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011; 20 | import org.apache.flink.util.Collector; 21 | import org.apache.flink.api.java.tuple.Tuple2; 22 | import org.slf4j.Logger; 23 | import org.slf4j.LoggerFactory; 24 | 25 | import java.security.MessageDigest; 26 | import 
java.security.NoSuchAlgorithmException;

public class KafkaRedisStreamingCount {
    /** logger */
    private static final Logger LOGGER = LoggerFactory.getLogger(KafkaRedisStreamingCount.class);

    public static void main(String[] args) throws Exception {
        final ParameterTool params = ParameterTool.fromArgs(args);
        // --topic

        if (params.getNumberOfParameters() < 5) {
            System.out.println("Missing parameters!\n" +
                "Usage: Kafka --input-topic <topic>\n" +
                "--bootstrap.servers <kafka brokers>\n" +
                "--zookeeper.connect <zk quorum> --group.id <group id>\n" +
                "--output-redis <redis host>\n");
            return;
        }

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // global job configuration
        env.getConfig().disableSysoutLogging();
        env.getConfig().setRestartStrategy(RestartStrategies.fixedDelayRestart(4, 10000));
        env.enableCheckpointing(5000); // create a checkpoint every 5 seconds
        env.getConfig().setGlobalJobParameters(params); // make parameters available in the web interface
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);


        DataStream<RedisCommand> datas =
            env.addSource(new FlinkKafkaConsumer011<>(
                params.getRequired("input-topic"),
                new SimpleStringSchema(),
                params.getProperties())
            ).flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
                    // normalize and split the line
                    String[] tokens = value.toLowerCase().split("\\W+");

                    // emit the pairs
                    for (String token : tokens) {
                        if (token.length() > 0) {
                            out.collect(new Tuple2<>(token, 1));
                        }
                    }
                }
            })
            .keyBy(0)
            .sum(1)//.setParallelism(2)
            .keyBy(0).reduce(new ReduceFunction<Tuple2<String, Integer>>() {
                public Tuple2<String, Integer> reduce(Tuple2<String, Integer> t1, Tuple2<String, Integer> t2) throws Exception {
                    return new Tuple2<>(t1.f0, (t1.f1 + t2.f1));
                }
            })
            .map(new MapFunction<Tuple2<String, Integer>, RedisCommand>() {
                public RedisCommand map(Tuple2<String, Integer> in) {
                    LOGGER.info("{} : {}", in.f0, in.f1);

                    return new RedisPushCommand(in.f0, new String[]{in.f1.toString()}, 1000);
                }
            });

        RedisConfig redisConfig = new RedisConfig();
        redisConfig.setHost(params.get("output-redis"));
        redisConfig.setPort(6379);
        redisConfig.setPassword(null);
        RedisSink redisSink = new RedisSink(redisConfig);

        if (null != datas) {
            datas.print();
            datas.addSink(redisSink);
        }

        //datas.addSink(new K)
        env.execute("Word count by kafka and redis");
    }
}
-------------------------------------------------------------------------------- /flink-kafka-redis/src/main/java/com/jdd/streaming/demos/sink/RedisCommand.java: --------------------------------------------------------------------------------
package com.jdd.streaming.demos.sink;

/**
 * @Author: dalan
 * @Date: 19-3-15 14:39
 * @Description:
 */
import redis.clients.jedis.Jedis;

import java.io.Serializable;

public abstract class RedisCommand implements Serializable {
    String key;
    Object value;
    int expire;

    public RedisCommand(){}

    public RedisCommand(String key, Object value, int expire) {
        this.key = key;
        this.value = value;
        this.expire = expire;
    }


    public RedisCommand(String key, Object value) {
        this.key = key;
        this.value = value;
        this.expire = -1;
    }

    public void execute(Jedis jedis) {
        invokeByCommand(jedis);
        if (-1 < this.expire) {
            jedis.expire(key, expire);
        }
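        // a negative expire (the default) leaves the key without a TTL;
        // subclasses supply the concrete Redis operation via invokeByCommand()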
37 | } 38 | 39 | public abstract void invokeByCommand(Jedis jedis); 40 | 41 | public String getKey() { 42 | return key; 43 | } 44 | 45 | public void setKey(String key) { 46 | this.key = key; 47 | } 48 | 49 | public Object getValue() { 50 | return value; 51 | } 52 | 53 | public void setValue(Object value) { 54 | this.value = value; 55 | } 56 | 57 | public int getExpire() { 58 | return expire; 59 | } 60 | 61 | public void setExpire(int expire) { 62 | this.expire = expire; 63 | } 64 | } 65 | -------------------------------------------------------------------------------- /flink-kafka-redis/src/main/java/com/jdd/streaming/demos/sink/RedisConfig.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos.sink; 2 | 3 | /** 4 | * @Auther: dalan 5 | * @Date: 19-3-15 14:39 6 | * @Description: 7 | */ 8 | import java.io.Serializable; 9 | 10 | public class RedisConfig implements Serializable { 11 | private static final long serialVersionUID = 1L; 12 | 13 | private String host = "127.0.0.1"; 14 | private int port = 6379; 15 | private int database = 0; 16 | private String password = null; 17 | protected int maxTotal = 8; 18 | protected int maxIdle = 8; 19 | protected int minIdle = 0; 20 | protected int connectionTimeout = 2000; 21 | 22 | public RedisConfig host(String host) { 23 | this.host = (host); 24 | return this; 25 | } 26 | 27 | public RedisConfig port(int port) { 28 | this.port = (port); 29 | return this; 30 | } 31 | 32 | public RedisConfig database(int database) { 33 | this.database = (database); 34 | return this; 35 | } 36 | 37 | public RedisConfig password(String password) { 38 | this.password = (password); 39 | return this; 40 | } 41 | 42 | public RedisConfig maxTotal(int maxTotal) { 43 | this.maxTotal = (maxTotal); 44 | return this; 45 | } 46 | 47 | public RedisConfig maxIdle(int maxIdle) { 48 | this.maxIdle = (maxIdle); 49 | return this; 50 | } 51 | 52 | public RedisConfig minIdle(int minIdle) { 53 | this.minIdle = (minIdle); 54 | return this; 55 | } 56 | 57 | public RedisConfig connectionTimeout(int connectionTimeout) { 58 | this.connectionTimeout = (connectionTimeout); 59 | return this; 60 | } 61 | 62 | public static long getSerialVersionUID() { 63 | return serialVersionUID; 64 | } 65 | 66 | public String getHost() { 67 | return host; 68 | } 69 | 70 | public void setHost(String host) { 71 | this.host = host; 72 | } 73 | 74 | public int getPort() { 75 | return port; 76 | } 77 | 78 | public void setPort(int port) { 79 | this.port = port; 80 | } 81 | 82 | public int getDatabase() { 83 | return database; 84 | } 85 | 86 | public void setDatabase(int database) { 87 | this.database = database; 88 | } 89 | 90 | public String getPassword() { 91 | return password; 92 | } 93 | 94 | public void setPassword(String password) { 95 | this.password = password; 96 | } 97 | 98 | public int getMaxTotal() { 99 | return maxTotal; 100 | } 101 | 102 | public void setMaxTotal(int maxTotal) { 103 | this.maxTotal = maxTotal; 104 | } 105 | 106 | public int getMaxIdle() { 107 | return maxIdle; 108 | } 109 | 110 | public void setMaxIdle(int maxIdle) { 111 | this.maxIdle = maxIdle; 112 | } 113 | 114 | public int getMinIdle() { 115 | return minIdle; 116 | } 117 | 118 | public void setMinIdle(int minIdle) { 119 | this.minIdle = minIdle; 120 | } 121 | 122 | public int getConnectionTimeout() { 123 | return connectionTimeout; 124 | } 125 | 126 | public void setConnectionTimeout(int connectionTimeout) { 127 | this.connectionTimeout = connectionTimeout; 128 | } 
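    // Fluent configuration sketch (all of the chained setters are defined above):
    //   RedisConfig cfg = new RedisConfig().host("127.0.0.1").port(6379).database(0);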
129 | } 130 | -------------------------------------------------------------------------------- /flink-kafka-redis/src/main/java/com/jdd/streaming/demos/sink/RedisPushCommand.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos.sink; 2 | 3 | /** 4 | * @Auther: dalan 5 | * @Date: 19-3-15 14:41 6 | * @Description: 7 | */ 8 | import redis.clients.jedis.Jedis; 9 | 10 | /** 11 | * override rpush command 12 | */ 13 | public class RedisPushCommand extends RedisCommand { 14 | public RedisPushCommand(){super();} 15 | 16 | public RedisPushCommand(String key, Object value) { 17 | super(key, value); 18 | } 19 | 20 | public RedisPushCommand(String key, Object value, int expire) { 21 | super(key, value, expire); 22 | } 23 | 24 | @Override 25 | public void invokeByCommand(Jedis jedis) { 26 | jedis.rpush(getKey(), (String[]) getValue()); 27 | } 28 | 29 | 30 | } -------------------------------------------------------------------------------- /flink-kafka-redis/src/main/java/com/jdd/streaming/demos/sink/RedisSink.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos.sink; 2 | 3 | /** 4 | * @Auther: dalan 5 | * @Date: 19-3-15 14:38 6 | * @Description: 7 | */ 8 | import org.apache.flink.configuration.Configuration; 9 | import org.apache.flink.streaming.api.functions.sink.RichSinkFunction; 10 | import org.apache.flink.util.Preconditions; 11 | import org.slf4j.Logger; 12 | import org.slf4j.LoggerFactory; 13 | import redis.clients.jedis.Jedis; 14 | import redis.clients.jedis.JedisPool; 15 | import redis.clients.jedis.JedisPoolConfig; 16 | 17 | /*** 18 | * redis sink 19 | * 20 | * support any operation 21 | * support set expire 22 | */ 23 | public class RedisSink extends RichSinkFunction { 24 | private static final long serialVersionUID = 1L; 25 | 26 | private static final Logger LOG = LoggerFactory.getLogger(RedisSink.class); 27 | 28 | private final RedisConfig redisConfig; 29 | 30 | private transient JedisPool jedisPool; 31 | 32 | public RedisSink(RedisConfig redisConfig) { 33 | this.redisConfig = Preconditions.checkNotNull(redisConfig, "Redis client config should not be null"); 34 | } 35 | 36 | 37 | @Override 38 | public void open(Configuration parameters) throws Exception { 39 | try { 40 | super.open(parameters); 41 | JedisPoolConfig config = new JedisPoolConfig(); 42 | config.setMaxIdle(redisConfig.getMaxIdle()); 43 | config.setMinIdle(redisConfig.getMinIdle()); 44 | config.setMaxTotal(redisConfig.getMaxTotal()); 45 | jedisPool = new JedisPool(config, redisConfig.getHost(), redisConfig.getPort(), 46 | redisConfig.getConnectionTimeout(), redisConfig.getPassword(), redisConfig.getDatabase()); 47 | } catch (Exception e) { 48 | LOG.error("redis sink error {}", e); 49 | } 50 | } 51 | 52 | @Override 53 | public void close() throws Exception { 54 | try { 55 | jedisPool.close(); 56 | } catch (Exception e) { 57 | LOG.error("redis sink error {}", e); 58 | } 59 | } 60 | 61 | 62 | private Jedis getJedis() { 63 | Jedis jedis = jedisPool.getResource(); 64 | return jedis; 65 | } 66 | 67 | public void closeResource(Jedis jedis) { 68 | if (jedis != null) { 69 | try { 70 | jedis.close(); 71 | } catch (Exception e) { 72 | e.printStackTrace(); 73 | } 74 | } 75 | } 76 | 77 | public void invoke(RedisCommand command, Context context) { 78 | Jedis jedis = null; 79 | try { 80 | jedis = getJedis(); 81 | command.execute(jedis); 82 | } catch (Exception e) { 83 | e.printStackTrace(); 84 | } finally { 
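            // always return the borrowed connection to the pool, even when the command failed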
85 | if (null != jedis) 86 | closeResource(jedis); 87 | } 88 | } 89 | 90 | } 91 | -------------------------------------------------------------------------------- /flink-kafka-redis/src/main/resources/log4j.properties: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | # Licensed to the Apache Software Foundation (ASF) under one 3 | # or more contributor license agreements. See the NOTICE file 4 | # distributed with this work for additional information 5 | # regarding copyright ownership. The ASF licenses this file 6 | # to you under the Apache License, Version 2.0 (the 7 | # "License"); you may not use this file except in compliance 8 | # with the License. You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 17 | ################################################################################ 18 | 19 | log4j.rootLogger=INFO, console 20 | 21 | log4j.appender.console=org.apache.log4j.ConsoleAppender 22 | log4j.appender.console.layout=org.apache.log4j.PatternLayout 23 | log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n 24 | -------------------------------------------------------------------------------- /flink-kafka/pom.xml: -------------------------------------------------------------------------------- 1 | 2 | 5 | 6 | flink-demos 7 | com.jdd.streaming 8 | 1.0-SNAPSHOT 9 | 10 | 4.0.0 11 | flink-kafka 12 | flink-kafka 13 | jar 14 | 15 | 16 | 17 | org.apache.flink 18 | flink-streaming-java_2.11 19 | 20 | 21 | org.apache.flink 22 | flink-clients_2.11 23 | 24 | 25 | org.apache.flink 26 | flink-connector-kafka-0.11_2.11 27 | 28 | 29 | org.slf4j 30 | slf4j-log4j12 31 | 32 | 33 | 34 | 35 | 36 | 37 | org.apache.maven.plugins 38 | maven-jar-plugin 39 | 40 | 41 | 42 | com.jdd.streaming.demos.KafkaStreamingCount 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | -------------------------------------------------------------------------------- /flink-kafka/src/main/java/com/jdd/streaming/demos/KafkaEventSchema.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.flink.api.common.serialization.DeserializationSchema; 4 | import org.apache.flink.api.common.serialization.SerializationSchema; 5 | import org.apache.flink.api.common.typeinfo.TypeInformation; 6 | 7 | import java.io.IOException; 8 | 9 | public class KafkaEventSchema implements DeserializationSchema, SerializationSchema { 10 | 11 | private static final long serialVersionUID = 6154188370181669758L; 12 | 13 | //@Override 14 | public byte[] serialize(KafkaStreamingCount.KafkaEvent event) { 15 | return event.toString().getBytes(); 16 | } 17 | 18 | //@Override 19 | public KafkaStreamingCount.KafkaEvent deserialize(byte[] message) throws IOException { 20 | return KafkaStreamingCount.KafkaEvent.fromString(new String(message)); 21 | } 22 | 23 | //@Override 24 | public boolean isEndOfStream(KafkaStreamingCount.KafkaEvent nextElement) { 25 | return false; 26 | } 27 | 28 | //@Override 29 | public TypeInformation getProducedType() { 30 | 
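        // expose the produced event type so Flink can set up serializers for downstream operators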
return TypeInformation.of(KafkaStreamingCount.KafkaEvent.class);
    }
}
-------------------------------------------------------------------------------- /flink-kafka/src/main/java/com/jdd/streaming/demos/KafkaStreamingCount.java: --------------------------------------------------------------------------------
package com.jdd.streaming.demos;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.flink.util.Collector;
import org.apache.flink.util.StringUtils;

import javax.annotation.Nullable;
import java.io.IOException;

public class KafkaStreamingCount {
    public static void main(String[] args) throws Exception {
        final ParameterTool params = ParameterTool.fromArgs(args);
        if (params.getNumberOfParameters() < 5) {
            System.out.println("Missing parameters!");
            return;
        }

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // env.getConfig().disableSysoutLogging();
        // env.getConfig().setRestartStrategy(RestartStrategies.fixedDelayRestart(4, 10000));
        // env.enableCheckpointing(5000); // create a checkpoint every 5 seconds
        // env.getConfig().setGlobalJobParameters(params); // make parameters available in the web interface
        // env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        DataStream<KafkaEvent> input = env
            .addSource(new FlinkKafkaConsumer011<>(
                params.getRequired("input-topic"),
                new KafkaEventSchema(),
                params.getProperties()
            ))
            //.assignTimestampsAndWatermarks(new CustomWatermarkExtractor())
            .flatMap(new FlatMapFunction<KafkaEvent, KafkaEvent>() {
                //@Override
                public void flatMap(KafkaEvent kafkaEvent, Collector<KafkaEvent> collector) throws Exception {
                    if (null != kafkaEvent && !StringUtils.isNullOrWhitespaceOnly(kafkaEvent.word)) {
                        for (String word : kafkaEvent.word.split("\\s")) {
                            collector.collect(new KafkaEvent(word, 1, System.currentTimeMillis()));
                        }
                    }
                }
            })
            .keyBy("word")
            .timeWindow(Time.seconds(5))
            .reduce(new ReduceFunction<KafkaEvent>() {
                public KafkaEvent reduce(KafkaEvent t1, KafkaEvent t2) throws Exception {
                    // note: the two-argument constructor leaves timestamp at 0
                    return new KafkaEvent(t1.word, t1.frequency + t2.frequency);
                }
            })
            ;
        input.print();

        env.execute("Kafka Word count");
    }

    private static class CustomWatermarkExtractor implements AssignerWithPeriodicWatermarks<KafkaEvent> {

        private static final long serialVersionUID = -742759155861320823L;

        private long currentTimestamp = Long.MIN_VALUE;

        //@Override
        public long 
extractTimestamp(KafkaEvent event, long previousElementTimestamp) { 77 | // the inputs are assumed to be of format (message,timestamp) 78 | this.currentTimestamp = event.getTimestamp(); 79 | return event.getTimestamp(); 80 | } 81 | 82 | @Nullable 83 | //@Override 84 | public Watermark getCurrentWatermark() { 85 | return new Watermark(currentTimestamp == Long.MIN_VALUE ? Long.MIN_VALUE : currentTimestamp - 1); 86 | } 87 | } 88 | 89 | 90 | public static class KafkaEvent{ 91 | private String word; 92 | private int frequency; 93 | private long timestamp; 94 | 95 | public KafkaEvent() {} 96 | 97 | public KafkaEvent(String word, int frequency) { 98 | this.word = word; 99 | this.frequency = frequency; 100 | } 101 | 102 | 103 | public KafkaEvent(String word, int frequency, long timestamp) { 104 | this.word = word; 105 | this.frequency = frequency; 106 | this.timestamp = timestamp; 107 | } 108 | 109 | public String getWord() { 110 | return word; 111 | } 112 | 113 | public void setWord(String word) { 114 | this.word = word; 115 | } 116 | 117 | public int getFrequency() { 118 | return frequency; 119 | } 120 | 121 | public void setFrequency(int frequency) { 122 | this.frequency = frequency; 123 | } 124 | 125 | public long getTimestamp() { 126 | return timestamp; 127 | } 128 | 129 | public void setTimestamp(long timestamp) { 130 | this.timestamp = timestamp; 131 | } 132 | 133 | public static KafkaEvent fromString(String eventStr) { 134 | //String[] split = eventStr.split(","); 135 | //return new KafkaEvent(split[0], Integer.valueOf(split[1]), Long.valueOf(split[2])); 136 | return new KafkaEvent(eventStr,1); 137 | } 138 | 139 | @Override 140 | public String toString() { 141 | return word + "," + frequency + "," + timestamp; 142 | } 143 | } 144 | } 145 | -------------------------------------------------------------------------------- /flink-kafka/src/main/resources/log4j.properties: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | # Licensed to the Apache Software Foundation (ASF) under one 3 | # or more contributor license agreements. See the NOTICE file 4 | # distributed with this work for additional information 5 | # regarding copyright ownership. The ASF licenses this file 6 | # to you under the Apache License, Version 2.0 (the 7 | # "License"); you may not use this file except in compliance 8 | # with the License. You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 
17 | ################################################################################ 18 | 19 | log4j.rootLogger=INFO, console 20 | 21 | log4j.appender.console=org.apache.log4j.ConsoleAppender 22 | log4j.appender.console.layout=org.apache.log4j.PatternLayout 23 | log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n 24 | -------------------------------------------------------------------------------- /flink-kafka/src/main/resources/logback.xml: -------------------------------------------------------------------------------- 1 | 18 | 19 | 20 | 21 | 22 | %d{HH:mm:ss.SSS} [%thread] %-5level %logger{60} %X{sourceThread} - %msg%n 23 | 24 | 25 | 26 | 27 | 28 | 29 | -------------------------------------------------------------------------------- /flink-rocketmq/pom.xml: -------------------------------------------------------------------------------- 1 | 2 | 5 | 6 | flink-demos 7 | com.jdd.streaming 8 | 1.0-SNAPSHOT 9 | 10 | 4.0.0 11 | flink-rocketmq 12 | flink-rocketmq 13 | jar 14 | 15 | 16 | 17 | org.apache.flink 18 | flink-streaming-java_2.11 19 | 20 | 21 | org.apache.flink 22 | flink-java 23 | 24 | 25 | org.apache.flink 26 | flink-clients_2.11 27 | 28 | 29 | org.apache.flink 30 | flink-runtime_2.11 31 | 32 | 33 | org.apache.flink 34 | flink-queryable-state 35 | 36 | 37 | org.apache.flink 38 | flink-queryable-state-client-java_2.11 39 | 40 | 41 | 42 | org.apache.rocketmq 43 | rocketmq-client 44 | 45 | 46 | org.apache.rocketmq 47 | rocketmq-common 48 | 49 | 50 | 51 | commons-lang 52 | commons-lang 53 | 54 | 55 | 56 | org.apache.rocketmq 57 | rocketmq-namesrv 58 | test 59 | 60 | 61 | org.apache.rocketmq 62 | rocketmq-broker 63 | test 64 | 65 | 66 | org.slf4j 67 | slf4j-log4j12 68 | 69 | 70 | 71 | 72 | 73 | 74 | org.apache.maven.plugins 75 | maven-compiler-plugin 76 | 77 | 1.8 78 | 1.8 79 | 80 | 81 | 82 | org.apache.maven.plugins 83 | maven-jar-plugin 84 | 85 | 86 | 87 | -------------------------------------------------------------------------------- /flink-rocketmq/src/main/java/com/jdd/streaming/demo/connectors/SimpleConsumer.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demo.connectors; 2 | 3 | import org.apache.commons.compress.utils.ByteUtils; 4 | import org.apache.commons.lang3.StringUtils; 5 | import org.apache.rocketmq.client.consumer.*; 6 | import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyContext; 7 | import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyStatus; 8 | import org.apache.rocketmq.client.consumer.listener.MessageListenerConcurrently; 9 | import org.apache.rocketmq.client.exception.MQBrokerException; 10 | import org.apache.rocketmq.client.exception.MQClientException; 11 | import org.apache.rocketmq.common.message.MessageExt; 12 | import org.apache.rocketmq.common.message.MessageQueue; 13 | import org.apache.rocketmq.common.protocol.heartbeat.MessageModel; 14 | import org.apache.rocketmq.remoting.exception.RemotingException; 15 | 16 | import java.io.UnsupportedEncodingException; 17 | import java.nio.charset.StandardCharsets; 18 | import java.util.List; 19 | 20 | import static com.jdd.streaming.demo.connectors.common.RocketMQUtils.getInteger; 21 | 22 | /** 23 | * @Auther: dalan 24 | * @Date: 19-4-4 09:28 25 | * @Description: 26 | */ 27 | public class SimpleConsumer { 28 | public static void main(String[] args) throws InterruptedException, MQClientException { 29 | //directDefaultConsumer(); 30 | pullServiceConsumer(); 31 | } 32 | 33 | public 
static void pullServiceConsumer() throws MQClientException {
        MQPullConsumerScheduleService pull = null;
        final DefaultMQPullConsumer consumer;


        pull = new MQPullConsumerScheduleService("TestGroup");
        consumer = pull.getDefaultMQPullConsumer();


        // Specify name server addresses.
        consumer.setNamesrvAddr("localhost:9876");
        consumer.setPollNameServerInterval(30000);
        consumer.setHeartbeatBrokerInterval(30000);
        consumer.setMessageModel(MessageModel.CLUSTERING);
        consumer.setPersistConsumerOffsetInterval(5000);
        // Subscribe one or more topics to consume.
        //consumer.fetchSubscribeMessageQueues("TopicTest");

        pull.setPullThreadNums(2);
        pull.registerPullTaskCallback("TopicTest1", new PullTaskCallback() {
            @Override public void doPullTask(MessageQueue mq, PullTaskContext pullTaskContext) {
                long offset = 0;
                try {
                    offset = consumer.fetchConsumeOffset(mq, false);
                } catch (MQClientException e) {
                    e.printStackTrace();
                }

                if (offset < 0) {
                    return;
                }

                try {
                    PullResult pullResult = consumer.pull(mq, "TagB", offset, 2);

                    List<MessageExt> messages = pullResult.getMsgFoundList();
                    for (MessageExt msg : messages) {
                        //System.out.println(msg);

                        // note: key stays null when the message carries no keys
                        byte[] key = msg.getKeys() != null ? msg.getKeys().getBytes(StandardCharsets.UTF_8) : null;
                        byte[] value = msg.getBody();

                        System.out.printf("the message keys = %s, and body %s\n",
                            new String(key, StandardCharsets.UTF_8),
                            new String(value, StandardCharsets.UTF_8));
                    }

                } catch (MQClientException e) {
                    e.printStackTrace();
                } catch (RemotingException e) {
                    e.printStackTrace();
                } catch (MQBrokerException e) {
                    e.printStackTrace();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });

        pull.start();
        //consumer.start();
        System.out.printf("Consumer Started.%n");
    }

    public static void directDefaultConsumer() throws MQClientException {
        // Instantiate with specified consumer group name.
        DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("please_rename_unique_group_name");

        // Specify name server addresses.
        consumer.setNamesrvAddr("localhost:9876");
        consumer.setConsumerGroup("TestGroup");

        // Subscribe one or more topics to consume.
        consumer.subscribe("TopicTest", "*");


        // Register callback to execute on arrival of messages fetched from brokers.
        consumer.registerMessageListener(new MessageListenerConcurrently() {
            @Override
            public ConsumeConcurrentlyStatus consumeMessage(List<MessageExt> msgs,
                                                            ConsumeConcurrentlyContext context) {
                for (MessageExt msg : msgs) {
                    try {
                        System.out.printf("the message keys = %s, and body %s\n",
                            new String(msg.getProperties().get("KEYS").getBytes("UTF-8"), "UTF-8"),
                            new String(msg.getBody(), "UTF-8")
                        );
                    } catch (UnsupportedEncodingException e) {
                        e.printStackTrace();
                    }
                }
                System.out.printf("%s Receive New Messages: %s %n", Thread.currentThread().getName(), msgs);
                return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
            }
        });

        // Launch the consumer instance.
        consumer.start();

        System.out.printf("Consumer Started.%n");
    }
}
-------------------------------------------------------------------------------- /flink-rocketmq/src/main/java/com/jdd/streaming/demo/connectors/SimpleProducer.java: --------------------------------------------------------------------------------
package com.jdd.streaming.demo.connectors;

/**
 * @Author: dalan
 * @Date: 19-4-4 09:25
 * @Description:
 */

import org.apache.rocketmq.client.exception.MQClientException;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.client.producer.SendResult;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.remoting.common.RemotingHelper;

/**
 * This class demonstrates how to send messages to brokers using the provided {@link DefaultMQProducer}.
 */
public class SimpleProducer {
    public static void main(String[] args) throws MQClientException, InterruptedException {

        /*
         * Instantiate with a producer group name.
         */
        DefaultMQProducer producer = new DefaultMQProducer("please_rename_unique_group_name");
        producer.setNamesrvAddr("localhost:9876");
        producer.setProducerGroup("TestGroup");

        /*
         * Specify name server addresses.
         *
         * Alternatively, you may specify name server addresses via exporting the environment variable NAMESRV_ADDR:
         * {@code
         * producer.setNamesrvAddr("name-server1-ip:9876;name-server2-ip:9876");
         * }
         */

        /*
         * Launch the instance.
         */
        producer.start();

        for (int i = 0; i < 1000; i++) {
            try {

                /*
                 * Create a message instance, specifying topic, tag and message body.
                 */
                Message msg = new Message("TopicTest" /* Topic */,
                    "TagA" /* Tag */,
                    "keys_" + i,
                    ("Hello RocketMQ " + i).getBytes(RemotingHelper.DEFAULT_CHARSET) /* Message body */
                );
                //msg.setKeys("keys_" + i);

                /*
                 * Call send message to deliver message to one of brokers.
                 */
                SendResult sendResult = producer.send(msg);

                System.out.printf("%s%n", sendResult);
            } catch (Exception e) {
                e.printStackTrace();
                Thread.sleep(1000);
            }
        }

        /*
         * Shut down once the producer instance is no longer in use.
         */
        producer.shutdown();
    }
}
-------------------------------------------------------------------------------- /flink-rocketmq/src/main/java/com/jdd/streaming/demo/connectors/SimpleRMQ.java: --------------------------------------------------------------------------------
package com.jdd.streaming.demo.connectors;

import com.jdd.streaming.demo.connectors.common.RocketMQConfig;
import com.jdd.streaming.demo.connectors.selector.SimpleTopicSelector;
import com.jdd.streaming.demo.connectors.serialization.KeyValueDeserializationSchema;
import com.jdd.streaming.demo.connectors.serialization.KeyValueSerializationSchema;
import com.jdd.streaming.demo.connectors.serialization.SimpleKeyValueDeserializationSchema;
import com.jdd.streaming.demo.connectors.serialization.SimpleKeyValueSerializationSchema;
import com.jdd.streaming.demo.connectors.sink.RocketMQSink;
import com.jdd.streaming.demo.connectors.source.RocketMQSource;
import io.netty.util.internal.ObjectUtil;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Map;
import java.util.Properties;

/**
 * @Author: dalan
 * @Date: 19-4-3 17:15
 * @Description:
 */
public class SimpleRMQ {
    /** logger */
    private static final Logger LOGGER = LoggerFactory.getLogger(SimpleRMQ.class);

    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(3000);

        final Properties props = new Properties();
        props.setProperty(RocketMQConfig.NAME_SERVER_ADDR, "127.0.0.1:9876");
        props.setProperty(RocketMQConfig.CONSUMER_TOPIC, "TopicTest");
        props.setProperty(RocketMQConfig.CONSUMER_GROUP, "*");
        props.setProperty(RocketMQConfig.CONSUMER_PULL_POOL_SIZE, "2");
        props.setProperty(RocketMQConfig.CONSUMER_TAG, "TagA");
        //props.setProperty(RocketMQConfig.CONSUMER_OFFSET_RESET_TO, "latest");

        DataStream<Map> messages = env.addSource(new RocketMQSource<>(new SimpleKeyValueDeserializationSchema("key", "value"), props));
        //messages.print();

        SingleOutputStreamOperator<Tuple2<String, String>> streamMessages = messages.map(new MapFunction<Map, Tuple2<String, String>>() {
            @Override public Tuple2<String, String> map(Map map) 
throws Exception { 50 | //System.out.println(" === " + map); 51 | 52 | String key = map.get("key").toString(); 53 | String value = map.get("value").toString(); 54 | 55 | return new Tuple2(key, value); 56 | } 57 | }); 58 | 59 | //streamMessages.print(); 60 | 61 | streamMessages.addSink(new RocketMQSink( 62 | new SimpleKeyValueSerializationSchema(), 63 | new SimpleTopicSelector("TopicTest1","topic","TagB","tag"), 64 | props)); 65 | 66 | env.execute("a simple rocketmq demo"); 67 | } 68 | } 69 | -------------------------------------------------------------------------------- /flink-rocketmq/src/main/java/com/jdd/streaming/demo/connectors/common/RocketMQConfig.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demo.connectors.common; 2 | 3 | import java.util.Properties; 4 | import java.util.UUID; 5 | 6 | 7 | import org.apache.commons.lang3.StringUtils; 8 | import org.apache.commons.lang3.Validate; 9 | import org.apache.rocketmq.client.ClientConfig; 10 | import org.apache.rocketmq.client.consumer.DefaultMQPullConsumer; 11 | import org.apache.rocketmq.client.producer.DefaultMQProducer; 12 | import org.apache.rocketmq.common.protocol.heartbeat.MessageModel; 13 | 14 | import static com.jdd.streaming.demo.connectors.common.RocketMQUtils.getInteger; 15 | 16 | 17 | /** 18 | * @Auther: dalan 19 | * @Date: 19-4-2 18:43 20 | * @Description: 21 | */ 22 | public class RocketMQConfig { 23 | // common 24 | public static final String NAME_SERVER_ADDR = "nameserver.address"; // Required 25 | 26 | public static final String NAME_SERVER_POLL_INTERVAL = "nameserver.poll.interval"; 27 | public static final int DEFAULT_NAME_SERVER_POLL_INTERVAL = 30000; // 30 seconds 28 | 29 | public static final String BROKER_HEART_BEAT_INTERVAL = "brokerserver.heartbeat.interval"; 30 | public static final int DEFAULT_BROKER_HEART_BEAT_INTERVAL = 30000; // 30 seconds 31 | 32 | 33 | // producer 34 | public static final String PRODUCER_GROUP = "producer.group"; 35 | 36 | public static final String PRODUCER_RETRY_TIMES = "producer.retry.times"; 37 | public static final int DEFAULT_PRODUCER_RETRY_TIMES = 3; 38 | 39 | public static final String PRODUCER_TIMEOUT = "producer.timeout"; 40 | public static final int DEFAULT_PRODUCER_TIMEOUT = 3000; // 3 seconds 41 | 42 | 43 | // consumer 44 | public static final String CONSUMER_GROUP = "consumer.group"; // Required 45 | 46 | public static final String CONSUMER_TOPIC = "consumer.topic"; // Required 47 | 48 | public static final String CONSUMER_TAG = "consumer.tag"; 49 | public static final String DEFAULT_CONSUMER_TAG = "*"; 50 | 51 | public static final String CONSUMER_OFFSET_RESET_TO = "consumer.offset.reset.to"; 52 | public static final String CONSUMER_OFFSET_LATEST = "latest"; 53 | public static final String CONSUMER_OFFSET_EARLIEST = "earliest"; 54 | public static final String CONSUMER_OFFSET_TIMESTAMP = "timestamp"; 55 | public static final String CONSUMER_OFFSET_FROM_TIMESTAMP = "consumer.offset.from.timestamp"; 56 | 57 | public static final String CONSUMER_OFFSET_PERSIST_INTERVAL = "consumer.offset.persist.interval"; 58 | public static final int DEFAULT_CONSUMER_OFFSET_PERSIST_INTERVAL = 5000; // 5 seconds 59 | 60 | public static final String CONSUMER_PULL_POOL_SIZE = "consumer.pull.thread.pool.size"; 61 | public static final int DEFAULT_CONSUMER_PULL_POOL_SIZE = 20; 62 | 63 | public static final String CONSUMER_BATCH_SIZE = "consumer.batch.size"; 64 | public static final int DEFAULT_CONSUMER_BATCH_SIZE = 32; 65 | 66 | 
public static final String CONSUMER_DELAY_WHEN_MESSAGE_NOT_FOUND = "consumer.delay.when.message.not.found"; 67 | public static final int DEFAULT_CONSUMER_DELAY_WHEN_MESSAGE_NOT_FOUND = 10; 68 | 69 | /** 70 | * Build Producer Configs. 71 | * @param props Properties 72 | * @param producer DefaultMQProducer 73 | */ 74 | public static void buildProducerConfigs(Properties props, DefaultMQProducer producer) { 75 | buildCommonConfigs(props, producer); 76 | 77 | String group = props.getProperty(PRODUCER_GROUP); 78 | if (StringUtils.isEmpty(group)) { 79 | group = UUID.randomUUID().toString(); 80 | } 81 | producer.setProducerGroup(props.getProperty(PRODUCER_GROUP, group)); 82 | 83 | producer.setRetryTimesWhenSendFailed(getInteger(props, 84 | PRODUCER_RETRY_TIMES, DEFAULT_PRODUCER_RETRY_TIMES)); 85 | producer.setRetryTimesWhenSendAsyncFailed(getInteger(props, 86 | PRODUCER_RETRY_TIMES, DEFAULT_PRODUCER_RETRY_TIMES)); 87 | producer.setSendMsgTimeout(getInteger(props, 88 | PRODUCER_TIMEOUT, DEFAULT_PRODUCER_TIMEOUT)); 89 | } 90 | 91 | /** 92 | * Build Consumer Configs. 93 | * @param props Properties 94 | * @param consumer DefaultMQPushConsumer 95 | */ 96 | public static void buildConsumerConfigs(Properties props, DefaultMQPullConsumer consumer) { 97 | buildCommonConfigs(props, consumer); 98 | 99 | consumer.setMessageModel(MessageModel.CLUSTERING); 100 | 101 | consumer.setPersistConsumerOffsetInterval(getInteger(props, 102 | CONSUMER_OFFSET_PERSIST_INTERVAL, DEFAULT_CONSUMER_OFFSET_PERSIST_INTERVAL)); 103 | } 104 | 105 | /** 106 | * Build Common Configs. 107 | * @param props Properties 108 | * @param client ClientConfig 109 | */ 110 | public static void buildCommonConfigs(Properties props, ClientConfig client) { 111 | String nameServers = props.getProperty(NAME_SERVER_ADDR); 112 | Validate.notEmpty(nameServers); 113 | client.setNamesrvAddr(nameServers); 114 | 115 | client.setPollNameServerInterval(getInteger(props, 116 | NAME_SERVER_POLL_INTERVAL, DEFAULT_NAME_SERVER_POLL_INTERVAL)); 117 | client.setHeartbeatBrokerInterval(getInteger(props, 118 | BROKER_HEART_BEAT_INTERVAL, DEFAULT_BROKER_HEART_BEAT_INTERVAL)); 119 | } 120 | } -------------------------------------------------------------------------------- /flink-rocketmq/src/main/java/com/jdd/streaming/demo/connectors/common/RocketMQUtils.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demo.connectors.common; 2 | 3 | import java.io.Serializable; 4 | import java.util.Properties; 5 | 6 | /** 7 | * @Auther: dalan 8 | * @Date: 19-4-2 19:00 9 | * @Description: 10 | */ 11 | public class RocketMQUtils { 12 | public static int getInteger(Properties props, String key, int defaultValue) { 13 | return Integer.parseInt(props.getProperty(key, String.valueOf(defaultValue))); 14 | } 15 | 16 | public static long getLong(Properties props, String key, long defaultValue) { 17 | return Long.parseLong(props.getProperty(key, String.valueOf(defaultValue))); 18 | } 19 | 20 | public static boolean getBoolean(Properties props, String key, boolean defaultValue) { 21 | return Boolean.parseBoolean(props.getProperty(key, String.valueOf(defaultValue))); 22 | } 23 | 24 | // rocketmq running check 25 | public static class RunningChecker implements Serializable { 26 | private volatile boolean isRunning = false; 27 | 28 | public boolean isRunning() { 29 | return isRunning; 30 | } 31 | 32 | public void setRunning(boolean running) { 33 | isRunning = running; 34 | } 35 | } 36 | } 37 | 38 | 39 | 
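// Usage sketch (illustrative, not part of this file): wiring a pull consumer from Properties.
// The keys and the buildConsumerConfigs helper are defined in RocketMQConfig above.
//
//   Properties props = new Properties();
//   props.setProperty(RocketMQConfig.NAME_SERVER_ADDR, "127.0.0.1:9876");
//   props.setProperty(RocketMQConfig.CONSUMER_GROUP, "demo-group");
//   props.setProperty(RocketMQConfig.CONSUMER_TOPIC, "TopicTest");
//   DefaultMQPullConsumer consumer = new DefaultMQPullConsumer();
//   RocketMQConfig.buildConsumerConfigs(props, consumer); // name server, message model, offset persist interval
//   int batch = RocketMQUtils.getInteger(props, RocketMQConfig.CONSUMER_BATCH_SIZE,
//           RocketMQConfig.DEFAULT_CONSUMER_BATCH_SIZE);   // falls back to the default when unset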
-------------------------------------------------------------------------------- /flink-rocketmq/src/main/java/com/jdd/streaming/demo/connectors/selector/DefaultTopicSelector.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demo.connectors.selector; 2 | 3 | import java.util.Map; 4 | 5 | /** 6 | * @Author: dalan 7 | * @Date: 19-4-2 19:35 8 | * @Description: 9 | */ 10 | public class DefaultTopicSelector<T> implements TopicSelector<T> { 11 | private final String topicName; 12 | private final String tagName; 13 | 14 | public DefaultTopicSelector(final String topicName, final String tagName) { 15 | this.topicName = topicName; 16 | this.tagName = tagName; 17 | } 18 | 19 | public DefaultTopicSelector(final String topicName) { 20 | this(topicName, ""); 21 | } 22 | 23 | @Override 24 | public String getTopic(T tuple) { 25 | return topicName; 26 | } 27 | 28 | @Override 29 | public String getTag(T tuple) { 30 | return tagName; 31 | } 32 | } 33 | -------------------------------------------------------------------------------- /flink-rocketmq/src/main/java/com/jdd/streaming/demo/connectors/selector/SimpleTopicSelector.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demo.connectors.selector; 2 | 3 | import org.slf4j.Logger; 4 | import org.slf4j.LoggerFactory; 5 | 6 | import java.util.Map; 7 | 8 | /** 9 | * @Author: dalan 10 | * @Date: 19-4-2 19:38 11 | * @Description: 12 | */ 13 | public class SimpleTopicSelector implements TopicSelector<Map> { 14 | private static final Logger LOG = LoggerFactory.getLogger(SimpleTopicSelector.class); 15 | 16 | private final String topicFieldName; 17 | private final String defaultTopicName; 18 | 19 | private final String tagFieldName; 20 | private final String defaultTagName; 21 | 22 | /** 23 | * SimpleTopicSelector Constructor. 24 | * @param topicFieldName field name used for selecting the topic 25 | * @param defaultTopicName default topic name, used when the topic field is missing or null 26 | * @param tagFieldName field name used for selecting the tag 27 | * @param defaultTagName default tag name, used when the tag field is missing or null 28 | */ 29 | public SimpleTopicSelector(String topicFieldName, String defaultTopicName, String tagFieldName, String defaultTagName) { 30 | this.topicFieldName = topicFieldName; 31 | this.defaultTopicName = defaultTopicName; 32 | this.tagFieldName = tagFieldName; 33 | this.defaultTagName = defaultTagName; 34 | } 35 | 36 | @Override 37 | public String getTopic(Map tuple) { 38 | if (tuple.containsKey(topicFieldName)) { 39 | Object topic = tuple.get(topicFieldName); 40 | return topic != null ? topic.toString() : defaultTopicName; 41 | } else { 42 | LOG.warn("Field {} Not Found. Returning default topic {}", topicFieldName, defaultTopicName); 43 | return defaultTopicName; 44 | } 45 | } 46 | 47 | @Override 48 | public String getTag(Map tuple) { 49 | if (tuple.containsKey(tagFieldName)) { 50 | Object tag = tuple.get(tagFieldName); 51 | return tag != null ? tag.toString() : defaultTagName; 52 | } else { 53 | LOG.warn("Field {} Not Found.
Returning default tag {}", tagFieldName, defaultTagName); 54 | return defaultTagName; 55 | } 56 | } 57 | } -------------------------------------------------------------------------------- /flink-rocketmq/src/main/java/com/jdd/streaming/demo/connectors/selector/TopicSelector.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demo.connectors.selector; 2 | 3 | import java.io.Serializable; 4 | 5 | /** 6 | * @Author: dalan 7 | * @Date: 19-4-2 19:34 8 | * @Description: 9 | */ 10 | public interface TopicSelector<T> extends Serializable { 11 | 12 | String getTopic(T tuple); 13 | 14 | String getTag(T tuple); 15 | } 16 | -------------------------------------------------------------------------------- /flink-rocketmq/src/main/java/com/jdd/streaming/demo/connectors/serialization/KeyValueDeserializationSchema.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demo.connectors.serialization; 2 | 3 | import org.apache.flink.api.java.typeutils.ResultTypeQueryable; 4 | 5 | import java.io.Serializable; 6 | 7 | /** 8 | * @Author: dalan 9 | * @Date: 19-4-2 19:09 10 | * @Description: 11 | */ 12 | public interface KeyValueDeserializationSchema<T> extends ResultTypeQueryable<T>, Serializable { 13 | T deserializeKeyAndValue(byte[] key, byte[] value); // deserialize the raw key/value bytes into T 14 | } 15 | -------------------------------------------------------------------------------- /flink-rocketmq/src/main/java/com/jdd/streaming/demo/connectors/serialization/KeyValueSerializationSchema.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demo.connectors.serialization; 2 | 3 | import java.io.Serializable; 4 | 5 | /** 6 | * @Author: dalan 7 | * @Date: 19-4-2 19:09 8 | * @Description: 9 | */ 10 | public interface KeyValueSerializationSchema<T> extends Serializable { 11 | byte[] serializeKey(T tuple); // serialize the key 12 | byte[] serializeValue(T tuple); // serialize the value 13 | } 14 | -------------------------------------------------------------------------------- /flink-rocketmq/src/main/java/com/jdd/streaming/demo/connectors/serialization/SimpleKeyValueDeserializationSchema.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demo.connectors.serialization; 2 | 3 | import org.apache.flink.api.common.typeinfo.TypeInformation; 4 | 5 | import java.nio.charset.StandardCharsets; 6 | import java.util.HashMap; 7 | import java.util.Map; 8 | 9 | /** 10 | * @Author: dalan 11 | * @Date: 19-4-2 19:14 12 | * @Description: 13 | */ 14 | public class SimpleKeyValueDeserializationSchema implements KeyValueDeserializationSchema<Map> { 15 | public static final String DEFAULT_KEY_FIELD = "key"; 16 | public static final String DEFAULT_VALUE_FIELD = "value"; 17 | 18 | public String keyField; 19 | public String valueField; 20 | 21 | public SimpleKeyValueDeserializationSchema() { 22 | this(DEFAULT_KEY_FIELD, DEFAULT_VALUE_FIELD); 23 | }  24 | 25 | /** 26 | * SimpleKeyValueDeserializationSchema Constructor.
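* <p>The produced Map holds at most two entries: the raw key bytes under {@code keyField}
* and the raw value bytes under {@code valueField}, both decoded as UTF-8; a null byte
* array yields a null map value.</p>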
27 | * @param keyField tuple field for selecting the key 28 | * @param valueField tuple field for selecting the value 29 | */ 30 | public SimpleKeyValueDeserializationSchema(String keyField, String valueField) { 31 | this.keyField = keyField; 32 | this.valueField = valueField; 33 | } 34 | 35 | @Override 36 | public Map deserializeKeyAndValue(byte[] key, byte[] value) { 37 | HashMap map = new HashMap(2); 38 | if (keyField != null) { 39 | String k = key != null ? new String(key, StandardCharsets.UTF_8) : null; 40 | map.put(keyField, k); 41 | } 42 | if (valueField != null) { 43 | String v = value != null ? new String(value, StandardCharsets.UTF_8) : null; 44 | map.put(valueField, v); 45 | } 46 | return map; 47 | } 48 | 49 | @Override 50 | public TypeInformation getProducedType() { 51 | return TypeInformation.of(Map.class); 52 | } 53 | } -------------------------------------------------------------------------------- /flink-rocketmq/src/main/java/com/jdd/streaming/demo/connectors/serialization/SimpleKeyValueSerializationSchema.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demo.connectors.serialization; 2 | 3 | import java.nio.charset.StandardCharsets; 4 | import java.util.Map; 5 | 6 | /** 7 | * @Auther: dalan 8 | * @Date: 19-4-2 19:23 9 | * @Description: rocketmq的内容采用map存储:Map<,> 10 | */ 11 | public class SimpleKeyValueSerializationSchema implements KeyValueSerializationSchema { 12 | public static final String DEFAULT_KEY_FIELD = "key"; 13 | public static final String DEFAULT_VALUE_FIELD = "value"; 14 | 15 | public String keyField; 16 | public String valueField; 17 | 18 | public SimpleKeyValueSerializationSchema() { 19 | this(DEFAULT_KEY_FIELD, DEFAULT_VALUE_FIELD); 20 | } 21 | 22 | /** 23 | * SimpleKeyValueSerializationSchema Constructor. 24 | * @param keyField tuple field for selecting the key 25 | * @param valueField tuple field for selecting the value 26 | */ 27 | public SimpleKeyValueSerializationSchema(String keyField, String valueField) { 28 | this.keyField = keyField; 29 | this.valueField = valueField; 30 | } 31 | 32 | @Override 33 | public byte[] serializeKey(Map tuple) { 34 | if (tuple == null || keyField == null) { 35 | return null; 36 | } 37 | Object key = tuple.get(keyField); 38 | return key != null ? key.toString().getBytes(StandardCharsets.UTF_8) : null; 39 | } 40 | 41 | @Override 42 | public byte[] serializeValue(Map tuple) { 43 | if (tuple == null || valueField == null) { 44 | return null; 45 | } 46 | Object value = tuple.get(valueField); 47 | return value != null ? 
value.toString().getBytes(StandardCharsets.UTF_8) : null; 48 | } 49 | 50 | } -------------------------------------------------------------------------------- /flink-rocketmq/src/main/java/com/jdd/streaming/demo/connectors/sink/RocketMQSink.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demo.connectors.sink; 2 | 3 | import com.jdd.streaming.demo.connectors.common.RocketMQConfig; 4 | import com.jdd.streaming.demo.connectors.selector.TopicSelector; 5 | import com.jdd.streaming.demo.connectors.serialization.KeyValueSerializationSchema; 6 | import org.apache.commons.lang3.Validate; 7 | import org.apache.flink.configuration.Configuration; 8 | import org.apache.flink.runtime.state.FunctionInitializationContext; 9 | import org.apache.flink.runtime.state.FunctionSnapshotContext; 10 | import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction; 11 | import org.apache.flink.streaming.api.functions.sink.RichSinkFunction; 12 | import org.apache.flink.streaming.api.operators.StreamingRuntimeContext; 13 | import org.apache.rocketmq.client.exception.MQClientException; 14 | import org.apache.rocketmq.client.producer.DefaultMQProducer; 15 | import org.apache.rocketmq.client.producer.SendCallback; 16 | import org.apache.rocketmq.client.producer.SendResult; 17 | import org.apache.rocketmq.common.message.Message; 18 | import org.slf4j.Logger; 19 | import org.slf4j.LoggerFactory; 20 | 21 | import java.nio.charset.StandardCharsets; 22 | import java.util.LinkedList; 23 | import java.util.List; 24 | import java.util.Properties; 25 | 26 | /** 27 | * @Auther: dalan 28 | * @Date: 19-4-3 17:01 29 | * @Description: 30 | */ 31 | public class RocketMQSink extends RichSinkFunction implements CheckpointedFunction { 32 | 33 | private static final long serialVersionUID = 1L; 34 | 35 | private static final Logger LOG = LoggerFactory.getLogger(RocketMQSink.class); 36 | 37 | private transient DefaultMQProducer producer; 38 | private boolean async; // false by default 39 | 40 | private Properties props; 41 | private TopicSelector topicSelector; 42 | private KeyValueSerializationSchema serializationSchema; 43 | 44 | private boolean batchFlushOnCheckpoint; // false by default 45 | private int batchSize = 1000; 46 | private List batchList; 47 | 48 | public RocketMQSink(KeyValueSerializationSchema schema, TopicSelector topicSelector, Properties props) { 49 | this.serializationSchema = schema; 50 | this.topicSelector = topicSelector; 51 | this.props = props; 52 | } 53 | 54 | @Override 55 | public void open(Configuration parameters) throws Exception { 56 | Validate.notEmpty(props, "Producer properties can not be empty"); 57 | Validate.notNull(topicSelector, "TopicSelector can not be null"); 58 | Validate.notNull(serializationSchema, "KeyValueSerializationSchema can not be null"); 59 | 60 | producer = new DefaultMQProducer(); 61 | producer.setInstanceName(String.valueOf(getRuntimeContext().getIndexOfThisSubtask())); 62 | RocketMQConfig.buildProducerConfigs(props, producer); 63 | 64 | batchList = new LinkedList<>(); 65 | 66 | if (batchFlushOnCheckpoint && !((StreamingRuntimeContext) getRuntimeContext()).isCheckpointingEnabled()) { 67 | LOG.warn("Flushing on checkpoint is enabled, but checkpointing is not enabled. 
Disabling flushing."); 68 | batchFlushOnCheckpoint = false; 69 | } 70 | 71 | try { 72 | producer.start(); 73 | } catch (MQClientException e) { 74 | throw new RuntimeException(e); 75 | } 76 | } 77 | 78 | @Override 79 | public void invoke(IN input, Context context) throws Exception { 80 | Message msg = prepareMessage(input); 81 | 82 | if (batchFlushOnCheckpoint) { 83 | batchList.add(msg); 84 | if (batchList.size() >= batchSize) { 85 | flushSync(); 86 | } 87 | return; 88 | } 89 | 90 | if (async) { 91 | // async sending 92 | try { 93 | producer.send(msg, new SendCallback() { 94 | @Override 95 | public void onSuccess(SendResult sendResult) { 96 | LOG.debug("Async send message success! result: {}", sendResult); 97 | } 98 | 99 | @Override 100 | public void onException(Throwable throwable) { 101 | if (throwable != null) { 102 | LOG.error("Async send message failure!", throwable); 103 | } 104 | } 105 | }); 106 | } catch (Exception e) { 107 | LOG.error("Async send message failure!", e); 108 | } 109 | } else { 110 | // sync sending, will return a SendResult 111 | try { 112 | SendResult result = producer.send(msg); 113 | LOG.debug("Sync send message result: {}", result); 114 | } catch (Exception e) { 115 | LOG.error("Sync send message failure!", e); 116 | } 117 | } 118 | } 119 | 120 | // Mapping: from storm tuple -> rocketmq Message 121 | private Message prepareMessage(IN input) { 122 | String topic = topicSelector.getTopic(input); 123 | String tag = topicSelector.getTag(input) != null ? topicSelector.getTag(input) : ""; 124 | 125 | byte[] k = serializationSchema.serializeKey(input); 126 | String key = k != null ? new String(k, StandardCharsets.UTF_8) : ""; 127 | byte[] value = serializationSchema.serializeValue(input); 128 | 129 | Validate.notNull(topic, "the message topic is null"); 130 | Validate.notNull(value, "the message body is null"); 131 | 132 | Message msg = new Message(topic, tag, key, value); 133 | return msg; 134 | } 135 | 136 | public RocketMQSink withAsync(boolean async) { 137 | this.async = async; 138 | return this; 139 | } 140 | 141 | public RocketMQSink withBatchFlushOnCheckpoint(boolean batchFlushOnCheckpoint) { 142 | this.batchFlushOnCheckpoint = batchFlushOnCheckpoint; 143 | return this; 144 | } 145 | 146 | public RocketMQSink withBatchSize(int batchSize) { 147 | this.batchSize = batchSize; 148 | return this; 149 | } 150 | 151 | @Override 152 | public void close() throws Exception { 153 | if (producer != null) { 154 | flushSync(); 155 | producer.shutdown(); 156 | } 157 | } 158 | 159 | private void flushSync() throws Exception { 160 | if (batchFlushOnCheckpoint) { 161 | synchronized (batchList) { 162 | if (batchList.size() > 0) { 163 | producer.send(batchList); 164 | batchList.clear(); 165 | } 166 | } 167 | } 168 | } 169 | 170 | @Override 171 | public void snapshotState(FunctionSnapshotContext context) throws Exception { 172 | flushSync(); 173 | } 174 | 175 | @Override 176 | public void initializeState(FunctionInitializationContext context) throws Exception { 177 | // nothing to do 178 | } 179 | } -------------------------------------------------------------------------------- /flink-simple-lambda/pom.xml: -------------------------------------------------------------------------------- 1 | 2 | 5 | 6 | flink-demos 7 | com.jdd.streaming 8 | 1.0-SNAPSHOT 9 | 10 | 4.0.0 11 | flink-simple-lambda 12 | flink-lambda 13 | jar 14 | 15 | 16 | 17 | org.apache.flink 18 | flink-clients_2.11 19 | 20 | 21 | org.apache.flink 22 | flink-streaming-java_2.11 23 | 24 | 25 | org.apache.flink 26 | 
flink-connector-kafka-0.11_2.11 27 | 28 | 29 | org.slf4j 30 | slf4j-log4j12 31 | 32 | 33 | 34 | 35 | 36 | 37 | org.apache.maven.plugins 38 | maven-compiler-plugin 39 | 40 | 1.8 41 | 1.8 42 | 43 | 44 | 45 | org.apache.maven.plugins 46 | maven-jar-plugin 47 | 48 | 49 | 50 | com.jdd.streaming.demos.SimpleLambda 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | -------------------------------------------------------------------------------- /flink-simple-lambda/src/main/java/com/jdd/streaming/demos/SimpleLambda.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.flink.api.common.functions.FlatMapFunction; 4 | import org.apache.flink.api.common.typeinfo.Types; 5 | import org.apache.flink.api.java.tuple.Tuple2; 6 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 7 | import org.apache.flink.util.Collector; 8 | import org.slf4j.Logger; 9 | import org.slf4j.LoggerFactory; 10 | import java.util.ArrayList; 11 | import java.util.Arrays; 12 | 13 | /** 14 | * @Auther: dalan 15 | * @Date: 19-3-21 09:54 16 | * @Description: 17 | */ 18 | public class SimpleLambda { 19 | /** logger */ 20 | private static final Logger LOGGER = LoggerFactory.getLogger(SimpleLambda.class); 21 | // main 22 | public static void main(String[] args) throws Exception { 23 | final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 24 | 25 | env.fromCollection(Arrays.asList(1,2,3,4,5,6)) 26 | .map(i -> Tuple2.of(i, i)) 27 | .returns(Types.TUPLE(Types.INT, Types.INT)) // 当转换操作存在泛型时 可以通过指定TypeInformation来处理类型擦除带来的问题 28 | .print(); 29 | 30 | /** 31 | .flatMap(new FlatMapFunction() { // 在flink使用带有泛型的类型存在类型擦除 需要明确指定对应的泛型具体类型 32 | @Override 33 | public void flatMap(Integer in, Collector out) throws Exception { 34 | out.collect(in + in); 35 | } 36 | }) 37 | .map(i -> i * i) // lambda的使用 38 | .print(); 39 | */ 40 | 41 | env.execute("a simple lambda demo"); 42 | } 43 | } 44 | -------------------------------------------------------------------------------- /flink-simple-lambda/src/main/java/com/jdd/streaming/demos/StringLineEventSource.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.commons.lang3.StringUtils; 4 | import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction; 5 | import org.slf4j.Logger; 6 | import org.slf4j.LoggerFactory; 7 | 8 | import java.time.Instant; 9 | import java.util.*; 10 | import java.util.concurrent.TimeUnit; 11 | 12 | /** 13 | * @Auther: dalan 14 | * @Date: 19-3-21 16:44 15 | * @Description: 16 | */ 17 | public class StringLineEventSource extends RichParallelSourceFunction { 18 | /** logger */ 19 | private static final Logger LOGGER = LoggerFactory.getLogger(StringLineEventSource.class); 20 | 21 | public Long latenessMills; 22 | private volatile boolean running = true; 23 | 24 | public StringLineEventSource(){super();} 25 | public StringLineEventSource(Long latenessMills){super(); this.latenessMills = latenessMills;} 26 | 27 | private List channelSet = Arrays.asList("a", "b", "c", "d"); 28 | private List behaviorTypes = Arrays.asList("INSTALL", "OPEN", 29 | "BROWSE", "CLICK", 30 | "PURCHASE", "CLOSE", "UNINSTALL"); 31 | private Random rand = new Random(); 32 | 33 | @Override 34 | public void run(SourceContext ctx) throws Exception { 35 | long numElements = Long.MAX_VALUE; 36 | long count = 0L; 37 | 38 | while (running && count < numElements){ 
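// Each loop iteration emits one tab-separated record:
// "<timestampMillis>\t<channel>\t<uuid>\t<behaviorType>", where the timestamp is
// shifted back by latenessMills (see generateEvent below) to simulate late-arriving events.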
39 | String channel = channelSet.get(rand.nextInt(channelSet.size())); 40 | List event = generateEvent(); 41 | LOGGER.info(event.toString()); 42 | String ts = event.get(0); 43 | String id = event.get(1); 44 | String behaviorType = event.get(2); 45 | 46 | String result = StringUtils.join(Arrays.asList(ts, channel, id, behaviorType),"\t"); 47 | ctx.collect(result); 48 | 49 | count += 1; 50 | TimeUnit.MILLISECONDS.sleep(5L); 51 | } 52 | } 53 | 54 | private List generateEvent() { 55 | Long delayedTimestamp = Instant.ofEpochMilli(System.currentTimeMillis()) 56 | .minusMillis(latenessMills) 57 | .toEpochMilli(); 58 | // timestamp, id, behaviorType 59 | return Arrays.asList(delayedTimestamp.toString(), 60 | UUID.randomUUID().toString(), 61 | behaviorTypes.get(rand.nextInt(behaviorTypes.size()))); 62 | } 63 | 64 | 65 | @Override 66 | public void cancel() { 67 | this.running = false; 68 | } 69 | } 70 | -------------------------------------------------------------------------------- /flink-simple-lambda/src/main/java/com/jdd/streaming/demos/UserDefineWaterMark.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.commons.lang3.StringUtils; 4 | import org.apache.commons.lang3.math.NumberUtils; 5 | import org.apache.flink.api.common.functions.MapFunction; 6 | import org.apache.flink.api.common.serialization.SimpleStringSchema; 7 | import org.apache.flink.api.java.tuple.*; 8 | import org.apache.flink.api.java.utils.ParameterTool; 9 | import org.apache.flink.streaming.api.TimeCharacteristic; 10 | import org.apache.flink.streaming.api.datastream.DataStream; 11 | import org.apache.flink.streaming.api.datastream.DataStreamSource; 12 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 13 | import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor; 14 | import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction; 15 | import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows; 16 | import org.apache.flink.streaming.api.windowing.time.Time; 17 | import org.apache.flink.streaming.api.windowing.windows.TimeWindow; 18 | import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011; 19 | import org.apache.flink.util.Collector; 20 | import org.slf4j.Logger; 21 | import org.slf4j.LoggerFactory; 22 | 23 | import java.util.Arrays; 24 | 25 | /** 26 | * @Auther: dalan 27 | * @Date: 19-3-21 16:43 28 | * @Description: 29 | */ 30 | public class UserDefineWaterMark { 31 | /** logger */ 32 | private static final Logger LOGGER = LoggerFactory.getLogger(UserDefineWaterMark.class); 33 | 34 | // main 35 | public static void main(String[] args) throws Exception { 36 | final ParameterTool params = ParameterTool.fromArgs(args); 37 | Long sourceLatenessMillis = (Long)params.getLong("source-lateness-millis"); 38 | Long maxLaggedTimeMillis = params.getLong("window-lagged-millis"); 39 | Long windowSizeMillis = params.getLong("window-size-millis"); 40 | 41 | StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 42 | env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); // 基于event 43 | DataStream streams = env.addSource(new StringLineEventSource(sourceLatenessMillis)); 44 | 45 | //解析输入的数据 46 | DataStream inputMap = ((DataStreamSource) streams) 47 | .setParallelism(1) 48 | .assignTimestampsAndWatermarks( // 指派时间戳,并生成WaterMark 49 | new 
BoundedOutOfOrdernessTimestampExtractor(Time.milliseconds(maxLaggedTimeMillis)){ 50 | @Override 51 | public long extractTimestamp(String s) { 52 | return NumberUtils.toLong(s.split("\t")[0]); 53 | } 54 | }) 55 | .setParallelism(2) 56 | .map(new MapFunction, Long>>() { 57 | @Override 58 | public Tuple2, Long> map(String value) throws Exception { 59 | String[] arr = value.split("\t"); 60 | String channel = arr[1]; 61 | return new Tuple2, Long>(Tuple2.of(channel, arr[3]), 1L); 62 | } 63 | }) 64 | .setParallelism(2) 65 | //.keyBy(0,1) 66 | .keyBy(0) 67 | .window(TumblingEventTimeWindows.of(Time.milliseconds(windowSizeMillis))) 68 | .process(new ProcessWindowFunction, Long>, Object, Tuple, TimeWindow>() { 69 | @Override 70 | public void process(Tuple tuple, Context context, Iterable, Long>> iterable, Collector collector) throws Exception { 71 | // > 72 | 73 | long count = 0; 74 | Tuple2 tuple2 = null; 75 | for (Tuple2, Long> in : iterable){ 76 | tuple2 = in.f0; 77 | count++; 78 | } 79 | 80 | LOGGER.info("window===" + tuple.toString()); 81 | collector.collect(new Tuple6(tuple2.getField(0).toString(),context.window().getStart(), context.window().getEnd(),tuple.getField(0).toString(),tuple.getField(1).toString(),count)); 82 | } 83 | }) 84 | .setParallelism(4) 85 | .map(t -> { 86 | Tuple6 tt = (Tuple6)t; 87 | Long windowStart = tt.f1; 88 | Long windowEnd = tt.f2; 89 | String channel = tt.f3; 90 | String behaviorType = tt.f4; 91 | Long count = tt.f5; 92 | return StringUtils.join(Arrays.asList(windowStart, windowEnd, channel, behaviorType, count) ,"\t"); 93 | }) 94 | .setParallelism(3); 95 | 96 | inputMap.addSink((new FlinkKafkaProducer011("localhost:9092,localhost:9092","windowed-result-topic",new SimpleStringSchema()))) 97 | .setParallelism(3); 98 | 99 | //inputMap.print(); 100 | 101 | env.execute("EventTime and WaterMark Demo"); 102 | } 103 | } 104 | -------------------------------------------------------------------------------- /flink-simple-lambda/src/main/resources/log4j.properties: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | # Licensed to the Apache Software Foundation (ASF) under one 3 | # or more contributor license agreements. See the NOTICE file 4 | # distributed with this work for additional information 5 | # regarding copyright ownership. The ASF licenses this file 6 | # to you under the Apache License, Version 2.0 (the 7 | # "License"); you may not use this file except in compliance 8 | # with the License. You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 
17 | ################################################################################ 18 | 19 | log4j.rootLogger=INFO, console 20 | 21 | log4j.appender.console=org.apache.log4j.ConsoleAppender 22 | log4j.appender.console.layout=org.apache.log4j.PatternLayout 23 | log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n 24 | -------------------------------------------------------------------------------- /flink-socket/pom.xml: -------------------------------------------------------------------------------- 1 | 2 | 5 | 6 | flink-demos 7 | com.jdd.streaming 8 | 1.0-SNAPSHOT 9 | 10 | 4.0.0 11 | flink-socket 12 | jar 13 | 14 | 15 | 16 | org.apache.flink 17 | flink-streaming-java_2.11 18 | 19 | 20 | 21 | org.apache.flink 22 | flink-clients_${scala.binary.version} 23 | 24 | 25 | 26 | org.slf4j 27 | slf4j-log4j12 28 | compile 29 | 30 | 31 | log4j 32 | log4j 33 | compile 34 | 35 | 36 | 37 | org.apache.flink 38 | flink-test-utils-junit 39 | test 40 | 41 | 42 | 43 | 44 | 45 | 46 | org.apache.maven.plugins 47 | maven-jar-plugin 48 | 49 | 50 | 51 | demos.SocketWindowWordCount 52 | 53 | 54 | 55 | 56 | 57 | 58 | -------------------------------------------------------------------------------- /flink-socket/src/main/java/com/jdd/streaming/demos/SocketWindowWordCount.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.flink.api.common.functions.FlatMapFunction; 4 | import org.apache.flink.api.common.functions.ReduceFunction; 5 | import org.apache.flink.api.java.utils.ParameterTool; 6 | import org.apache.flink.streaming.api.datastream.DataStream; 7 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 8 | import org.apache.flink.streaming.api.windowing.time.Time; 9 | import org.apache.flink.util.Collector; 10 | 11 | public class SocketWindowWordCount { 12 | public static void main(String[] args) throws Exception { 13 | final String hostName; 14 | final int port; 15 | 16 | final ParameterTool params = ParameterTool.fromArgs(args); 17 | hostName = params.has("hostname") ? 
params.get("hostname") : "localhost"; 18 | port = params.getInt("port"); 19 | 20 | // 使用flink的步骤 21 | // 1、构建一个ExecutionEnvironment 22 | // 2、创建Source,对应的是DataStream即视为数据源 23 | // 3、执行算子链 得到自己想要的结果;同时需要定义结果类(map---> operators...--->reduce) 24 | // 4、对生产的结果sink处理 25 | final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 26 | DataStream text = env.socketTextStream(hostName, port, "\n"); 27 | 28 | DataStream windowCounts = text.flatMap(new FlatMapFunction() { 29 | public void flatMap(String s, Collector collector) throws Exception { 30 | for (String word : s.split("\\s")){ 31 | collector.collect(new WordWithCount(word, 1L)); 32 | } 33 | } 34 | }) 35 | .keyBy("word") 36 | .timeWindow(Time.seconds(5)) 37 | .reduce(new ReduceFunction() { 38 | public WordWithCount reduce(WordWithCount t1, WordWithCount t2) throws Exception { 39 | return new WordWithCount(t1.word, t1.count + t2.count); 40 | } 41 | }); 42 | 43 | windowCounts.print().setParallelism(1); 44 | env.execute("Socket Window WordCount"); 45 | } 46 | 47 | 48 | 49 | public static class WordWithCount{ // 定义 50 | public String word; 51 | public long count; 52 | 53 | public WordWithCount(){} 54 | public WordWithCount(String word, long count){ 55 | this.count = count; 56 | this.word = word; 57 | } 58 | 59 | public String toString(){ 60 | return word + " : " + count; 61 | } 62 | } 63 | } 64 | 65 | 66 | -------------------------------------------------------------------------------- /flink-sql-demos/pom.xml: -------------------------------------------------------------------------------- 1 | 2 | 5 | 6 | flink-demos 7 | com.jdd.streaming 8 | 1.0-SNAPSHOT 9 | 10 | 4.0.0 11 | flink-sql-demos 12 | flink-sql 13 | jar 14 | 15 | 16 | 17 | org.apache.flink 18 | flink-table_2.11 19 | 20 | 21 | org.apache.flink 22 | flink-streaming-java_2.11 23 | 24 | 25 | org.apache.flink 26 | flink-streaming-scala_2.11 27 | 28 | 29 | org.slf4j 30 | slf4j-log4j12 31 | 32 | 33 | 34 | 35 | 36 | 37 | org.apache.maven.plugins 38 | maven-jar-plugin 39 | 40 | 41 | 42 | org.apache.maven.plugins 43 | maven-compiler-plugin 44 | 45 | 1.8 46 | 1.8 47 | 48 | 49 | 50 | 51 | -------------------------------------------------------------------------------- /flink-sql-demos/src/main/java/com/jdd/streaming/demos/SimpleDataStreamToTable.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.flink.streaming.api.datastream.DataStream; 4 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 5 | import org.apache.flink.streaming.api.functions.source.SourceFunction; 6 | import org.apache.flink.table.api.Table; 7 | import org.apache.flink.table.api.TableEnvironment; 8 | import org.apache.flink.table.api.java.StreamTableEnvironment; 9 | 10 | import java.util.Random; 11 | 12 | /** 13 | * @Auther: dalan 14 | * @Date: 19-3-29 11:22 15 | * @Description: 16 | */ 17 | public class SimpleDataStreamToTable { 18 | public static String[] names = {"Tom", "Bob", "Bill", "Robin", "kalo", "jack"}; 19 | 20 | public static void main(String[] args) throws Exception { 21 | final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 22 | DataStream users = env.addSource(new SourceFunction() { 23 | private volatile boolean isRunning = true; 24 | private Random rand = new Random(9527); 25 | 26 | @Override public void run(SourceContext out) throws Exception { 27 | while (isRunning){ //模拟stream数据 28 | User user = new 
User(names[rand.nextInt(names.length)], rand.nextInt(100) + 1, "user content"); 29 | out.collect(user); 30 | Thread.sleep(50); 31 | } 32 | } 33 | 34 | @Override public void cancel() { 35 | isRunning = false; 36 | } 37 | }); 38 | 39 | 40 | // 创建table context 41 | final StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env); 42 | // DataStream ---> table 43 | Table user = tEnv.fromDataStream(users); 44 | Table result = user.filter("name='Tom'") 45 | .select("name, age, content"); 46 | 47 | tEnv.toAppendStream(result, User.class).print(); 48 | 49 | env.execute("a simple datastream to table demo"); 50 | } 51 | 52 | public static class User{ 53 | public String name; 54 | public Integer age; 55 | public String content; 56 | 57 | public User(){ } 58 | public User(String name, Integer age, String content){ 59 | this.name = name; 60 | this.age = age; 61 | this.content = content; 62 | } 63 | 64 | @Override public String toString(){ 65 | return "user name=" + this.name 66 | + "\tage=" + this.age 67 | + "\tcontent=" + this.content; 68 | } 69 | } 70 | 71 | 72 | } 73 | -------------------------------------------------------------------------------- /flink-sql-demos/src/main/java/com/jdd/streaming/demos/TableDemo.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.flink.streaming.api.datastream.DataStream; 4 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 5 | import org.apache.flink.table.api.Table; 6 | import org.apache.flink.table.api.TableEnvironment; 7 | import org.apache.flink.table.api.java.StreamTableEnvironment; 8 | import org.slf4j.Logger; 9 | import org.slf4j.LoggerFactory; 10 | 11 | import java.io.Serializable; 12 | import java.util.Arrays; 13 | 14 | /** 15 | * @Auther: dalan 16 | * @Date: 19-3-28 10:10 17 | * @Description: 18 | */ 19 | public class TableDemo { 20 | /** logger */ 21 | private static final Logger LOGGER = LoggerFactory.getLogger(TableDemo.class); 22 | 23 | public static void main(String[] args) throws Exception { 24 | final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 25 | final StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env); 26 | 27 | DataStream orders1 = env.fromCollection(Arrays.asList( 28 | new Order(1L, "beer", 3), 29 | new Order(1L, "diaper", 4), 30 | new Order(3L, "rubber", 2) 31 | ) 32 | ); 33 | DataStream orders2 = env.fromCollection(Arrays.asList( 34 | new Order(2L, "pen", 3), 35 | new Order(2L, "rubber", 3), 36 | new Order(4L, "beer", 1) 37 | ) 38 | ); 39 | Table t = tEnv.fromDataStream(orders1,"user, product, amount"); 40 | tEnv.registerDataStream("orders2", orders2, "user, product, amount"); 41 | Table result = tEnv.sqlQuery("SELECT * FROM " + t + " WHERE amount > 2 UNION ALL " + 42 | " SELECT * FROM orders2 WHERE amount < 2"); 43 | 44 | 45 | tEnv.toAppendStream(result, Order.class).print(); 46 | 47 | env.execute("a simple table demo"); 48 | 49 | } 50 | 51 | // 定义POJO类 52 | public static class Order implements Serializable{ 53 | public Long user; 54 | public String product; 55 | public Integer amount; 56 | 57 | public Order() { 58 | } 59 | 60 | public Order(Long user, String product, Integer amount) { 61 | this.user = user; 62 | this.product = product; 63 | this.amount = amount; 64 | } 65 | 66 | @Override 67 | public String toString() { 68 | return "Order{" + 69 | "user=" + user + 70 | ", product='" + product + '\'' + 71 | ", amount=" + amount + 72 | '}'; 
73 | } 74 | } 75 | } 76 | -------------------------------------------------------------------------------- /flink-state-demo/pom.xml: -------------------------------------------------------------------------------- 1 | 2 | 5 | 6 | flink-demos 7 | com.jdd.streaming 8 | 1.0-SNAPSHOT 9 | 10 | 4.0.0 11 | 12 | flink-state-demo 13 | flink-state 14 | jar 15 | 16 | 17 | 18 | org.apache.flink 19 | flink-java 20 | 21 | 22 | org.apache.flink 23 | flink-streaming-java_2.11 24 | 25 | 26 | org.apache.flink 27 | flink-clients_2.11 28 | 29 | 30 | org.slf4j 31 | slf4j-log4j12 32 | 33 | 34 | 35 | 36 | 37 | 38 | org.apache.maven.plugins 39 | maven-compiler-plugin 40 | 41 | 1.8 42 | 1.8 43 | 44 | 45 | 46 | org.apache.maven.plugins 47 | maven-jar-plugin 48 | 49 | 50 | 51 | 52 | -------------------------------------------------------------------------------- /flink-state-demo/src/main/java/com/jdd/streaming/demos/ConnectionStreamDemo.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.flink.api.common.state.ValueState; 4 | import org.apache.flink.api.common.state.ValueStateDescriptor; 5 | import org.apache.flink.api.java.tuple.Tuple2; 6 | import org.apache.flink.api.java.tuple.Tuple3; 7 | import org.apache.flink.configuration.Configuration; 8 | import org.apache.flink.streaming.api.datastream.DataStream; 9 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 10 | import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction; 11 | import org.apache.flink.util.Collector; 12 | import org.slf4j.Logger; 13 | import org.slf4j.LoggerFactory; 14 | 15 | import java.util.Arrays; 16 | 17 | /** 18 | * @Auther: dalan 19 | * @Date: 19-3-28 19:23 20 | * @Description: 21 | */ 22 | public class ConnectionStreamDemo { 23 | /** logger */ 24 | private static final Logger LOGGER = LoggerFactory.getLogger(ConnectionStreamDemo.class); 25 | 26 | public static void main(String[] args) throws Exception { 27 | final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 28 | 29 | //DataStream control = env.fromElements("DROP", "IGNORE").keyBy(x->x); 30 | //DataStream streamOfWords = env.fromElements("data", "DROP", "artisans", "IGNORE").keyBy(x->x); 31 | 32 | // 33 | // 保证both streams 要么都分组 要么都不分组 34 | // 1.分组的情况 对应的key值必须是相同的类型 35 | // 2.分组的情况 要求相对开放 36 | DataStream> control = env.fromCollection(Arrays.asList( 37 | new Tuple2("hello", 1), 38 | new Tuple2("world", 1), 39 | new Tuple2("good", 1), 40 | new Tuple2("come", 1), 41 | new Tuple2("here", 1), 42 | new Tuple2("go", 1) 43 | ));//.keyBy(x->x.f0); 44 | 45 | 46 | DataStream> streamOfWords = env.fromCollection(Arrays.asList( 47 | new Tuple3("hello", 3, 1), 48 | new Tuple3("world", 3, 1), 49 | new Tuple3("good", 3, 2) 50 | ));//.keyBy(x->x.f0); 51 | 52 | // control.connect(streamOfWords) 53 | // .flatMap(new RichCoFlatMapFunction() { 54 | // @Override public void flatMap1(String s, Collector collector) throws Exception { 55 | // System.out.println("===flatMap1==="); 56 | // blocked.update(Boolean.TRUE); 57 | // } 58 | // 59 | // @Override public void flatMap2(String s, Collector out) throws Exception { 60 | // System.out.println("===flatMap2==="); 61 | // if(blocked.value() == null){ 62 | // out.collect(s); 63 | // } 64 | // } 65 | // 66 | // private transient ValueState blocked; 67 | // 68 | // @Override public void open(Configuration config){ 69 | // blocked = getRuntimeContext().getState(new ValueStateDescriptor("", 
Boolean.class)); 70 | // } 71 | // }) 72 | control.connect(streamOfWords) 73 | .flatMap(new RichCoFlatMapFunction, Tuple3, Object>() { 74 | @Override public void flatMap1(Tuple2 in, Collector out) throws Exception { 75 | //Tuple3 result = Tuple3.of(in.f0, in.f1, 0); 76 | System.out.println("flatMap1\t" + in.toString()); 77 | out.collect(in); 78 | } 79 | 80 | @Override public void flatMap2(Tuple3 in, Collector out) throws Exception { 81 | System.out.println("flatMap2\t" + in.toString()); 82 | out.collect(in); 83 | } 84 | }) 85 | .print(); 86 | 87 | env.execute(); 88 | } 89 | 90 | 91 | } 92 | -------------------------------------------------------------------------------- /flink-state-demo/src/main/java/com/jdd/streaming/demos/SimpleRichFlatMapState.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.flink.api.common.functions.RichFlatMapFunction; 4 | import org.apache.flink.api.common.state.ValueState; 5 | import org.apache.flink.api.common.state.ValueStateDescriptor; 6 | import org.apache.flink.api.common.typeinfo.TypeHint; 7 | import org.apache.flink.api.common.typeinfo.TypeInformation; 8 | import org.apache.flink.api.java.tuple.Tuple2; 9 | import org.apache.flink.configuration.Configuration; 10 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 11 | import org.apache.flink.util.Collector; 12 | import org.slf4j.Logger; 13 | import org.slf4j.LoggerFactory; 14 | 15 | /** 16 | * @Auther: dalan 17 | * @Date: 19-3-28 17:53 18 | * @Description: 19 | */ 20 | 21 | public class SimpleRichFlatMapState{ 22 | /** logger */ 23 | private static final Logger LOGGER = LoggerFactory.getLogger(SimpleRichFlatMapState.class); 24 | 25 | public static void main(String[] args) throws Exception { 26 | final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 27 | // ... 28 | env.fromElements(Tuple2.of(1L, 3L), Tuple2.of(1L, 5L), Tuple2.of(1L, 7L), Tuple2.of(1L, 4L), Tuple2.of(1L, 2L)) 29 | .keyBy(0) 30 | .flatMap(new CountWindowAverage()) 31 | .print(); 32 | 33 | // the printed output will be (1,4) and (1,5) 34 | env.execute(); 35 | } 36 | 37 | 38 | public static class CountWindowAverage extends RichFlatMapFunction, Tuple2> { 39 | 40 | /** 41 | * ValueState状态句柄. 
第一个值为count,第二个值为sum。 42 | */ 43 | private transient ValueState> sum; 44 | 45 | @Override 46 | public void flatMap(Tuple2 input, Collector> out) throws Exception { 47 | // 获取当前状态值 48 | Tuple2 currentSum = sum.value(); 49 | 50 | // 更新 51 | currentSum.f0 += 1; 52 | currentSum.f1 += input.f1; 53 | 54 | // 更新状态值 55 | sum.update(currentSum); 56 | 57 | // 如果count >=2 清空状态值,重新计算 58 | if (currentSum.f0 >= 2) { 59 | out.collect(new Tuple2<>(input.f0, currentSum.f1 / currentSum.f0)); 60 | sum.clear(); 61 | } 62 | } 63 | 64 | @Override 65 | public void open(Configuration config) { 66 | ValueStateDescriptor> descriptor = 67 | new ValueStateDescriptor<>( 68 | "average", // 状态名称 69 | TypeInformation.of(new TypeHint>() {}), // 状态类型 70 | Tuple2.of(0L, 0L)); // 状态默认值 71 | sum = getRuntimeContext().getState(descriptor); 72 | } 73 | } 74 | } 75 | 76 | -------------------------------------------------------------------------------- /flink-state-demo/src/main/java/com/jdd/streaming/demos/SimpleState.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.flink.api.common.functions.RichMapFunction; 4 | import org.apache.flink.api.common.state.ValueState; 5 | import org.apache.flink.api.common.state.ValueStateDescriptor; 6 | import org.apache.flink.api.common.typeinfo.TypeHint; 7 | import org.apache.flink.api.common.typeinfo.TypeInformation; 8 | import org.apache.flink.api.java.Utils; 9 | import org.apache.flink.api.java.tuple.Tuple; 10 | import org.apache.flink.api.java.tuple.Tuple2; 11 | import org.apache.flink.configuration.Configuration; 12 | import org.apache.flink.streaming.api.datastream.DataStream; 13 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 14 | import org.apache.flink.streaming.api.functions.source.SourceFunction; 15 | import org.slf4j.Logger; 16 | import org.slf4j.LoggerFactory; 17 | 18 | import java.util.Random; 19 | 20 | /** 21 | * @Auther: dalan 22 | * @Date: 19-3-28 17:00 23 | * @Description: 24 | */ 25 | public class SimpleState { 26 | /** logger */ 27 | private static final Logger LOGGER = LoggerFactory.getLogger(SimpleState.class); 28 | 29 | public static void main(String[] args) throws Exception { 30 | final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 31 | env.setParallelism(1); 32 | 33 | final String[] items = {"hello","world","go","come","here"}; 34 | 35 | DataStream> input = env.addSource(new SourceFunction>() { 36 | private volatile boolean isRunning = true; 37 | private Random rand = new Random(9527); 38 | 39 | @Override public void run(SourceContext> ctx) throws Exception { 40 | while (isRunning){ 41 | Tuple2 item = new Tuple2<>(items[rand.nextInt(items.length)],rand.nextInt(10)+1); 42 | ctx.collect(item); 43 | // 模拟暂停 44 | Thread.sleep(100); 45 | } 46 | } 47 | 48 | @Override public void cancel() { 49 | isRunning = false; 50 | } 51 | }); 52 | 53 | DataStream> smoothed = 54 | input.keyBy(0) 55 | .map(new Smoother()); 56 | 57 | smoothed.print(); 58 | 59 | env.execute("s simple state demo"); 60 | } 61 | 62 | public static class Smoother extends RichMapFunction, Tuple2>{ 63 | private transient ValueState> averageState; 64 | 65 | @Override public void open(Configuration config) throws Exception{ // 由parallel数决定 需要启动执行几次open 66 | System.out.println("execute only one"); 67 | ValueStateDescriptor> descriptor = 68 | new ValueStateDescriptor>( 69 | "test state", 70 | TypeInformation.of(new TypeHint>() {}) 71 | ); 72 | 
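// Note: open() runs once per parallel subtask, so the println above fires once per instance;
// the state handle obtained below is automatically scoped to the current key each time
// map() reads or updates it.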
averageState = getRuntimeContext().getState(descriptor); 73 | } 74 | 75 | @Override public Tuple2 map(Tuple2 item) throws Exception { 76 | Tuple2 average = averageState.value(); 77 | 78 | if(average == null) { 79 | average = new Tuple2<>("test", 0); 80 | } 81 | 82 | //System.out.println("=======" + average.toString()); 83 | 84 | //更新 85 | average.f1 += item.f1; 86 | averageState.update(average); 87 | 88 | if(average.f1 >= 2){ 89 | averageState.clear(); // 执行state清除 90 | return new Tuple2<>(item.f0, (item.f1 + average.f1)); 91 | } 92 | return new Tuple2<>("test",0); 93 | } 94 | } 95 | } 96 | -------------------------------------------------------------------------------- /flink-taxi-demos/flinktaxidemos.iml: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /flink-taxi-demos/pom.xml: -------------------------------------------------------------------------------- 1 | 2 | 5 | 6 | flink-demos 7 | com.jdd.streaming 8 | 1.0-SNAPSHOT 9 | 10 | 4.0.0 11 | 12 | flink-taxi-demos 13 | flink-taxi 14 | jar 15 | 16 | 17 | 18 | org.apache.flink 19 | flink-clients_2.11 20 | 21 | 22 | org.apache.flink 23 | flink-streaming-java_2.11 24 | 25 | 26 | 27 | redis.clients 28 | jedis 29 | 30 | 31 | redis.clients 32 | jedis 33 | 34 | 35 | 36 | 37 | joda-time 38 | joda-time 39 | 2.7 40 | 41 | 42 | org.slf4j 43 | slf4j-log4j12 44 | 45 | 46 | org.apache.flink 47 | flink-table_2.11 48 | 49 | 50 | org.apache.flink 51 | flink-streaming-scala_2.11 52 | 53 | 54 | org.apache.flink 55 | flink-table_2.11 56 | 57 | 58 | org.apache.flink 59 | flink-connector-elasticsearch6_2.11 60 | ${flink.version} 61 | 62 | 63 | 64 | 65 | 66 | 67 | org.apache.maven.plugins 68 | maven-compiler-plugin 69 | 3.1 70 | 71 | 1.8 72 | 1.8 73 | 74 | 75 | 76 | org.apache.maven.plugins 77 | maven-jar-plugin 78 | 79 | 80 | 81 | com.jdd.streaming.demos.TaxiRideCount 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | -------------------------------------------------------------------------------- /flink-taxi-demos/src/main/java/com/jdd/streaming/demos/DataStreamDemo.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import com.jdd.streaming.demos.entity.TaxiFare; 4 | import com.jdd.streaming.demos.entity.TaxiRide; 5 | import com.jdd.streaming.demos.source.TaxiFareSource; 6 | import com.jdd.streaming.demos.source.TaxiRideSource; 7 | import org.apache.flink.api.common.functions.FlatMapFunction; 8 | import org.apache.flink.api.common.functions.MapFunction; 9 | import org.apache.flink.api.common.io.FileInputFormat; 10 | import org.apache.flink.api.java.tuple.Tuple; 11 | import org.apache.flink.api.java.tuple.Tuple1; 12 | import org.apache.flink.api.java.tuple.Tuple2; 13 | import org.apache.flink.api.java.utils.ParameterTool; 14 | import org.apache.flink.streaming.api.datastream.DataStream; 15 | import org.apache.flink.streaming.api.datastream.KeyedStream; 16 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 17 | import org.apache.flink.streaming.api.functions.source.SourceFunction; 18 | import org.apache.flink.streaming.api.functions.windowing.AllWindowFunction; 19 | import org.apache.flink.streaming.api.functions.windowing.WindowFunction; 20 | import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows; 21 | import org.apache.flink.streaming.api.windowing.time.Time; 22 | import 
org.apache.flink.streaming.api.windowing.windows.TimeWindow; 23 | import org.apache.flink.util.Collector; 24 | import org.slf4j.Logger; 25 | import org.slf4j.LoggerFactory; 26 | 27 | import java.util.Iterator; 28 | import java.util.Random; 29 | import java.util.concurrent.TimeUnit; 30 | 31 | /** 32 | * @Auther: dalan 33 | * @Date: 19-3-25 15:19 34 | * @Description: 35 | */ 36 | public class DataStreamDemo { 37 | /** logger */ 38 | private static final Logger LOGGER = LoggerFactory.getLogger(DataStreamDemo.class); 39 | // main 40 | public static void main(String[] args) throws Exception { 41 | final ParameterTool params = ParameterTool.fromArgs(args); 42 | String taxiRide = params.get("taxi-ride-path","/home/wmm/go_bench/flink_sources/nycTaxiRides.gz"); 43 | String taxiFare = params.get("taxi-fare-path","/home/wmm/go_bench/flink_sources/nycTaxiFares.gz"); 44 | 45 | // 46 | final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 47 | //DataStream rides = env.addSource(new TaxiRideSource(taxiRide, maxEventDelay, servingSpeedFactor)); 48 | //DataStream fares = env.addSource(new TaxiFareSource(taxiFare, maxEventDelay, servingSpeedFactor)); 49 | env.getConfig().setParallelism(1); 50 | 51 | DataStream strs = env.addSource(new SourceFunction() { 52 | @Override 53 | public void run(SourceContext ctx) throws Exception { 54 | Random rand = new Random(); 55 | while (true) { 56 | ctx.collect(TESTSTR[rand.nextInt(TESTSTR.length)]); 57 | Thread.sleep(100); // 模拟 让数据collect延迟100ms 便于window操作的输出:窗口大小=1s 每隔500ms执行一次 58 | } 59 | } 60 | 61 | @Override 62 | public void cancel() { // 此处不需要处理 63 | } 64 | }); 65 | KeyedStream, Tuple> keyedStrs = strs 66 | .map(new MapFunction>() { 67 | @Override 68 | public Tuple2 map(String s) throws Exception { 69 | return new Tuple2(s, 1); 70 | } 71 | }) 72 | .keyBy(0) 73 | ; 74 | 75 | DataStream result = keyedStrs 76 | .timeWindow(Time.seconds(1)) 77 | .apply(new WindowFunction, String, Tuple, TimeWindow>() { 78 | @Override 79 | public void apply(Tuple tuple, TimeWindow timeWindow, Iterable> iterable, Collector collector) throws Exception { 80 | Iterator> iter = iterable.iterator(); 81 | long sum = 0; 82 | while (iter.hasNext()){ 83 | Tuple2 t = iter.next(); 84 | sum += t.f1; 85 | } 86 | 87 | LOGGER.info("key = " + tuple.toString() 88 | + "\tval = " + sum + "\n" 89 | + "window start = " + timeWindow.getStart() + "\t window end " + timeWindow.getEnd() + "\n"); 90 | 91 | collector.collect("key = " + tuple.toString() + "\tval = " + sum); 92 | } 93 | }).map(new MapFunction() { 94 | @Override 95 | public String map(String s) throws Exception { 96 | return s; 97 | } 98 | }); 99 | // 100 | result.print(); 101 | 102 | // DataStream> flatStrs = strs.flatMap(new FlatMapFunction>() { 103 | // @Override 104 | // public void flatMap(String s, Collector> collector) throws Exception { 105 | // collector.collect(new Tuple2(s, 1)); 106 | // } 107 | // }).setParallelism(3); 108 | // DataStream> reStrs = flatStrs.rebalance(); 109 | // DataStream mapStrs = reStrs.map(new MapFunction, String>() { 110 | // @Override 111 | // public String map(Tuple2 s) throws Exception { 112 | // return new String("key= " + s.f0 + "\tval= " + s.f1); 113 | // } 114 | // }); 115 | // mapStrs.print(); 116 | 117 | env.execute("the datastream api demos"); 118 | } 119 | 120 | // 测试模拟数据 121 | private static String[] TESTSTR = {"hello","world","good","yes","no","it","no","come"}; 122 | } 123 | -------------------------------------------------------------------------------- 
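The WindowFunction in DataStreamDemo above buffers a whole window pane and iterates it just to add up counts. For a pure count or sum, a pre-aggregating operator folds elements as they arrive instead of buffering the pane. A minimal sketch against the same 1-second tumbling window and the same source `strs` (illustrative only; it assumes the `Types` and `Time` imports used elsewhere in this repo):

```
DataStream<Tuple2<String, Integer>> counts = strs
    .map(s -> Tuple2.of(s, 1))
    .returns(Types.TUPLE(Types.STRING, Types.INT)) // cope with lambda type erasure, as in SimpleLambda
    .keyBy(0)
    .timeWindow(Time.seconds(1))
    .sum(1); // incremental aggregation: no per-pane buffering
counts.print();
```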
/flink-taxi-demos/src/main/java/com/jdd/streaming/demos/SimpleFunctions.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import com.jdd.streaming.demos.entity.TaxiRide; 4 | import com.jdd.streaming.demos.source.TaxiRideSource; 5 | import org.apache.flink.api.common.functions.FlatMapFunction; 6 | import org.apache.flink.api.java.functions.KeySelector; 7 | import org.apache.flink.api.java.tuple.Tuple2; 8 | import org.apache.flink.api.java.utils.ParameterTool; 9 | import org.apache.flink.streaming.api.datastream.DataStream; 10 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 11 | import org.apache.flink.util.Collector; 12 | import org.joda.time.Interval; 13 | import org.joda.time.Minutes; 14 | import org.slf4j.Logger; 15 | import org.slf4j.LoggerFactory; 16 | import com.jdd.streaming.demos.utils.GeoUtils; 17 | 18 | /** 19 | * @Auther: dalan 20 | * @Date: 19-3-28 11:50 21 | * @Description: 22 | */ 23 | public class SimpleFunctions { 24 | /** logger */ 25 | private static final Logger LOGGER = LoggerFactory.getLogger(SimpleFunctions.class); 26 | 27 | public static void main(String[] args) throws Exception { 28 | final ParameterTool params = ParameterTool.fromArgs(args); 29 | String taxiRide = params.get("taxi-ride-path","/home/wmm/go_bench/flink_sources/nycTaxiRides.gz"); 30 | //String taxiFare = params.get("taxi-fare-path","/home/wmm/go_bench/flink_sources/nycTaxiFares.gz"); 31 | 32 | int maxEventDelay = 60; 33 | int servingSpeedFactor = 600; 34 | 35 | // 36 | final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 37 | DataStream rides = env.addSource(new TaxiRideSource(taxiRide, maxEventDelay, servingSpeedFactor)); 38 | 39 | // 实现每个司机在NY范围内 每趟旅程的时间 40 | rides.flatMap(new FlatMapFunction() { 41 | @Override public void flatMap(TaxiRide taxiRide, Collector out) throws Exception { 42 | if (GeoUtils.isInNYC(taxiRide.startLon, taxiRide.endLat)){ 43 | out.collect(taxiRide); 44 | } 45 | } 46 | }).keyBy(new KeySelector() { 47 | @Override public Long getKey(TaxiRide taxiRide) throws Exception { 48 | return taxiRide.driverId; 49 | } 50 | }).flatMap(new FlatMapFunction>() { 51 | @Override public void flatMap(TaxiRide taxiRide, Collector> out) 52 | throws Exception { 53 | if(!taxiRide.isStart){ 54 | Interval rideInterval = new Interval(taxiRide.startTime, taxiRide.endTime); 55 | Minutes duration = rideInterval.toDuration().toStandardMinutes(); 56 | out.collect(new Tuple2(taxiRide.driverId, duration)); 57 | } 58 | } 59 | }).keyBy(0) 60 | .maxBy(1) 61 | .print(); 62 | 63 | env.execute("simpel flink operators used."); 64 | } 65 | } 66 | -------------------------------------------------------------------------------- /flink-taxi-demos/src/main/java/com/jdd/streaming/demos/SimpleSideOutput.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.commons.collections.IteratorUtils; 4 | import org.apache.flink.api.common.functions.MapFunction; 5 | import org.apache.flink.api.java.tuple.Tuple; 6 | import org.apache.flink.streaming.api.TimeCharacteristic; 7 | import org.apache.flink.streaming.api.datastream.DataStream; 8 | import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator; 9 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 10 | import org.apache.flink.streaming.api.functions.source.SourceFunction; 11 | import 
org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor; 12 | import org.apache.flink.streaming.api.functions.windowing.WindowFunction; 13 | import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows; 14 | import org.apache.flink.streaming.api.windowing.time.Time; 15 | import org.apache.flink.streaming.api.windowing.windows.TimeWindow; 16 | import org.apache.flink.util.Collector; 17 | import org.apache.flink.util.OutputTag; 18 | 19 | import java.util.*; 20 | 21 | import org.slf4j.Logger; 22 | import org.slf4j.LoggerFactory; 23 | 24 | import java.util.concurrent.*; 25 | 26 | import static java.util.concurrent.TimeUnit.SECONDS; 27 | 28 | /** 29 | * @Auther: dalan 30 | * @Date: 19-4-2 11:35 31 | * @Description: 32 | */ 33 | public class SimpleSideOutput { 34 | /** logger */ 35 | private static final Logger LOGGER = LoggerFactory.getLogger(SimpleSideOutput.class); 36 | public static void main(String[] args) throws Exception { 37 | final OutputTag REJECTEDWORDSTAG = new OutputTag("rejected_words_tag"){}; 38 | 39 | final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 40 | String[] datas = {"hello","world","good","yes","ok","here"}; 41 | String[] ops = {"-","+"}; 42 | 43 | env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); 44 | env.setParallelism(4); 45 | 46 | DataStream strs = env.addSource(new SourceFunction() { 47 | private Random rand = new Random(9527); 48 | private volatile boolean isRunning = true; 49 | private volatile Long nums =0L; 50 | 51 | 52 | @Override public void run(SourceContext out) throws Exception { 53 | final long cts = System.currentTimeMillis(); 54 | 55 | // 模拟延迟数据 56 | final ScheduledExecutorService exec = new ScheduledThreadPoolExecutor(1); 57 | exec.scheduleAtFixedRate(new Runnable() { 58 | @Override public void run() { 59 | SimpleWaterMark.Event e = new SimpleWaterMark.Event(datas[rand.nextInt(datas.length)], ops[rand.nextInt(2)].equals("+")? (cts + rand.nextInt(100)) : (cts - rand.nextInt(100)) ); 60 | System.out.println( 61 | "======single thread event=====" + e + " current_thread_id " + Thread.currentThread().getId()); 62 | out.collect(e); 63 | }}, 3, 4, TimeUnit.SECONDS); 64 | 65 | // 模拟正常数据 66 | while (isRunning && nums < 500){ 67 | long ts = System.currentTimeMillis(); 68 | SimpleWaterMark.Event e = new SimpleWaterMark.Event(datas[rand.nextInt(datas.length)], ops[rand.nextInt(2)].equals("+")? (ts + rand.nextInt(100)) : (ts - rand.nextInt(100)) ); 69 | System.out.println("======event=====" + e + " current_thread_id " + Thread.currentThread().getId()); 70 | out.collect(e); 71 | 72 | nums++; 73 | Thread.sleep(rand.nextInt(50)+10); 74 | } 75 | exec.shutdownNow(); 76 | } 77 | 78 | @Override public void cancel() { 79 | isRunning = false; 80 | } 81 | }); 82 | 83 | SingleOutputStreamOperator sides = strs 84 | .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor(Time.of(2L, SECONDS)) { 85 | private volatile Long currentTimestamp = 0L; 86 | @Override public long extractTimestamp(SimpleWaterMark.Event event) { 87 | long ts = event.ts; 88 | currentTimestamp = ts > currentTimestamp ? 
ts : currentTimestamp; 89 | return ts; 90 | } 91 | }) 92 | .keyBy("name") 93 | // .process(new KeyedProcessFunction() { 94 | // @Override 95 | // public void processElement(SimpleWaterMark.Event event, Context ctx, Collector out) 96 | // throws Exception { 97 | // String key = event.name; 98 | // if(key.length() >= 5){ 99 | // ctx.output(REJECTEDWORDSTAG, event); 100 | // }else if (key.length() > 0){ 101 | // out.collect(event); 102 | // } 103 | // } 104 | // }) 105 | //.timeWindow(Time.of(2, SECONDS)) 106 | .window(TumblingEventTimeWindows.of(Time.seconds(2))) 107 | .sideOutputLateData(REJECTEDWORDSTAG) 108 | .apply(new WindowFunction() { 109 | @Override 110 | public void apply(Tuple tuple, TimeWindow timeWindow, Iterable iterable, 111 | Collector out) throws Exception { 112 | Iterator iter = iterable.iterator(); 113 | List events = IteratorUtils.toList(iter); 114 | Collections.sort(events); 115 | for (SimpleWaterMark.Event e: events) { 116 | out.collect(e); 117 | } 118 | 119 | System.out.println("the time window " + 120 | "\tstart " + timeWindow.getStart()+ 121 | "\tend " + timeWindow.getEnd() + 122 | "\tkey " + tuple.toString() + 123 | "\telement_size " + events.size()); 124 | 125 | } 126 | }); 127 | 128 | // 记录延迟数据可单独做处理 129 | DataStream events = 130 | sides.getSideOutput(REJECTEDWORDSTAG) 131 | .map(new MapFunction() { 132 | @Override public String map(SimpleWaterMark.Event event) throws Exception { 133 | return "rejected_"+event; 134 | } 135 | }); 136 | events.print(); 137 | 138 | env.execute("a simple sideoutput demo"); 139 | } 140 | } 141 | -------------------------------------------------------------------------------- /flink-taxi-demos/src/main/java/com/jdd/streaming/demos/SimpleTable.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.flink.streaming.api.datastream.DataStream; 4 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 5 | import org.apache.flink.table.api.Table; 6 | import org.apache.flink.table.api.TableEnvironment; 7 | import org.apache.flink.table.api.java.StreamTableEnvironment; 8 | import org.joda.time.DateTime; 9 | import org.joda.time.format.DateTimeFormat; 10 | import org.joda.time.format.DateTimeFormatter; 11 | import org.slf4j.Logger; 12 | import org.slf4j.LoggerFactory; 13 | 14 | import java.util.Arrays; 15 | import java.util.Locale; 16 | 17 | /** 18 | * @Auther: dalan 19 | * @Date: 19-3-28 14:51 20 | * @Description: 21 | */ 22 | public class SimpleTable { 23 | /** logger */ 24 | private static final Logger LOGGER = LoggerFactory.getLogger(SimpleTable.class); 25 | 26 | private static transient DateTimeFormatter timeFormatter = 27 | DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss").withLocale(Locale.CHINA).withZoneUTC(); 28 | 29 | public static void main(String[] args) throws Exception { 30 | final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 31 | final StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env); 32 | 33 | DataStream orders = env.fromCollection(Arrays.asList( 34 | new SimpleOrder(12345L, new DateTime(), (short)3), 35 | new SimpleOrder(23456L, new DateTime(), (short)4), 36 | new SimpleOrder(34567L, new DateTime(), (short)2), 37 | new SimpleOrder(45678L, new DateTime(), (short)1) 38 | )); 39 | 40 | // DataStream orders = env.fromCollection(Arrays.asList( 41 | // new SimpleOrder(12345L, new DateTime(), 3), 42 | // new SimpleOrder(23456L, new DateTime(), 4), 43 | // 
new SimpleOrder(34567L, new DateTime(), 2), 44 | // new SimpleOrder(45678L, new DateTime(), 1) 45 | // )); 46 | 47 | Table order = tEnv.fromDataStream(orders); 48 | Table result = tEnv.sqlQuery("SELECT * FROM " + order); 49 | tEnv.toAppendStream(result, SimpleOrder.class) // table中对应count字段为是int类型 实际pojo类中需要的是short类型 故而在该处会抛出异常 50 | .print(); 51 | 52 | 53 | env.execute(); 54 | } 55 | 56 | // 当定义的POJO类在进行from table to datastream出现类型转换错误:一般是由于table和datastream指定的字段类型不兼容: 57 | // 通过定义bean并设置对应的getter/setter方法解决 58 | // 直接使用pojo的public字段 会导致类型不兼容问题 59 | // public static class SimpleOrder{ 60 | // private long id; 61 | // private DateTime createTime; 62 | // private short count; 63 | // // public int count; 64 | // 65 | // public SimpleOrder(){} 66 | // public SimpleOrder(long id, DateTime createTime, short count){ 67 | // this.id = id; 68 | // this.createTime = createTime; 69 | // this.count = count; 70 | // } 71 | // 72 | // @Override 73 | // public String toString(){ 74 | // return "id = " + this.id 75 | // + "\tcreateTime= " + createTime.toString(timeFormatter) 76 | // + "\tcount=" + this.count; 77 | // } 78 | // } 79 | 80 | public static class SimpleOrder{ 81 | public long getId() { 82 | return id; 83 | } 84 | 85 | public void setId(long id) { 86 | this.id = id; 87 | } 88 | 89 | public DateTime getCreateTime() { 90 | return createTime; 91 | } 92 | 93 | public void setCreateTime(DateTime createTime) { 94 | this.createTime = createTime; 95 | } 96 | 97 | public short getCount() { 98 | return count; 99 | } 100 | 101 | public void setCount(int count) { 102 | this.count = (short)count; // 强制类型转换 103 | } 104 | 105 | private long id; 106 | private DateTime createTime; 107 | private short count; 108 | // public int count; 109 | 110 | public SimpleOrder(){} 111 | public SimpleOrder(long id, DateTime createTime, short count){ 112 | this.id = id; 113 | this.createTime = createTime; 114 | this.count = count; 115 | } 116 | 117 | @Override 118 | public String toString(){ 119 | return "id = " + this.id 120 | + "\tcreateTime= " + createTime.toString(timeFormatter) 121 | + "\tcount=" + this.count; 122 | } 123 | 124 | } 125 | } 126 | -------------------------------------------------------------------------------- /flink-taxi-demos/src/main/java/com/jdd/streaming/demos/SimpleWaterMark.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.commons.collections.IteratorUtils; 4 | import org.apache.flink.api.java.tuple.Tuple; 5 | import org.apache.flink.streaming.api.TimeCharacteristic; 6 | import org.apache.flink.streaming.api.datastream.DataStream; 7 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 8 | import org.apache.flink.streaming.api.functions.source.SourceFunction; 9 | import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor; 10 | import org.apache.flink.streaming.api.functions.windowing.WindowFunction; 11 | import org.apache.flink.streaming.api.windowing.time.Time; 12 | import org.apache.flink.streaming.api.windowing.windows.TimeWindow; 13 | import org.apache.flink.util.Collector; 14 | 15 | import java.util.*; 16 | import static java.util.concurrent.TimeUnit.SECONDS; 17 | 18 | /** 19 | * @Auther: dalan 20 | * @Date: 19-4-1 17:49 21 | * @Description: 22 | */ 23 | public class SimpleWaterMark { 24 | public static void main(String[] args) throws Exception { 25 | final StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment(); 26 | env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); 27 | env.setParallelism(4); 28 | 29 | String[] datas = {"hello","world","good","yes","ok","here"}; 30 | String[] ops = {"-","+"}; 31 | 32 | DataStream strs = env.addSource(new SourceFunction() { 33 | private Random rand = new Random(9527); 34 | private volatile boolean isRunning = true; 35 | 36 | @Override public void run(SourceContext out) throws Exception { 37 | while (isRunning){ 38 | long ts = System.currentTimeMillis(); 39 | Event e = new Event(datas[rand.nextInt(datas.length)], ops[rand.nextInt(2)].equals("+")? (ts + rand.nextInt(100)) : (ts - rand.nextInt(100)) ); 40 | System.out.println("======event=====" + e); 41 | out.collect(e); 42 | Thread.sleep(rand.nextInt(50)+10); 43 | } 44 | } 45 | 46 | @Override public void cancel() { 47 | isRunning = false; 48 | } 49 | }); 50 | 51 | DataStream events = strs.assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor(Time.of(5, SECONDS)) { 52 | @Override public long extractTimestamp(Event e) { 53 | System.out.println("===watermark generation==="); 54 | return e.ts; 55 | } 56 | }).keyBy("name") 57 | .timeWindow(Time.of(5, SECONDS)) 58 | .apply(new WindowFunction() { 59 | @Override public void apply(Tuple tuple, // 由于timewindow需要和KeyedStream结合使用 故而此处提供的是key 60 | TimeWindow timeWindow, // 此处指定了TimeWindow窗口的[start, end) 61 | Iterable iterable, // 包括[start, end)范围TimeWindow内所有的数据: 在window中的数据需要结合实际业务进行设定防止Window数据过大 62 | Collector out) throws Exception { // 将TimeWindow中的内容投递到下一个transform 63 | String key = tuple.toString(); 64 | Iterator iter = iterable.iterator(); 65 | List events = IteratorUtils.toList(iter); 66 | Collections.sort(events); 67 | 68 | for (Event e : events){ 69 | out.collect(e); 70 | } 71 | 72 | System.out.println(timeWindow + 73 | "\tkey=="+ key + 74 | "\telement_size==" + events.size() + 75 | "\twindow_start==" + timeWindow.getStart() + 76 | "\twindow_end=="+timeWindow.getEnd()); 77 | } 78 | }); 79 | 80 | events.print(); 81 | 82 | env.execute("a simple watermark demo"); 83 | } 84 | 85 | public static class Event implements Comparable{ 86 | public String name; 87 | public long ts; 88 | 89 | public Event(){} 90 | public Event(String name, Long ts){ 91 | this.name = name; 92 | this.ts = ts; 93 | } 94 | 95 | @Override public String toString(){ 96 | return "the event name = " + this.name 97 | + "\tts=" + this.ts; 98 | } 99 | 100 | @Override public int hashCode(){ 101 | return (int)this.ts; 102 | } 103 | 104 | @Override public int compareTo(Object other){ 105 | if (other == null){ 106 | return -1; 107 | } 108 | if(other instanceof Event){ 109 | Event e = (Event) other; 110 | if(this.ts < e.ts){ 111 | return -1; 112 | }else if(this.ts > e.ts){ 113 | return 1; 114 | }else{ 115 | return 0; 116 | } 117 | } 118 | 119 | return 0; 120 | } 121 | } 122 | } 123 | -------------------------------------------------------------------------------- /flink-taxi-demos/src/main/java/com/jdd/streaming/demos/SocketSideOutput.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.flink.api.common.functions.MapFunction; 4 | import org.apache.flink.api.java.tuple.Tuple; 5 | import org.apache.flink.api.java.tuple.Tuple2; 6 | import org.apache.flink.streaming.api.TimeCharacteristic; 7 | import org.apache.flink.streaming.api.datastream.DataStream; 8 | import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator; 9 | 
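// ----------------------------------------------------------------------------
// On the extractor used in SimpleWaterMark above: BoundedOutOfOrdernessTimestampExtractor
// tracks the largest timestamp it has extracted and emits
// watermark = maxSeenTimestamp - maxOutOfOrderness, so with Time.of(5, SECONDS) a
// tumbling window [start, end) fires only once an event with timestamp roughly
// >= end + 5s has been seen. A hedged sketch against the Event POJO defined above:
//
//     DataStream<SimpleWaterMark.Event> withTs = strs.assignTimestampsAndWatermarks(
//         new BoundedOutOfOrdernessTimestampExtractor<SimpleWaterMark.Event>(Time.seconds(5)) {
//             @Override public long extractTimestamp(SimpleWaterMark.Event e) { return e.ts; }
//         });
// ----------------------------------------------------------------------------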
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
10 | import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
11 | import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
12 | import org.apache.flink.streaming.api.watermark.Watermark;
13 | import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
14 | import org.apache.flink.streaming.api.windowing.time.Time;
15 | import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
16 | import org.apache.flink.util.Collector;
17 | import org.apache.flink.util.OutputTag;
18 | 
19 | import javax.annotation.Nullable;
20 | import java.text.SimpleDateFormat;
21 | import java.util.ArrayList;
22 | import java.util.Collections;
23 | import java.util.Iterator;
24 | import java.util.List;
25 | import java.util.concurrent.TimeUnit;
26 | 
27 | /**
28 |  * Test with: nc -ln 9000
29 |  * 0001,1538359882000
30 |  * 0002,1538359886000
31 |  * 0003,1538359892000
32 |  * 0004,1538359893000
33 |  * 0005,1538359894000
34 |  * 0006,1538359896000
35 |  * 0007,1538359897000
36 |  * 0008,1538359897000
37 |  * 0009,1538359872000  this record triggers the side output: it arrives after its window's valid time range has passed
38 |  * @Author: dalan
39 |  * @Date: 19-4-2 15:36
40 |  * @Description:
41 |  */
42 | public class SocketSideOutput {
43 |     public static void main(String[] args) throws Exception {
44 |         // the socket port to read from
45 |         int port = 9000;
46 |         // set up the execution environment
47 |         final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
48 |         // use event time; the default is processing time
49 |         env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
50 |         // set the parallelism; the default is the number of CPU cores on this machine
51 |         env.setParallelism(4);
52 |         // read the input from the socket
53 |         DataStream<String> text = env.socketTextStream("localhost", port, "\n");
54 |         // parse each line into an (id, timestamp) pair
55 |         DataStream<Tuple2<String, Long>> inputMap = text.map(new MapFunction<String, Tuple2<String, Long>>() {
56 |             @Override public Tuple2<String, Long> map(String value) throws Exception {
57 |                 String[] arr = value.split(",");
58 |                 return new Tuple2<>(arr[0], Long.parseLong(arr[1]));
59 |             }
60 |         });
61 | 
62 |         // extract timestamps and generate watermarks:
63 |         // the watermark is max(event.timestamp) minus a tolerated delay of 1s;
64 |         // holding the watermark back by that delay keeps windows open long enough to accept events arriving up to 1s late
65 |         DataStream<Tuple2<String, Long>> waterMarkStream = inputMap.assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<Tuple2<String, Long>>() {
66 |             Long currentMaxTimestamp = 0L;
67 |             final Long maxOutOfOrderness = 1000L; // tolerated out-of-orderness: 1s
68 | 
69 |             SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
70 | 
71 |             /**
72 |              * how the watermark is generated * invoked every 200ms by default
73 |              */
74 |             @Nullable @Override public Watermark getCurrentWatermark() {
75 |                 return new Watermark(currentMaxTimestamp - maxOutOfOrderness);
76 |             }
77 | 
78 |             // how the timestamp is extracted
79 |             @Override public long extractTimestamp(Tuple2<String, Long> element, long previousElementTimestamp) {
80 |                 long timestamp = element.f1;
81 |                 currentMaxTimestamp = Math.max(timestamp, currentMaxTimestamp);
82 |                 // System.out.println("key:" + element.f0 + ",eventtime:[" + element.f1 + "|" + sdf.format(element.f1) + "],currentMaxTimestamp:[" + currentMaxTimestamp + "|" + sdf.format(currentMaxTimestamp)
83 |                 //     + "],watermark:[" + getCurrentWatermark().getTimestamp() + "|" + sdf.format(getCurrentWatermark().getTimestamp())
84 |                 //     + "]");
85 |                 return timestamp;
86 |             }
87 |         });
88 | 
89 |         // collect late records that would otherwise be dropped
90 |         OutputTag<Tuple2<String, Long>> outputTag = new OutputTag<Tuple2<String, Long>>("late-data"){};
91 |         // note: getSideOutput is specific to SingleOutputStreamOperator, so the variable below cannot be declared as its parent type DataStream
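//         Worked through with the sample input from the javadoc above (3s tumbling
//         event-time windows, maxOutOfOrderness = 1000 ms): after 0008,1538359897000
//         the watermark stands at 1538359897000 - 1000 = 1538359896000; record
//         0009,1538359872000 belongs to the tumbling window
//         [1538359872000, 1538359875000), whose end is far behind that watermark,
//         so with no allowedLateness it is routed to outputTag by
//         sideOutputLateData(...) below instead of being silently discarded.
92 |         SingleOutputStreamOperator<String> window =
93 | 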
waterMarkStream.keyBy(0) 94 | .window(TumblingEventTimeWindows.of(Time.of(3, TimeUnit.SECONDS))) 95 | //按照消息的EventTime分配窗口,和调用TimeWindow效果一样 96 | // .allowedLateness(Time.seconds(2)) //允许数据迟到2秒 97 | .sideOutputLateData(outputTag) 98 | .apply(new WindowFunction, String, Tuple, TimeWindow>() { 99 | /** 100 | * 对window内的数据进行排序,保证数据的顺序 101 | * 102 | * @param tuple 103 | * @param window 104 | * @param input 105 | * @param out 106 | * @throws Exception 107 | */ 108 | @Override public void apply(Tuple tuple, TimeWindow window, Iterable> input, Collector out) throws Exception { 109 | String key = tuple.toString(); 110 | List arrarList = new ArrayList(); 111 | Iterator> it = input.iterator(); 112 | while (it.hasNext()) { 113 | Tuple2 next = it.next(); 114 | arrarList.add(next.f1); 115 | } 116 | Collections.sort(arrarList); 117 | SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS"); 118 | String result = key + "," + arrarList.size() + "," + sdf.format(arrarList.get(0)) + "," + sdf 119 | .format(arrarList.get(arrarList.size() - 1)) + "," + sdf.format(window.getStart()) + "," + sdf.format(window.getEnd()); 120 | out.collect(result); 121 | 122 | //System.out.println(result); 123 | } 124 | }); 125 | 126 | //把迟到的数据暂时打印到控制台,实际中可以保存到其他存储介质中 127 | // 本处延迟的event已经超过指定Window的[start,end)有效范围,并且在已忍受可延迟最大周期的基础上出现延迟的信息 128 | DataStream> sideOutput = window.getSideOutput(outputTag); 129 | sideOutput.print(); 130 | //测试-把结果打印到控制台即可 window.print(); 131 | // 注意:因为flink是懒加载的,所以必须调用execute方法,上面的代码才会执行 132 | env.execute("eventtime-watermark"); 133 | } 134 | } -------------------------------------------------------------------------------- /flink-taxi-demos/src/main/java/com/jdd/streaming/demos/TaxiCheckpointDemo.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import com.jdd.streaming.demos.entity.TaxiFare; 4 | import com.jdd.streaming.demos.entity.TaxiRide; 5 | import com.jdd.streaming.demos.source.CheckpointTaxiFareSource; 6 | import com.jdd.streaming.demos.source.CheckpointTaxiRideSource; 7 | import com.jdd.streaming.demos.source.TaxiFareSource; 8 | import com.jdd.streaming.demos.source.TaxiRideSource; 9 | import org.apache.flink.api.common.functions.FlatMapFunction; 10 | import org.apache.flink.api.common.state.ValueState; 11 | import org.apache.flink.api.common.state.ValueStateDescriptor; 12 | import org.apache.flink.api.java.tuple.Tuple2; 13 | import org.apache.flink.configuration.Configuration; 14 | import org.apache.flink.streaming.api.TimeCharacteristic; 15 | import org.apache.flink.streaming.api.datastream.DataStream; 16 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 17 | import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction; 18 | import org.apache.flink.table.api.StreamTableEnvironment; 19 | import org.apache.flink.util.Collector; 20 | import org.elasticsearch.search.aggregations.support.ValuesSourceType; 21 | import org.slf4j.Logger; 22 | import org.slf4j.LoggerFactory; 23 | 24 | /** 25 | * @Auther: dalan 26 | * @Date: 19-3-29 19:26 27 | * @Description: 28 | */ 29 | public class TaxiCheckpointDemo { 30 | /** logger */ 31 | private static final Logger LOGGER = LoggerFactory.getLogger(TaxiCheckpointDemo.class); 32 | 33 | public static void main(String[] args) throws Exception { 34 | final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 35 | env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); 36 | 
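//         Despite the class name, nothing below switches checkpointing on; a hedged
//         sketch of what one would add so that checkpointed sources such as
//         CheckpointTaxiRideSource can actually snapshot (the 10s interval is an
//         assumed value; CheckpointingMode needs an extra import):
//             env.enableCheckpointing(10000);
//             env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);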
env.setParallelism(4); 37 | 38 | final int delay = 60; 39 | final int servingSpeed = 1800; 40 | 41 | final String ridePath = "/home/wmm/go_bench/flink_sources/nycTaxiRides.gz"; 42 | final String farePath = "/home/wmm/go_bench/flink_sources/nycTaxiFares.gz"; 43 | 44 | DataStream rides = env.addSource(new TaxiRideSource(ridePath, delay, servingSpeed)) 45 | .filter((TaxiRide ride)-> ride.isStart) 46 | .keyBy("rideId"); 47 | DataStream fares = env.addSource(new TaxiFareSource(farePath, delay, servingSpeed)) 48 | .keyBy("rideId"); 49 | 50 | DataStream> rideFares = rides.connect(fares) 51 | .flatMap(new RichCoFlatMapFunction>(){ 52 | // 使用内存state方式存储对应的ride或fare 53 | private ValueState rideStates; 54 | private ValueState fareStates; 55 | 56 | @Override 57 | public void open(Configuration config) throws Exception { 58 | //throw new MissingSolutionException(); 59 | rideStates = getRuntimeContext().getState(new ValueStateDescriptor("ride state", TaxiRide.class)); 60 | fareStates = getRuntimeContext().getState(new ValueStateDescriptor("fare state", TaxiFare.class)); 61 | } 62 | 63 | @Override 64 | public void flatMap1(TaxiRide ride, Collector> out) throws Exception { 65 | TaxiFare fare = fareStates.value(); 66 | if (fare != null){ 67 | fareStates.clear(); 68 | out.collect(new Tuple2<>(ride, fare)); 69 | } else{ 70 | rideStates.update(ride); 71 | } 72 | 73 | } 74 | 75 | @Override 76 | public void flatMap2(TaxiFare fare, Collector> out) throws Exception { 77 | TaxiRide ride = rideStates.value(); 78 | if (ride != null){ 79 | rideStates.clear(); 80 | 81 | out.collect(new Tuple2<>(ride, fare)); 82 | }else { 83 | fareStates.update(fare); 84 | } 85 | } 86 | }); 87 | 88 | rideFares.print(); 89 | env.execute("rides connect fares demo"); 90 | 91 | } 92 | 93 | public static class MissingSolutionException extends Exception { 94 | public MissingSolutionException() {}; 95 | 96 | public MissingSolutionException(String message) { 97 | super(message); 98 | }; 99 | 100 | public MissingSolutionException(Throwable cause) { 101 | super(cause); 102 | } 103 | 104 | public MissingSolutionException(String message, Throwable cause) { 105 | super(message, cause); 106 | } 107 | } 108 | } 109 | -------------------------------------------------------------------------------- /flink-taxi-demos/src/main/java/com/jdd/streaming/demos/TaxiRideCleansing.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import com.jdd.streaming.demos.entity.TaxiRide; 4 | import com.jdd.streaming.demos.sink.RedisConfig; 5 | import com.jdd.streaming.demos.source.TaxiRideSource; 6 | import com.jdd.streaming.demos.utils.GeoUtils; 7 | import org.apache.flink.api.common.functions.FilterFunction; 8 | import org.apache.flink.api.common.functions.FlatMapFunction; 9 | import org.apache.flink.api.common.functions.MapFunction; 10 | import org.apache.flink.api.java.utils.ParameterTool; 11 | import org.apache.flink.configuration.Configuration; 12 | import org.apache.flink.streaming.api.TimeCharacteristic; 13 | import org.apache.flink.streaming.api.datastream.DataStream; 14 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 15 | import org.apache.flink.streaming.api.functions.sink.RichSinkFunction; 16 | import org.apache.flink.util.Collector; 17 | import org.slf4j.Logger; 18 | import org.slf4j.LoggerFactory; 19 | import redis.clients.jedis.Jedis; 20 | import redis.clients.jedis.JedisPool; 21 | import redis.clients.jedis.JedisPoolConfig; 22 | 23 | /** 24 | * 
@Auther: dalan 25 | * @Date: 19-3-19 11:24 26 | * @Description: 27 | */ 28 | public class TaxiRideCleansing { 29 | /** logger */ 30 | private static final Logger LOGGER = LoggerFactory.getLogger(TaxiRideCleansing.class); 31 | // main 32 | public static void main(String[] args) throws Exception { 33 | final ParameterTool params = ParameterTool.fromArgs(args); 34 | final String input = params.get("input","/home/wmm/go_bench/flink_sources/nycTaxiRides.gz"); 35 | final int parallelism = params.getInt("parallelism",4); 36 | 37 | final int maxEventDelay = 60; 38 | final int servingSpeedFactor = 600; 39 | 40 | StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 41 | env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); 42 | env.setParallelism(parallelism); 43 | 44 | DataStream rides = env.addSource(new TaxiRideSource(input, maxEventDelay, servingSpeedFactor)); 45 | DataStream filterRides = rides 46 | .filter(new FilterFunction() { // 过滤数据 47 | @Override 48 | public boolean filter(TaxiRide taxiRide) throws Exception { // 剔除行程超出纽约城的 49 | return GeoUtils.isInNYC(taxiRide.startLon, taxiRide.startLat) && 50 | GeoUtils.isInNYC(taxiRide.endLon, taxiRide.endLat); 51 | } 52 | }) 53 | // .map(new MapFunction() { // 定义一个map函数 54 | // @Override 55 | // public TaxiRide map(TaxiRide taxiRide) throws Exception { 56 | // return new TaxiRide.EnrichedRide(taxiRide); 57 | // } 58 | // }) 59 | .flatMap(new FlatMapFunction() { // 定义一个flatMap: one-to-one 60 | @Override 61 | public void flatMap(TaxiRide taxiRide, Collector collector) throws Exception { 62 | FilterFunction valid = new FilterFunction() { 63 | @Override 64 | public boolean filter(TaxiRide taxiRide) throws Exception { 65 | return GeoUtils.isInNYC(taxiRide.startLon, taxiRide.startLat) && 66 | GeoUtils.isInNYC(taxiRide.endLon, taxiRide.endLat); 67 | } 68 | }; 69 | if(valid.filter(taxiRide)){ 70 | collector.collect(taxiRide); 71 | } 72 | 73 | } 74 | }) 75 | ; 76 | 77 | // 自定义一个sink 78 | /** 79 | RedisConfig redisConfig = new RedisConfig(); 80 | redisConfig.setHost(params.get("output-redis","127.0.0.1")); 81 | redisConfig.setPort(6379); 82 | redisConfig.setPassword(null); 83 | 84 | filterRides.addSink(new RichSinkFunction() { 85 | private transient JedisPool jedisPool; 86 | @Override 87 | public void open(Configuration parameters) throws Exception { 88 | try { 89 | super.open(parameters); 90 | 91 | JedisPoolConfig config = new JedisPoolConfig(); 92 | config.setMaxIdle(redisConfig.getMaxIdle()); 93 | config.setMinIdle(redisConfig.getMinIdle()); 94 | config.setMaxTotal(redisConfig.getMaxTotal()); 95 | jedisPool = new JedisPool(config, redisConfig.getHost(), redisConfig.getPort(), 96 | redisConfig.getConnectionTimeout(), redisConfig.getPassword(), redisConfig.getDatabase()); 97 | } catch (Exception e) { 98 | LOGGER.error("redis sink error {}", e); 99 | } 100 | } 101 | 102 | @Override 103 | public void close() throws Exception { 104 | try { 105 | jedisPool.close(); 106 | } catch (Exception e) { 107 | LOGGER.error("redis sink error {}", e); 108 | } 109 | } 110 | 111 | @Override 112 | public void invoke(TaxiRide val, Context context) throws Exception { 113 | Jedis jedis = null; 114 | try { 115 | jedis = jedisPool.getResource(); 116 | jedis.set("taxi:ride:nyc:" + val.rideId,val.toString()); 117 | } catch (Exception e) { 118 | e.printStackTrace(); 119 | } finally { 120 | if (null != jedis){ 121 | if (jedis != null) { 122 | try { 123 | jedis.close(); 124 | } catch (Exception e) { 125 | e.printStackTrace(); 126 | } 127 | } 128 
| } 129 | } 130 | } 131 | }); 132 | */ 133 | filterRides.print(); 134 | 135 | env.execute("Taxi Ride Cleansing Not In NYC"); 136 | } 137 | } 138 | -------------------------------------------------------------------------------- /flink-taxi-demos/src/main/java/com/jdd/streaming/demos/TaxiRideCount.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import com.jdd.streaming.demos.entity.TaxiFare; 4 | import com.jdd.streaming.demos.entity.TaxiRide; 5 | import com.jdd.streaming.demos.sink.RedisCommand; 6 | import com.jdd.streaming.demos.sink.RedisConfig; 7 | import com.jdd.streaming.demos.sink.RedisPushCommand; 8 | import com.jdd.streaming.demos.sink.RedisSink; 9 | import com.jdd.streaming.demos.source.TaxiRideSource; 10 | import org.apache.flink.api.common.JobExecutionResult; 11 | import org.apache.flink.api.common.functions.MapFunction; 12 | import org.apache.flink.api.common.functions.ReduceFunction; 13 | import org.apache.flink.api.java.tuple.Tuple; 14 | import org.apache.flink.api.java.tuple.Tuple2; 15 | import org.apache.flink.api.java.utils.ParameterTool; 16 | import org.apache.flink.configuration.Configuration; 17 | import org.apache.flink.streaming.api.TimeCharacteristic; 18 | import org.apache.flink.streaming.api.datastream.DataStream; 19 | import org.apache.flink.streaming.api.datastream.KeyedStream; 20 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 21 | import org.apache.flink.streaming.api.functions.sink.RichSinkFunction; 22 | import org.apache.flink.streaming.api.functions.source.SourceFunction; 23 | import org.apache.flink.util.StringUtils; 24 | import org.slf4j.Logger; 25 | import org.slf4j.LoggerFactory; 26 | import redis.clients.jedis.Jedis; 27 | import redis.clients.jedis.JedisPool; 28 | import redis.clients.jedis.JedisPoolConfig; 29 | import sun.util.resources.ga.LocaleNames_ga; 30 | 31 | /** 32 | * @Auther: dalan 33 | * @Date: 19-3-18 15:09 34 | * @Description: 35 | */ 36 | public class TaxiRideCount { 37 | /** logger */ 38 | private static final Logger LOGGER = LoggerFactory.getLogger(TaxiRideCount.class); 39 | // main 40 | public static void main(String[] args) throws Exception { 41 | // 读取配置参数: 文件路径/最大延迟时间/ 42 | final ParameterTool params = ParameterTool.fromArgs(args); 43 | String path = params.get("file-path","/home/wmm/go_bench/flink_sources/nycTaxiRides.gz"); 44 | int maxDeply = params.getInt("max-delay",60); 45 | int servingSpeed = params.getInt("serving-speed",600); 46 | 47 | final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 48 | env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); 49 | env.getConfig().disableSysoutLogging(); 50 | 51 | // 指定TaxiRide 52 | DataStream rides = env.addSource(new TaxiRideSource(path, maxDeply, servingSpeed)); 53 | 54 | DataStream> tuples = rides.map(new MapFunction>() { 55 | @Override 56 | public Tuple2 map(TaxiRide ride) throws Exception { 57 | return new Tuple2(ride.driverId, 1L); // 基于行程中的司机id划分数据 并进行统计 58 | } 59 | }); 60 | 61 | KeyedStream, Tuple> keyByDriverId = tuples.keyBy(0); // 基于司机id进行数据划分 62 | DataStream> rideCounts = keyByDriverId.sum(1); // 累计每个司机的里程数 63 | 64 | RedisConfig redisConfig = new RedisConfig(); 65 | redisConfig.setHost(params.get("output-redis","127.0.0.1")); 66 | redisConfig.setPort(6379); 67 | redisConfig.setPassword(null); 68 | //RedisSink redisSink = new RedisSink(redisConfig); 69 | 70 | // rideCounts.map(new MapFunction, RedisCommand>() { // 
落地redis 71 | // @Override 72 | // public RedisCommand map(Tuple2 in) throws Exception { 73 | // return new RedisPushCommand("taxi:ride:" + in.f0, Long.toString(in.f1)); 74 | // //return new RedisPushCommand("taxi:ride:" + in.f0, new String[]{Long.toString(in.f1)}); 75 | // } 76 | // }).addSink(redisSink); 77 | 78 | // 直接使用匿名类实现redis sink 79 | rideCounts.addSink(new RichSinkFunction>() { // 定义sink 80 | private transient JedisPool jedisPool; 81 | @Override 82 | public void open(Configuration parameters) throws Exception { // 新建redis pool 83 | try { 84 | super.open(parameters); 85 | JedisPoolConfig config = new JedisPoolConfig(); 86 | config.setMaxIdle(redisConfig.getMaxIdle()); 87 | config.setMinIdle(redisConfig.getMinIdle()); 88 | config.setMaxTotal(redisConfig.getMaxTotal()); 89 | jedisPool = new JedisPool(config, redisConfig.getHost(), redisConfig.getPort(), 90 | redisConfig.getConnectionTimeout(), redisConfig.getPassword(), redisConfig.getDatabase()); 91 | } catch (Exception e) { 92 | LOGGER.error("redis sink error {}", e); 93 | } 94 | } 95 | 96 | @Override 97 | public void close() throws Exception { // 关闭redis链接 98 | try { 99 | jedisPool.close(); 100 | } catch (Exception e) { 101 | LOGGER.error("redis sink error {}", e); 102 | } 103 | } 104 | 105 | @Override 106 | public void invoke(Tuple2 val, Context context) throws Exception { // 执行将内容落地redis 107 | Jedis jedis = null; 108 | try { 109 | jedis = jedisPool.getResource(); 110 | jedis.set("taxi:ride:" + val.f0,val.f1.toString()); 111 | } catch (Exception e) { 112 | e.printStackTrace(); 113 | } finally { 114 | if (null != jedis){ 115 | if (jedis != null) { 116 | try { 117 | jedis.close(); 118 | } catch (Exception e) { 119 | e.printStackTrace(); 120 | } 121 | } 122 | } 123 | } 124 | } 125 | }); 126 | //rideCounts.print(); 127 | 128 | JobExecutionResult result = env.execute("Ride Count By DriverID"); 129 | } 130 | } 131 | -------------------------------------------------------------------------------- /flink-taxi-demos/src/main/java/com/jdd/streaming/demos/TaxiTableDemo.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import com.jdd.streaming.demos.entity.TaxiFare; 4 | import com.jdd.streaming.demos.entity.TaxiRide; 5 | import com.jdd.streaming.demos.source.TaxiFareSource; 6 | import com.jdd.streaming.demos.source.TaxiRideSource; 7 | import org.apache.flink.api.java.utils.ParameterTool; 8 | import org.apache.flink.streaming.api.datastream.DataStream; 9 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 10 | import org.apache.flink.table.api.Table; 11 | import org.apache.flink.table.api.TableEnvironment; 12 | import org.apache.flink.table.api.java.StreamTableEnvironment; 13 | import org.slf4j.Logger; 14 | import org.slf4j.LoggerFactory; 15 | 16 | /** 17 | * @Auther: dalan 18 | * @Date: 19-3-28 10:45 19 | * @Description: 20 | */ 21 | public class TaxiTableDemo { 22 | /** logger */ 23 | private static final Logger LOGGER = LoggerFactory.getLogger(TaxiTableDemo.class); 24 | 25 | public static void main(String[] args) throws Exception { 26 | final ParameterTool params = ParameterTool.fromArgs(args); 27 | String taxiRide = params.get("taxi-ride-path","/home/wmm/go_bench/flink_sources/nycTaxiRides.gz"); 28 | String taxiFare = params.get("taxi-fare-path","/home/wmm/go_bench/flink_sources/nycTaxiFares.gz"); 29 | 30 | // 31 | final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 32 | final 
StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env); 33 | 34 | final int maxEventDelay = 60; 35 | final int servingSpeedFactor = 600; 36 | 37 | DataStream rides = env.addSource(new TaxiRideSource(taxiRide, maxEventDelay, servingSpeedFactor)); 38 | DataStream fares = env.addSource(new TaxiFareSource(taxiFare, maxEventDelay, servingSpeedFactor)); 39 | Table rideTable = tEnv.fromDataStream(rides); 40 | Table fareTable = tEnv.fromDataStream(fares); 41 | 42 | // 注册DataStream到TableEnvironment 43 | //tEnv.registerDataStream("rides", rides); 44 | //tEnv.registerDataStream("fares", fares); 45 | 46 | // Table result = tEnv.sqlQuery("SELECT * FROM " + fareTable) 47 | // //.leftOuterJoin(fareTable,"rideId") 48 | // ; 49 | // 50 | // //result.printSchema(); 51 | // tEnv.toAppendStream(result, TaxiFare.class).print(); 52 | 53 | Table result = tEnv.sqlQuery("SELECT * FROM " + rideTable); 54 | result.printSchema(); 55 | tEnv.toAppendStream(result, TaxiRide.class).print(); 56 | env.execute("taxi table demo"); 57 | } 58 | } 59 | -------------------------------------------------------------------------------- /flink-taxi-demos/src/main/java/com/jdd/streaming/demos/entity/TaxiFare.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos.entity; 2 | 3 | import org.joda.time.DateTime; 4 | import org.joda.time.format.DateTimeFormat; 5 | import org.joda.time.format.DateTimeFormatter; 6 | 7 | import java.io.Serializable; 8 | import java.util.Locale; 9 | 10 | /** 11 | * @Auther: dalan 12 | * @Date: 19-3-18 15:23 13 | * @Description: 14 | */ 15 | public class TaxiFare implements Serializable { 16 | private static transient DateTimeFormatter format = 17 | DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss").withLocale(Locale.US).withZoneUTC(); 18 | 19 | public TaxiFare(){ 20 | this.startTime = new DateTime(); 21 | } 22 | 23 | public TaxiFare(long rideId, long taxiId, long driverId, DateTime startTime, String paymentType, float tip, float tolls, float totalFare) { 24 | 25 | this.rideId = rideId; 26 | this.taxiId = taxiId; 27 | this.driverId = driverId; 28 | this.startTime = startTime; 29 | this.paymentType = paymentType; 30 | this.tip = tip; 31 | this.tolls = tolls; 32 | this.totalFare = totalFare; 33 | } 34 | 35 | public long rideId; 36 | public long taxiId; 37 | public long driverId; 38 | public DateTime startTime; 39 | public String paymentType; 40 | public float tip; 41 | public float tolls; 42 | public float totalFare; 43 | 44 | @Override 45 | public String toString() { 46 | StringBuilder sb = new StringBuilder(); 47 | sb.append(rideId).append(","); 48 | sb.append(taxiId).append(","); 49 | sb.append(driverId).append(","); 50 | sb.append(startTime.toString(format)).append(","); 51 | sb.append(paymentType).append(","); 52 | sb.append(tip).append(","); 53 | sb.append(tolls).append(","); 54 | sb.append(totalFare); 55 | 56 | return sb.toString(); 57 | } 58 | 59 | // 将记录转为entity 60 | public static TaxiFare fromString(String line) { 61 | 62 | String[] tokens = line.split(","); 63 | if (tokens.length != 8) { 64 | throw new RuntimeException("Invalid record: " + line); 65 | } 66 | 67 | TaxiFare ride = new TaxiFare(); 68 | 69 | try { 70 | ride.rideId = Long.parseLong(tokens[0]); 71 | ride.taxiId = Long.parseLong(tokens[1]); 72 | ride.driverId = Long.parseLong(tokens[2]); 73 | ride.startTime = DateTime.parse(tokens[3], format); 74 | ride.paymentType = tokens[4]; 75 | ride.tip = tokens[5].length() > 0 ? 
Float.parseFloat(tokens[5]) : 0.0f; 76 | ride.tolls = tokens[6].length() > 0 ? Float.parseFloat(tokens[6]) : 0.0f; 77 | ride.totalFare = tokens[7].length() > 0 ? Float.parseFloat(tokens[7]) : 0.0f; 78 | 79 | } catch (NumberFormatException nfe) { 80 | throw new RuntimeException("Invalid record: " + line, nfe); 81 | } 82 | 83 | return ride; 84 | } 85 | 86 | @Override 87 | public boolean equals(Object obj) { 88 | return obj instanceof TaxiFare && 89 | this.rideId == ((TaxiFare)obj).rideId; 90 | } 91 | 92 | @Override 93 | public int hashCode() { 94 | return (int)this.rideId; 95 | } 96 | 97 | public long getEventTime(){ 98 | return startTime.getMillis(); 99 | } 100 | } 101 | -------------------------------------------------------------------------------- /flink-taxi-demos/src/main/java/com/jdd/streaming/demos/sink/RedisCommand.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos.sink; 2 | 3 | /** 4 | * @Auther: dalan 5 | * @Date: 19-3-15 14:39 6 | * @Description: 7 | */ 8 | import redis.clients.jedis.Jedis; 9 | 10 | import java.io.Serializable; 11 | 12 | public abstract class RedisCommand implements Serializable { 13 | String key; 14 | Object value; 15 | int expire; 16 | 17 | public RedisCommand(){} 18 | 19 | public RedisCommand(String key, Object value, int expire) { 20 | this.key = key; 21 | this.value = value; 22 | this.expire = expire; 23 | } 24 | 25 | 26 | public RedisCommand(String key, Object value) { 27 | this.key = key; 28 | this.value = value; 29 | this.expire = -1; 30 | } 31 | 32 | public void execute(Jedis jedis) { 33 | invokeByCommand(jedis); 34 | if (-1 < this.expire) { 35 | jedis.expire(key, expire); 36 | } 37 | } 38 | 39 | public abstract void invokeByCommand(Jedis jedis); 40 | 41 | public String getKey() { 42 | return key; 43 | } 44 | 45 | public void setKey(String key) { 46 | this.key = key; 47 | } 48 | 49 | public Object getValue() { 50 | return value; 51 | } 52 | 53 | public void setValue(Object value) { 54 | this.value = value; 55 | } 56 | 57 | public int getExpire() { 58 | return expire; 59 | } 60 | 61 | public void setExpire(int expire) { 62 | this.expire = expire; 63 | } 64 | } 65 | -------------------------------------------------------------------------------- /flink-taxi-demos/src/main/java/com/jdd/streaming/demos/sink/RedisConfig.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos.sink; 2 | 3 | /** 4 | * @Auther: dalan 5 | * @Date: 19-3-15 14:39 6 | * @Description: 7 | */ 8 | import java.io.Serializable; 9 | 10 | public class RedisConfig implements Serializable { 11 | private static final long serialVersionUID = 1L; 12 | 13 | private String host = "127.0.0.1"; 14 | private int port = 6379; 15 | private int database = 0; 16 | private String password = null; 17 | protected int maxTotal = 8; 18 | protected int maxIdle = 8; 19 | protected int minIdle = 0; 20 | protected int connectionTimeout = 2000; 21 | 22 | public RedisConfig host(String host) { 23 | this.host = (host); 24 | return this; 25 | } 26 | 27 | public RedisConfig port(int port) { 28 | this.port = (port); 29 | return this; 30 | } 31 | 32 | public RedisConfig database(int database) { 33 | this.database = (database); 34 | return this; 35 | } 36 | 37 | public RedisConfig password(String password) { 38 | this.password = (password); 39 | return this; 40 | } 41 | 42 | public RedisConfig maxTotal(int maxTotal) { 43 | this.maxTotal = (maxTotal); 44 | return this; 45 | } 46 | 47 | 
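//     These fluent setters return this, so a configuration can be built in one
//     chain; a usage sketch (the values are illustrative):
//         RedisConfig cfg = new RedisConfig()
//             .host("127.0.0.1").port(6379).database(0)
//             .maxTotal(16).minIdle(1).connectionTimeout(2000);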
public RedisConfig maxIdle(int maxIdle) { 48 | this.maxIdle = (maxIdle); 49 | return this; 50 | } 51 | 52 | public RedisConfig minIdle(int minIdle) { 53 | this.minIdle = (minIdle); 54 | return this; 55 | } 56 | 57 | public RedisConfig connectionTimeout(int connectionTimeout) { 58 | this.connectionTimeout = (connectionTimeout); 59 | return this; 60 | } 61 | 62 | public static long getSerialVersionUID() { 63 | return serialVersionUID; 64 | } 65 | 66 | public String getHost() { 67 | return host; 68 | } 69 | 70 | public void setHost(String host) { 71 | this.host = host; 72 | } 73 | 74 | public int getPort() { 75 | return port; 76 | } 77 | 78 | public void setPort(int port) { 79 | this.port = port; 80 | } 81 | 82 | public int getDatabase() { 83 | return database; 84 | } 85 | 86 | public void setDatabase(int database) { 87 | this.database = database; 88 | } 89 | 90 | public String getPassword() { 91 | return password; 92 | } 93 | 94 | public void setPassword(String password) { 95 | this.password = password; 96 | } 97 | 98 | public int getMaxTotal() { 99 | return maxTotal; 100 | } 101 | 102 | public void setMaxTotal(int maxTotal) { 103 | this.maxTotal = maxTotal; 104 | } 105 | 106 | public int getMaxIdle() { 107 | return maxIdle; 108 | } 109 | 110 | public void setMaxIdle(int maxIdle) { 111 | this.maxIdle = maxIdle; 112 | } 113 | 114 | public int getMinIdle() { 115 | return minIdle; 116 | } 117 | 118 | public void setMinIdle(int minIdle) { 119 | this.minIdle = minIdle; 120 | } 121 | 122 | public int getConnectionTimeout() { 123 | return connectionTimeout; 124 | } 125 | 126 | public void setConnectionTimeout(int connectionTimeout) { 127 | this.connectionTimeout = connectionTimeout; 128 | } 129 | } 130 | -------------------------------------------------------------------------------- /flink-taxi-demos/src/main/java/com/jdd/streaming/demos/sink/RedisPushCommand.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos.sink; 2 | 3 | /** 4 | * @Auther: dalan 5 | * @Date: 19-3-15 14:41 6 | * @Description: 7 | */ 8 | import redis.clients.jedis.Jedis; 9 | 10 | /** 11 | * override rpush command 12 | */ 13 | public class RedisPushCommand extends RedisCommand { 14 | public RedisPushCommand(){super();} 15 | 16 | public RedisPushCommand(String key, Object value) { 17 | super(key, value); 18 | } 19 | 20 | public RedisPushCommand(String key, Object value, int expire) { 21 | super(key, value, expire); 22 | } 23 | 24 | @Override 25 | public void invokeByCommand(Jedis jedis) { 26 | //jedis.rpush(getKey(), (String[]) getValue()); 27 | jedis.set(getKey(),getValue().toString()); 28 | } 29 | 30 | 31 | } -------------------------------------------------------------------------------- /flink-taxi-demos/src/main/java/com/jdd/streaming/demos/sink/RedisSink.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos.sink; 2 | 3 | /** 4 | * @Auther: dalan 5 | * @Date: 19-3-15 14:38 6 | * @Description: 7 | */ 8 | import org.apache.flink.configuration.Configuration; 9 | import org.apache.flink.streaming.api.functions.sink.RichSinkFunction; 10 | import org.apache.flink.util.Preconditions; 11 | import org.slf4j.Logger; 12 | import org.slf4j.LoggerFactory; 13 | import redis.clients.jedis.Jedis; 14 | import redis.clients.jedis.JedisPool; 15 | import redis.clients.jedis.JedisPoolConfig; 16 | 17 | /*** 18 | * redis sink 19 | * 20 | * support any operation 21 | * support set expire 22 | */ 23 | public class 
RedisSink extends RichSinkFunction { 24 | private static final long serialVersionUID = 1L; 25 | 26 | private static final Logger LOG = LoggerFactory.getLogger(RedisSink.class); 27 | 28 | private final RedisConfig redisConfig; 29 | 30 | private transient JedisPool jedisPool; 31 | 32 | public RedisSink(RedisConfig redisConfig) { 33 | this.redisConfig = Preconditions.checkNotNull(redisConfig, "Redis client config should not be null"); 34 | } 35 | 36 | 37 | @Override 38 | public void open(Configuration parameters) throws Exception { 39 | try { 40 | super.open(parameters); 41 | JedisPoolConfig config = new JedisPoolConfig(); 42 | config.setMaxIdle(redisConfig.getMaxIdle()); 43 | config.setMinIdle(redisConfig.getMinIdle()); 44 | config.setMaxTotal(redisConfig.getMaxTotal()); 45 | jedisPool = new JedisPool(config, redisConfig.getHost(), redisConfig.getPort(), 46 | redisConfig.getConnectionTimeout(), redisConfig.getPassword(), redisConfig.getDatabase()); 47 | } catch (Exception e) { 48 | LOG.error("redis sink error {}", e); 49 | } 50 | } 51 | 52 | @Override 53 | public void close() throws Exception { 54 | try { 55 | jedisPool.close(); 56 | } catch (Exception e) { 57 | LOG.error("redis sink error {}", e); 58 | } 59 | } 60 | 61 | 62 | private Jedis getJedis() { 63 | Jedis jedis = jedisPool.getResource(); 64 | return jedis; 65 | } 66 | 67 | public void closeResource(Jedis jedis) { 68 | if (jedis != null) { 69 | try { 70 | jedis.close(); 71 | } catch (Exception e) { 72 | e.printStackTrace(); 73 | } 74 | } 75 | } 76 | 77 | @Override 78 | public void invoke(RedisCommand command, Context context) { 79 | Jedis jedis = null; 80 | try { 81 | jedis = getJedis(); 82 | command.execute(jedis); 83 | } catch (Exception e) { 84 | e.printStackTrace(); 85 | } finally { 86 | if (null != jedis) 87 | closeResource(jedis); 88 | } 89 | } 90 | 91 | } 92 | -------------------------------------------------------------------------------- /flink-taxi-demos/src/main/java/com/jdd/streaming/demos/source/CheckpointTaxiFareSource.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos.source; 2 | 3 | import com.jdd.streaming.demos.entity.TaxiFare; 4 | import com.jdd.streaming.demos.entity.TaxiRide; 5 | import org.apache.flink.streaming.api.checkpoint.ListCheckpointed; 6 | import org.apache.flink.streaming.api.functions.source.SourceFunction; 7 | import org.apache.flink.streaming.api.watermark.Watermark; 8 | 9 | import java.io.*; 10 | import java.util.Collections; 11 | import java.util.List; 12 | import java.util.zip.GZIPInputStream; 13 | 14 | /** 15 | * @Auther: dalan 16 | * @Date: 19-3-29 18:51 17 | * @Description: 18 | */ 19 | public class CheckpointTaxiFareSource implements SourceFunction, ListCheckpointed { 20 | private final String dataFilePath; 21 | private final int servingSpeed; 22 | 23 | private transient BufferedReader reader; 24 | private transient InputStream gzipStream; 25 | 26 | // record the number of emitted events 27 | private long eventCnt = 0; 28 | 29 | public CheckpointTaxiFareSource(String dataFilePath){this(dataFilePath, 1);} 30 | public CheckpointTaxiFareSource(String dataFilePath, int servingSpeed) { 31 | this.dataFilePath = dataFilePath; 32 | this.servingSpeed = servingSpeed; 33 | } 34 | 35 | @Override public List snapshotState(long l, long l1) throws Exception { 36 | return Collections.singletonList(eventCnt); 37 | } 38 | 39 | @Override public void restoreState(List list) throws Exception { 40 | for (Long s: list) { 41 | this.eventCnt = s; 
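//                 snapshotState (above) returns a singleton list holding the number of
//                 events emitted so far; this SourceFunction runs non-parallel, so on
//                 recovery this loop sees exactly one value, and run() then skips that
//                 many already-emitted input lines before resuming emission.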
42 | } 43 | } 44 | 45 | @Override public void run(SourceContext sourceContext) throws Exception { 46 | final Object lock = sourceContext.getCheckpointLock(); 47 | 48 | gzipStream = new GZIPInputStream(new FileInputStream(dataFilePath)); 49 | reader = new BufferedReader(new InputStreamReader(gzipStream, "UTF-8")); 50 | 51 | Long prevRideTime = null; 52 | 53 | String line; 54 | long cnt = 0; 55 | 56 | // skip emitted events 57 | while (cnt < eventCnt && reader.ready() && (line = reader.readLine()) != null) { 58 | cnt++; 59 | TaxiFare fare = TaxiFare.fromString(line); 60 | prevRideTime = getEventTime(fare); 61 | } 62 | 63 | // emit all subsequent events proportial to their timestamp 64 | while (reader.ready() && (line = reader.readLine()) != null) { 65 | 66 | TaxiFare fare = TaxiFare.fromString(line); 67 | long rideTime = getEventTime(fare); 68 | 69 | if (prevRideTime != null) { 70 | long diff = (rideTime - prevRideTime) / servingSpeed; 71 | Thread.sleep(diff); 72 | } 73 | 74 | synchronized (lock) { 75 | eventCnt++; 76 | sourceContext.collectWithTimestamp(fare, rideTime); 77 | sourceContext.emitWatermark(new Watermark(rideTime - 1)); 78 | } 79 | 80 | prevRideTime = rideTime; 81 | } 82 | 83 | this.reader.close(); 84 | this.reader = null; 85 | this.gzipStream.close(); 86 | } 87 | 88 | @Override public void cancel() { 89 | try { 90 | if (this.reader != null) { 91 | this.reader.close(); 92 | } 93 | if (this.gzipStream != null) { 94 | this.gzipStream.close(); 95 | } 96 | } catch(IOException ioe) { 97 | throw new RuntimeException("Could not cancel SourceFunction", ioe); 98 | } finally { 99 | this.reader = null; 100 | this.gzipStream = null; 101 | } 102 | } 103 | 104 | public long getEventTime(TaxiFare fare) { 105 | return fare.getEventTime(); 106 | } 107 | } 108 | -------------------------------------------------------------------------------- /flink-taxi-demos/src/main/java/com/jdd/streaming/demos/source/CheckpointTaxiRideSource.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos.source; 2 | 3 | import com.jdd.streaming.demos.entity.TaxiFare; 4 | import com.jdd.streaming.demos.entity.TaxiRide; 5 | import org.apache.flink.streaming.api.checkpoint.ListCheckpointed; 6 | import org.apache.flink.streaming.api.functions.source.SourceFunction; 7 | import org.apache.flink.streaming.api.watermark.Watermark; 8 | 9 | import java.io.*; 10 | import java.util.Collections; 11 | import java.util.List; 12 | import java.util.zip.GZIPInputStream; 13 | 14 | /** 15 | * @Auther: dalan 16 | * @Date: 19-3-29 18:52 17 | * @Description: 18 | */ 19 | public class CheckpointTaxiRideSource implements SourceFunction, ListCheckpointed { 20 | private final String dataFilePath; 21 | private final int servingSpeed; 22 | 23 | private transient BufferedReader reader; 24 | private transient InputStream gzipStream; 25 | 26 | private long eventCnt = 0; 27 | 28 | public CheckpointTaxiRideSource(String dataFilePath){this(dataFilePath,1);} 29 | public CheckpointTaxiRideSource(String dataFilePath, int servingSpeedFactor) { 30 | this.dataFilePath = dataFilePath; 31 | this.servingSpeed = servingSpeedFactor; 32 | } 33 | 34 | @Override public List snapshotState(long l, long l1) throws Exception { 35 | return Collections.singletonList(eventCnt); 36 | } 37 | 38 | @Override public void restoreState(List list) throws Exception { 39 | for (Long s: list) { 40 | this.eventCnt = s; 41 | } 42 | } 43 | 44 | // source 产生数据 45 | @Override public void run(SourceContext sourceContext) 
throws Exception { 46 | final Object lock = sourceContext.getCheckpointLock(); 47 | 48 | gzipStream = new GZIPInputStream(new FileInputStream(dataFilePath)); 49 | reader = new BufferedReader(new InputStreamReader(gzipStream,"UTF-8")); 50 | 51 | Long prevRideTime = null; 52 | String line; 53 | long cnt = 0; 54 | 55 | // 从gz中读取文件内容 并逐行转为TaxiRide 56 | while (cnt < eventCnt && reader.ready() && (line = reader.readLine()) != null){ 57 | cnt++; 58 | TaxiRide ride = TaxiRide.fromString(line); 59 | long rideTime = getEventTime(ride); 60 | 61 | if(prevRideTime != null){ // 控制数据产生的速度: 当前taxiride与上一次taxiride时间差越大 sleep时间就越久 62 | long diff = (rideTime - prevRideTime) / servingSpeed; 63 | Thread.sleep(diff); 64 | } 65 | 66 | synchronized (lock){ 67 | eventCnt++; 68 | sourceContext.collectWithTimestamp(ride, rideTime); 69 | sourceContext.emitWatermark(new Watermark(rideTime - 1)); 70 | } 71 | prevRideTime = rideTime; 72 | } 73 | 74 | this.reader.close(); 75 | this.reader = null; 76 | 77 | this.gzipStream.close(); 78 | this.gzipStream = null; 79 | } 80 | 81 | // 放弃数据产生: 常用于释放资源释放 82 | @Override public void cancel() { 83 | try{ 84 | if(this.reader != null){ 85 | this.reader.close(); 86 | this.reader = null; 87 | } 88 | if(this.gzipStream != null){ 89 | this.gzipStream.close(); 90 | this.gzipStream = null; 91 | } 92 | }catch (IOException io){ 93 | throw new RuntimeException("Could not cancel sourcefuntion", io); 94 | }finally { 95 | this.reader = null; 96 | this.gzipStream = null; 97 | } 98 | } 99 | 100 | public long getEventTime(TaxiRide ride) { 101 | return ride.getEventTime(); 102 | } 103 | } 104 | -------------------------------------------------------------------------------- /flink-taxi-demos/src/main/resources/log4j.properties: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | # Licensed to the Apache Software Foundation (ASF) under one 3 | # or more contributor license agreements. See the NOTICE file 4 | # distributed with this work for additional information 5 | # regarding copyright ownership. The ASF licenses this file 6 | # to you under the Apache License, Version 2.0 (the 7 | # "License"); you may not use this file except in compliance 8 | # with the License. You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 
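# ----------------------------------------------------------------------------
# Note on the replay sources listed above: CheckpointTaxiRideSource and
# CheckpointTaxiFareSource pace their output so that the wall-clock sleep
# between two records is the event-time gap divided by servingSpeed; with
# servingSpeed = 600, a 60000 ms gap in event time becomes 60000 / 600 = 100 ms
# of real time, i.e. ten minutes of event time are replayed per second.
# ----------------------------------------------------------------------------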
17 | ################################################################################ 18 | 19 | log4j.rootLogger=INFO, console 20 | 21 | log4j.appender.console=org.apache.log4j.ConsoleAppender 22 | log4j.appender.console.layout=org.apache.log4j.PatternLayout 23 | log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n 24 | -------------------------------------------------------------------------------- /flink-userdefine-pojo/pom.xml: -------------------------------------------------------------------------------- 1 | 2 | 5 | 6 | flink-demos 7 | com.jdd.streaming 8 | 1.0-SNAPSHOT 9 | 10 | 4.0.0 11 | flink-userdefine-pojo 12 | flink-pojo 13 | jar 14 | 15 | 16 | 17 | org.apache.flink 18 | flink-streaming-java_2.11 19 | 20 | 21 | org.apache.flink 22 | flink-clients_2.11 23 | 24 | 25 | 26 | org.slf4j 27 | slf4j-log4j12 28 | 29 | 30 | 31 | 32 | 33 | 34 | org.apache.maven.plugins 35 | maven-jar-plugin 36 | 37 | 38 | 39 | com.jdd.streaming.demos.UserDefinePoJo 40 | 41 | 42 | 43 | 44 | 45 | org.apache.maven.plugins 46 | maven-compiler-plugin 47 | 48 | 1.8 49 | 1.8 50 | 51 | 52 | 53 | 54 | 55 | -------------------------------------------------------------------------------- /flink-userdefine-pojo/src/main/java/com/jdd/streaming/demos/AsyncOperatorDemo.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.flink.streaming.api.functions.async.AsyncFunction; 4 | 5 | /** 6 | * @Auther: dalan 7 | * @Date: 19-3-26 09:42 8 | * @Description: 9 | */ 10 | public class AsyncOperatorDemo { 11 | // main 12 | public static void main(String[] args) { 13 | 14 | } 15 | } 16 | -------------------------------------------------------------------------------- /flink-userdefine-pojo/src/main/java/com/jdd/streaming/demos/FoldDemo.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.flink.api.common.functions.FoldFunction; 4 | import org.apache.flink.api.common.functions.MapFunction; 5 | import org.apache.flink.api.java.tuple.Tuple; 6 | import org.apache.flink.api.java.tuple.Tuple2; 7 | import org.apache.flink.streaming.api.datastream.DataStream; 8 | import org.apache.flink.streaming.api.datastream.KeyedStream; 9 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 10 | import org.slf4j.Logger; 11 | import org.slf4j.LoggerFactory; 12 | 13 | /** 14 | * @Auther: dalan 15 | * @Date: 19-3-25 11:33 16 | * @Description: 17 | */ 18 | public class FoldDemo { 19 | /** logger */ 20 | //private static final Logger LOGGER = LoggerFactory.getLogger(FoldDemo.class); 21 | // main 22 | public static void main(String[] args) throws Exception { 23 | final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 24 | //env.getConfig().disableSysoutLogging(); 25 | 26 | DataStream source = env.fromElements("hello","world","hello","world","hello","world"); 27 | 28 | source.print(); 29 | 30 | KeyedStream, Tuple> keyedStream = source.map(new MapFunction>() { 31 | @Override 32 | public Tuple2 map(String s) throws Exception { 33 | return new Tuple2(s, 1); 34 | } 35 | }) 36 | .keyBy(0); 37 | 38 | DataStream> aggregationDatas = 39 | keyedStream 40 | .sum(1); 41 | 42 | aggregationDatas.print(); 43 | 44 | 45 | 46 | // fold的使用: 该操作不建议使用 47 | // DataStream foldDatas = keyedStream.fold("test", new FoldFunction, String>() { 48 | // @Override 49 | // public String fold(String s, 
Tuple2 o) throws Exception { 50 | // return s + "=" + o; 51 | // } 52 | // }); 53 | // 54 | // foldDatas.print(); 55 | 56 | env.execute("the fold method is used."); 57 | } 58 | } 59 | -------------------------------------------------------------------------------- /flink-userdefine-pojo/src/main/java/com/jdd/streaming/demos/KeySelectorDemo.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.commons.lang3.StringUtils; 4 | import org.apache.flink.api.common.functions.MapFunction; 5 | import org.apache.flink.api.java.tuple.Tuple; 6 | import org.apache.flink.api.java.tuple.Tuple2; 7 | import org.apache.flink.api.java.tuple.Tuple3; 8 | import org.apache.flink.streaming.api.TimeCharacteristic; 9 | import org.apache.flink.streaming.api.datastream.DataStream; 10 | import org.apache.flink.streaming.api.datastream.WindowedStream; 11 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 12 | import org.apache.flink.streaming.api.functions.windowing.WindowFunction; 13 | import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows; 14 | import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows; 15 | import org.apache.flink.streaming.api.windowing.time.Time; 16 | import org.apache.flink.streaming.api.windowing.windows.TimeWindow; 17 | import org.apache.flink.util.Collector; 18 | 19 | import java.util.Iterator; 20 | import java.util.StringJoiner; 21 | import java.util.concurrent.TimeUnit; 22 | 23 | 24 | /** 25 | * @Auther: dalan 26 | * @Date: 19-3-25 13:25 27 | * @Description: 28 | */ 29 | public class KeySelectorDemo { 30 | // main 31 | public static void main(String[] args) throws Exception { 32 | final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 33 | DataStream ints = env.fromElements("a","b","c","aa","bb","cc","aaa","bbb","ccc"); 34 | //env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); 35 | 36 | DataStream> mins = ints 37 | .map(new MapFunction>() { 38 | @Override 39 | public Tuple2 map(String data) throws Exception { 40 | return new Tuple2(data,1L); 41 | } 42 | }) 43 | .keyBy(0) 44 | .timeWindow(Time.of(5, TimeUnit.SECONDS)) 45 | .sum(1); 46 | // .apply(new WindowFunction, Tuple2, Tuple, TimeWindow>() { 47 | // @Override 48 | // public void apply(Tuple tuple, TimeWindow timeWindow, Iterable> iterable, Collector> collector) throws Exception { 49 | // Iterator> itor = iterable.iterator(); 50 | // Long sum = 0L; 51 | // while (itor.hasNext()){ 52 | // Tuple2 t = itor.next(); 53 | // sum += t.f1; 54 | // } 55 | // 56 | // System.out.println( 57 | // "\n the key is " + tuple.toString() 58 | // + "\n (key,value) sets is " + itor.toString() 59 | // + "\n the sum is " + sum 60 | // ); 61 | // 62 | // collector.collect(new Tuple2(tuple.toString(), sum)); 63 | // } 64 | // }); 65 | 66 | 67 | 68 | mins.print(); 69 | 70 | env.execute("the key aggregations demo"); 71 | 72 | Thread.sleep(10000); 73 | } 74 | } 75 | -------------------------------------------------------------------------------- /flink-userdefine-pojo/src/main/java/com/jdd/streaming/demos/TimeCharacteristicDemo.java: -------------------------------------------------------------------------------- 1 | package com.jdd.streaming.demos; 2 | 3 | import org.apache.flink.api.common.functions.ReduceFunction; 4 | import org.apache.flink.streaming.api.TimeCharacteristic; 5 | import 
--------------------------------------------------------------------------------
/flink-userdefine-pojo/src/main/java/com/jdd/streaming/demos/KeySelectorDemo.java:
--------------------------------------------------------------------------------
package com.jdd.streaming.demos;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.util.Iterator;
import java.util.concurrent.TimeUnit;

/**
 * @Author: dalan
 * @Date: 19-3-25 13:25
 * @Description: keyed windowed word count; an apply() variant with an explicit WindowFunction is kept below, commented out.
 */
public class KeySelectorDemo {
    // main
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> words = env.fromElements("a","b","c","aa","bb","cc","aaa","bbb","ccc");
        //env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        DataStream<Tuple2<String, Long>> counts = words
            .map(new MapFunction<String, Tuple2<String, Long>>() {
                @Override
                public Tuple2<String, Long> map(String data) throws Exception {
                    return new Tuple2<>(data, 1L);
                }
            })
            .keyBy(0)
            .timeWindow(Time.of(5, TimeUnit.SECONDS))
            .sum(1);
            // .apply(new WindowFunction<Tuple2<String, Long>, Tuple2<String, Long>, Tuple, TimeWindow>() {
            //     @Override
            //     public void apply(Tuple tuple, TimeWindow timeWindow, Iterable<Tuple2<String, Long>> iterable, Collector<Tuple2<String, Long>> collector) throws Exception {
            //         Iterator<Tuple2<String, Long>> itor = iterable.iterator();
            //         Long sum = 0L;
            //         while (itor.hasNext()) {
            //             Tuple2<String, Long> t = itor.next();
            //             sum += t.f1;
            //         }
            //
            //         System.out.println(
            //             "\n the key is " + tuple.toString()
            //             + "\n the sum is " + sum
            //         );
            //
            //         collector.collect(new Tuple2<>(tuple.toString(), sum));
            //     }
            // });

        counts.print();

        env.execute("the key aggregations demo");
    }
}
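
keyBy(0) keys by position, which is why the commented-out WindowFunction above only receives an untyped Tuple as its key. An explicit KeySelector gives a typed key instead; a minimal sketch, where words stands for the mapped DataStream<Tuple2<String, Long>> from the demo:

```
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.KeyedStream;

// the key is the word itself, so the key type is String rather than Tuple
KeyedStream<Tuple2<String, Long>, String> byWord = words
    .keyBy(new KeySelector<Tuple2<String, Long>, String>() {
        @Override
        public String getKey(Tuple2<String, Long> t) throws Exception {
            return t.f0;
        }
    });
```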
--------------------------------------------------------------------------------
/flink-userdefine-pojo/src/main/java/com/jdd/streaming/demos/TimeCharacteristicDemo.java:
--------------------------------------------------------------------------------
package com.jdd.streaming.demos;

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.Serializable;
import java.util.Random;

/**
 * @Author: dalan
 * @Date: 19-3-27 10:35
 * @Description: event-time windowing with a bounded out-of-orderness timestamp/watermark assigner.
 */
public class TimeCharacteristicDemo {
    /** logger */
    private static final Logger LOGGER = LoggerFactory.getLogger(TimeCharacteristicDemo.class);

    // main
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // the source: emits a random user every 100 ms with a slightly out-of-order timestamp
        DataStream<User> users = env.addSource(new SourceFunction<User>() {
            private boolean isRunnable = true;

            @Override public void run(SourceContext<User> ctx) throws Exception {
                Random rand = new Random(7569);
                while (isRunnable) {
                    User user = new User("User_" + rand.nextInt(10),
                        rand.nextInt(100),
                        System.currentTimeMillis() + rand.nextInt(100) - 100
                    );
                    ctx.collect(user);
                    Thread.sleep(100);
                }
            }

            @Override public void cancel() {
                isRunnable = false;
            }
        });
        users.print();

        DataStream<User> ages = users
            // assign timestamps and generate watermarks right after the source,
            // tolerating up to one minute of out-of-orderness
            .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<User>(Time.minutes(1L)) {
                @Override public long extractTimestamp(User user) {
                    return user.timestamp;
                }
            })
            .keyBy(user -> user.user) // group by the user field
            .timeWindow(Time.minutes(1L))
            .reduce((t1, t2) -> new User(t1.user, t1.age + t2.age, t2.timestamp)); // reduce the records of each window
            //.map((user) -> user);

        ages.print();

        env.execute("a event time window demo");
    }

    public static class User implements Serializable {
        public String user;
        public Integer age;
        public Long timestamp;

        public User() {}
        public User(String user, Integer age, Long timestamp) {
            this.user = user;
            this.age = age;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return "the user " + this.user
                + "\tage " + this.age
                + "\ttimestamp " + this.timestamp;
        }
    }
}
--------------------------------------------------------------------------------
/flink-userdefine-pojo/src/main/java/com/jdd/streaming/demos/UserDefinePoJo.java:
--------------------------------------------------------------------------------
package com.jdd.streaming.demos;

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * @Author: dalan
 * @Date: 19-3-18 11:07
 * @Description: filtering a stream of a user-defined POJO (public fields plus a no-arg constructor make Person a valid Flink POJO).
 */
public class UserDefinePoJo {
    /** logger */
    private static final Logger LOGGER = LoggerFactory.getLogger(UserDefinePoJo.class);

    // main
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Person> stones = env.fromElements(
            new Person("Fred", 35),
            new Person("Wilma", 35),
            new Person("Tom", 45),
            new Person("Pebbles", 2)
        );

        stones.filter(new FilterFunction<Person>() {
            public boolean filter(Person person) throws Exception {
                return person.age >= 18;
            }
        })
        .print();

        env.execute("a simple flink demo");
    }

    public static class Person {
        public String name;
        public Integer age;

        public Person() {}

        public Person(String name, Integer age) {
            this.age = age;
            this.name = name;
        }

        public String toString() {
            return this.name + " :age= " + this.age;
        }
    }
}
--------------------------------------------------------------------------------
/flink-userdefine-pojo/src/main/resources/log4j.properties:
--------------------------------------------------------------------------------
################################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################

log4j.rootLogger=INFO, console

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n
--------------------------------------------------------------------------------
/flink-wikipedia-demo/pom.xml:
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>flink-demos</artifactId>
        <groupId>com.jdd.streaming</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <artifactId>flink-wikipedia-demo</artifactId>
    <name>flink-wiki</name>
    <packaging>jar</packaging>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.11</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-wikiedits_2.11</artifactId>
            <version>1.6.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>com.jdd.streaming.demos.WikipediaCount</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
--------------------------------------------------------------------------------
/flink-wikipedia-demo/src/main/java/com/jdd/streaming/demos/WikipediaCount.java:
--------------------------------------------------------------------------------
package com.jdd.streaming.demos;

import org.apache.flink.api.common.functions.FoldFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditEvent;
import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * @Author: dalan
 * @Date: 19-3-20 14:17
 * @Description: per-user byte-diff count over the Wikipedia edit stream, written to Kafka.
 */
public class WikipediaCount {
    /** logger */
    private static final Logger LOGGER = LoggerFactory.getLogger(WikipediaCount.class);

    // main
    public static void main(String[] args) throws Exception {
        // create the environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // wiki source
        DataStream<WikipediaEditEvent> edits = env.addSource(new WikipediaEditsSource());
        // key by user, 5 s tumbling window, sum the byte diffs
        DataStream<Tuple2<String, Long>> counts = edits
            .map(new MapFunction<WikipediaEditEvent, Tuple2<String, Long>>() {
                @Override
                public Tuple2<String, Long> map(WikipediaEditEvent event) throws Exception {
                    return new Tuple2<>(event.getUser(), (long) event.getByteDiff());
                }
            })
            .keyBy(0)
            .timeWindow(Time.seconds(5))
            .sum(1);

        counts.map(new MapFunction<Tuple2<String, Long>, String>() {
                @Override
                public String map(Tuple2<String, Long> t) throws Exception {
                    return t.toString();
                }
            })
            .addSink(new FlinkKafkaProducer011<String>("localhost:9092", "wiki-result", new SimpleStringSchema()));
        // inspect the written messages with:
        // kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic wiki-result --from-beginning

        // alternative: a custom KeySelector (keying by user), a 5 s tumbling
        // window, and fold() for the count:
        //
        // .keyBy(new KeySelector<WikipediaEditEvent, String>() {
        //     @Override
        //     public String getKey(WikipediaEditEvent event) throws Exception {
        //         return event.getUser();
        //     }
        // })
        // .timeWindow(Time.seconds(5)) // a tumbling window with a 5 s period
        // .fold(new Tuple2<>("", 0L), new FoldFunction<WikipediaEditEvent, Tuple2<String, Long>>() {
        //     @Override
        //     public Tuple2<String, Long> fold(Tuple2<String, Long> t, WikipediaEditEvent o) throws Exception {
        //         t.f0 = o.getUser();
        //         t.f1 += o.getByteDiff();
        //         return t;
        //     }
        // });

        counts.print();

        env.execute("Wikipedia diff bytes count");
    }
}
--------------------------------------------------------------------------------
/flink-wikipedia-demo/src/main/resources/log4j.properties:
--------------------------------------------------------------------------------
################################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################

log4j.rootLogger=INFO, console

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n
--------------------------------------------------------------------------------
/flink-wikis/flinkwikis.iml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/window_watermark.md:
--------------------------------------------------------------------------------
1. Classification

TumblingWindow (tumbling window):
    1. Two consecutive evaluations never overlap; every element lands in exactly one window.

SlidingWindow (sliding window):
    1. An element can live in several windows at once, i.e. the windows overlap.

2. Modes

Time-based windows

EventTime:
    1. The time at which each individual event was produced on its originating device.
    2. The event's timestamp already exists when the record enters Flink; to use it, a timestamp extractor has to be supplied (via DataStream.assignTimestampsAndWatermarks(...)).
    3. With event time, the data itself drives the progress of time; the physical clock of the system has no influence.
    4. A program built on EventTime must specify how timestamps and watermarks are generated, so that event-processing progress becomes visible (see the sketch at the end of this section).

IngestionTime:
    1. Records the time at which an event enters Flink: each record receives the current time of the source operator, and every later time-based operation uses that timestamp.
    2. IngestionTime sits between EventTime and ProcessingTime. Compared with ProcessingTime it provides a stable timestamp at a somewhat higher cost: every window operation sees the same timestamp for a record, whereas under ProcessingTime each window operation reads a fresh clock. Compared with EventTime it cannot handle out-of-order or late records; apart from that the two behave similarly. Because the IngestionTime timestamp is generated automatically, no watermark needs to be specified.

ProcessingTime:
    1. The time at which an event is processed in Flink, taken from the physical clock of the machine doing the processing (so ProcessingTime can differ from machine to machine).
    2. A window operation covers all records arriving within the machine-clock period. (For example, if the application starts at 09:15 with a window size of 1 h, the first window holds the records between 09:15 and 10:00, the second covers [10:00, 11:00), and so on.)
    3. ProcessingTime is the simplest windowing clock: it needs no coordination between streams and machines, yields the best throughput, and keeps latency low.
    4. In distributed and heterogeneous environments, ProcessingTime depends on when events reach the system, so its results are not deterministic.

Count-based windows
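
How a job selects one of the three time modes, and why EventTime alone needs extra wiring: a minimal sketch (MyEvent and the 5-second bound are illustrative, not taken from the demos):

```
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

public class TimeModeSketch {
    /** an illustrative event that carries its own creation time */
    public static class MyEvent {
        public long timestamp = System.currentTimeMillis();
        public String payload = "";
    }

    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // pick exactly one of ProcessingTime / IngestionTime / EventTime
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        DataStream<MyEvent> events = env.fromElements(new MyEvent()); // stand-in source

        // only EventTime needs this step: extract the timestamp and generate
        // watermarks, here tolerating up to 5 seconds of out-of-orderness
        events.assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(5)) {
                    @Override
                    public long extractTimestamp(MyEvent e) {
                        return e.timestamp;
                    }
                })
                .print();

        env.execute("time mode sketch");
    }
}
```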
3. Application

Class structure

TimeCharacteristic:
    Only three time types are offered: ProcessingTime / IngestionTime / EventTime.

Window:
    1. A Window groups events into different buckets.
    2. maxTimestamp() marks the point in time by which every record with timestamp <= maxTimestamp will have reached the window.
    3. Every subclass of the abstract Window class must implement equals() and hashCode(), so that logically identical windows are treated identically.
    4. Each Window type must also provide a Serializer implementation used to serialize that window type.

TimeWindow:
    1. A time-based window covering the interval [start, end).
    2. Many TimeWindow instances exist over the lifetime of a job.
    maxTimestamp = end - 1
    For example, a window created at 10:05 with an interval of 5 min covers [10:05, 10:10); its end point is 10:09:59.999.
    equals: two TimeWindows are equal when their start and end match.
    hashCode: derived from start + end, folding the long into an int.
    intersects: tests whether the given window overlaps the current one.
    cover: builds a new window that spans both the given window and the current one.
    (See the sketch after the usage example below.)

GlobalWindow:
    1. The default window: all data is placed in a single window whose timestamp only has to stay below Long.MAX_VALUE.
    2. Only one GlobalWindow ever exists.
    * maxTimestamp = Long.MAX_VALUE
    * equals: any two GlobalWindows are equal (same type suffices).
    * hashCode: return 0;

Serializer:
    1. Serializes the Window type.
    2. Implemented by extending the abstract class TypeSerializerSingleton.

Interface TypeSerializer:
    1. Describes the serialization and copying methods the Flink runtime needs for a data type. The methods of this interface are assumed to be stateless, and are therefore effectively thread-safe. (Stateful implementations of these methods can cause unpredictable side effects and harm the stability and correctness of the program.)
    2. duplicate(): creates a deep copy of the serializer. If the serializer is stateless, it simply returns this; if it is stateful, it must create a true deep copy. Because a serializer may be used from several threads, a stateless serializer is thread-safe, while a stateful one risks not being so.
    3. snapshotConfiguration(): creates a snapshot of the serializer's current configuration, stored together with its associated managed state. The snapshot must cover the serializer's parameter settings and its serialization format; when a new serializer is registered to serialize the same managed state, the snapshot is used to check the new serializer's compatibility, and state migration may be required.
    4. ensureCompatibility(): resolves compatibility between different serializers: (a) if the snapshot configuration is a ParameterlessTypeSerializerConfig and identifies the same serializer, the serializers are treated as compatible; (b) otherwise state migration is required.

mergeWindows on TimeWindow:
    Merges the overlapping/intersecting parts of a set of TimeWindows to reduce the number of windows.
    All windows are first sorted by their start field, which makes merging straightforward:
    a. If the currently tracked window covers the window under iteration, the current window stays the key and the iterated window is added to its set.
    b. If it does not, a new entry <Window, Set<Window>> is started.

The following shows the usage in pseudo-code:
```
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// use processing time
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);

DataStream<Event> stream = env.addSource(new FlinkKafkaConsumer09<>(topic, schema, props));

stream
    .keyBy((event) -> event.getUser())
    .timeWindow(Time.hours(1)) // window size = 1 h, aligned to the natural hour
    .reduce((a, b) -> a.add(b))
    .addSink(...);
```
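
The TimeWindow semantics listed above (maxTimestamp = end - 1, intersects, cover) can be tried out directly, since TimeWindow exposes a public (start, end) constructor; a minimal sketch, assuming the 1.6-era org.apache.flink.streaming.api.windowing.windows.TimeWindow:

```
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class TimeWindowSemantics {
    public static void main(String[] args) {
        TimeWindow w1 = new TimeWindow(0, 10); // covers [0, 10)
        TimeWindow w2 = new TimeWindow(5, 15); // covers [5, 15)

        System.out.println(w1.maxTimestamp()); // 9, i.e. end - 1
        System.out.println(w1.intersects(w2)); // true: the intervals overlap
        TimeWindow merged = w1.cover(w2);      // a new window spanning both
        System.out.println(merged.getStart() + " .. " + merged.getEnd()); // 0 .. 15
    }
}
```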
4. Watermark

Flink's mechanism for measuring processing progress in event time is the Watermark. Watermarks travel as part of the DataStream and carry a timestamp t: a Watermark(t) declares that the data for windows ending at or before t is complete, in other words that no more elements with a timestamp t' <= t should appear in that stream.
--------------------------------------------------------------------------------
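
Concretely, a periodic watermark generator implements the contract above: it tracks the largest timestamp seen so far and emits it minus a lateness bound. A minimal sketch (MyEvent and the 3.5-second bound are illustrative):

```
import javax.annotation.Nullable;
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

/** an illustrative event type: just a long event-time timestamp */
class MyEvent {
    public long timestamp;
}

// Emits, as the watermark, the highest timestamp seen so far minus the
// lateness bound: a promise that no element with timestamp t' <= t is
// still expected.
public class BoundedLatenessWatermarks implements AssignerWithPeriodicWatermarks<MyEvent> {
    private static final long MAX_OUT_OF_ORDERNESS = 3500; // 3.5 s, illustrative
    private long currentMaxTimestamp = Long.MIN_VALUE + MAX_OUT_OF_ORDERNESS;

    @Override
    public long extractTimestamp(MyEvent element, long previousElementTimestamp) {
        currentMaxTimestamp = Math.max(element.timestamp, currentMaxTimestamp);
        return element.timestamp;
    }

    @Nullable
    @Override
    public Watermark getCurrentWatermark() {
        // called periodically by Flink; the interval is configured with
        // env.getConfig().setAutoWatermarkInterval(...)
        return new Watermark(currentMaxTimestamp - MAX_OUT_OF_ORDERNESS);
    }
}
```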