() {
            @Override
            public WordWithCount reduce(WordWithCount a, WordWithCount b) {
                return new WordWithCount(a.getWord(), a.getCount() + b.getCount());
            }
        });

        // print the results with a single thread, rather than in parallel
        windowCounts.print().setParallelism(1);
        env.execute("Socket Window WordCount");
    }
}
```

Open port 9000 and send data to it. The program receives and parses the incoming data, then collects it with a `timeWindow`: the window size is 5s and the sliding interval is 1s, which produces the output shown below:



Every second, the program prints the data collected over the previous 5s, confirming that the time window statistics run correctly.

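With a 5s window sliding every 1s, each element belongs to size/slide = 5 overlapping windows. The sketch below illustrates that assignment arithmetic in plain Java; it is a simplified model of how Flink's sliding window assigner computes window starts (hypothetical helper names, not the Flink API):

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindowAssign {

    /**
     * Returns the [start, end) pairs of every sliding window that an
     * element with the given timestamp (ms) falls into.
     */
    static List<long[]> assignWindows(long timestamp, long size, long slide) {
        List<long[]> windows = new ArrayList<>();
        // start of the most recent window containing the timestamp
        long lastStart = timestamp - (timestamp % slide);
        // walk backwards one slide at a time while the window still covers the element
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[]{start, start + size});
        }
        return windows;
    }

    public static void main(String[] args) {
        // size = 5000 ms, slide = 1000 ms: the element at t=3500 is in 5 windows
        for (long[] w : assignWindows(3500, 5000, 1000)) {
            System.out.println("[" + w[0] + ", " + w[1] + ")");
        }
    }
}
```

Running it shows the element at t=3500 assigned to the five windows starting at 3000, 2000, 1000, 0 and -1000, which is why every element above is counted in five consecutive 1s outputs.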
# Summary

`Time` and `Window` are highlights of `Flink`, so these two articles on them are well worth the effort: understanding their concepts and principles lets you use them more effectively.

`Flink` is a stream processor that provides very powerful operators to compute and aggregate data. It also provides a window mechanism that divides a stream into intervals; computing over the elements of each interval, that is, each window, achieves the effect of batch processing.

This article covered what a `Window` is; its classification into sliding (`Sliding`) and tumbling (`Tumbling`) windows, driven by time (`Time`) or by count (`Count`); the three core components; and the `Window` mechanism. For the remaining source-code analysis, I recommend `wuchong`'s excellent write-up: [Flink 原理与实现:Window 机制](http://wuchong.me/blog/2016/05/25/flink-internals-window-mechanism/)
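To make the time-driven versus count-driven distinction concrete, here is a minimal plain-Java illustration (a hypothetical helper, not Flink code) of a count-driven tumbling window with a sum reducer, analogous to `countWindow(3)` followed by `sum`:

```java
import java.util.ArrayList;
import java.util.List;

public class CountWindowSketch {

    /** Fires every `size` elements and emits the sum of that window. */
    static List<Integer> countWindowSums(List<Integer> stream, int size) {
        List<Integer> results = new ArrayList<>();
        int sum = 0, count = 0;
        for (int value : stream) {
            sum += value;            // accumulate, like a ReduceFunction
            if (++count == size) {   // the trigger fires once the count is reached
                results.add(sum);    // emit the window result
                sum = 0;
                count = 0;           // clear (evict) the window contents
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // elements 1-3 and 4-6 form complete windows; 7 stays buffered
        System.out.println(countWindowSums(List.of(1, 2, 3, 4, 5, 6, 7), 3)); // prints [6, 15]
    }
}
```

A time-driven window would fire on a timer instead of on the element count, but the accumulate/trigger/evict structure is the same idea the three core components formalize.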

If you have other learning suggestions, or find mistakes in this article, please contact me (open an `Issue` on `Github`, or reach me on Juejin).

---
# Project Repository

[https://github.com/Vip-Augus/flink-learning-note](https://github.com/Vip-Augus/flink-learning-note)

```sh
git clone https://github.com/Vip-Augus/flink-learning-note
```

---
# References

1. [Flink 原理与实现:Window 机制](http://wuchong.me/blog/2016/05/25/flink-internals-window-mechanism/)
2. [Flink 从 0 到 1 学习 —— 介绍 Flink 中的 Stream Windows](http://www.54tianzhisheng.cn/2018/12/08/Flink-Stream-Windows/)
3. [Windows (Apache Flink 1.9 documentation)](https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/operators/windows.html#windows)
4. [Introducing Stream Windows in Apache Flink](https://flink.apache.org/news/2015/12/04/Introducing-windows.html)
--------------------------------------------------------------------------------
/note/pics/Flink_study_diagram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/Flink_study_diagram.png
--------------------------------------------------------------------------------
/note/pics/datasink/flink_demo_flow_chart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/datasink/flink_demo_flow_chart.png
--------------------------------------------------------------------------------
/note/pics/datasink/sinkfuncaiton_diagram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/datasink/sinkfuncaiton_diagram.png
--------------------------------------------------------------------------------
/note/pics/datasink/sinkfunction_customsink.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/datasink/sinkfunction_customsink.png
--------------------------------------------------------------------------------
/note/pics/datasink/sinkfunction_data_flow.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/datasink/sinkfunction_data_flow.png
--------------------------------------------------------------------------------
/note/pics/datasink/sinkfunction_datasource.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/datasink/sinkfunction_datasource.png
--------------------------------------------------------------------------------
/note/pics/datasink/sinkfunction_printsinkfunction_diagram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/datasink/sinkfunction_printsinkfunction_diagram.png
--------------------------------------------------------------------------------
/note/pics/datasink/sinkfunction_source&job.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/datasink/sinkfunction_source&job.png
--------------------------------------------------------------------------------
/note/pics/datasink/sinkfunction_verify.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/datasink/sinkfunction_verify.png
--------------------------------------------------------------------------------
/note/pics/datasource/StreamExecutionEnvironment_DataSource.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/datasource/StreamExecutionEnvironment_DataSource.png
--------------------------------------------------------------------------------
/note/pics/datasource/flink_data_source_connector.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/datasource/flink_data_source_connector.png
--------------------------------------------------------------------------------
/note/pics/datasource/flink_datasource_kafka.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/datasource/flink_datasource_kafka.png
--------------------------------------------------------------------------------
/note/pics/datasource/flink_datasource_kafka_result.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/datasource/flink_datasource_kafka_result.png
--------------------------------------------------------------------------------
/note/pics/datasource/flink_datasource_mind.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/datasource/flink_datasource_mind.png
--------------------------------------------------------------------------------
/note/pics/datasource/flink_rich_source_function.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/datasource/flink_rich_source_function.png
--------------------------------------------------------------------------------
/note/pics/helloworld/flink_dashboard.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/helloworld/flink_dashboard.png
--------------------------------------------------------------------------------
/note/pics/helloworld/flink_debug_method.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/helloworld/flink_debug_method.png
--------------------------------------------------------------------------------
/note/pics/helloworld/flink_demo_overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/helloworld/flink_demo_overview.png
--------------------------------------------------------------------------------
/note/pics/helloworld/flink_hello_world_time_winodw.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/helloworld/flink_hello_world_time_winodw.png
--------------------------------------------------------------------------------
/note/pics/helloworld/flink_helloworld_process.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/helloworld/flink_helloworld_process.png
--------------------------------------------------------------------------------
/note/pics/helloworld/flink_run_demo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/helloworld/flink_run_demo.png
--------------------------------------------------------------------------------
/note/pics/introduction/alibaba_flink_preview.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/introduction/alibaba_flink_preview.jpeg
--------------------------------------------------------------------------------
/note/pics/introduction/api-stack.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/introduction/api-stack.png
--------------------------------------------------------------------------------
/note/pics/introduction/bounded-unbounded.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/introduction/bounded-unbounded.png
--------------------------------------------------------------------------------
/note/pics/introduction/flink_application_semantic.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/introduction/flink_application_semantic.png
--------------------------------------------------------------------------------
/note/pics/introduction/flink_architecture.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/introduction/flink_architecture.jpeg
--------------------------------------------------------------------------------
/note/pics/introduction/flink_distract_architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/introduction/flink_distract_architecture.png
--------------------------------------------------------------------------------
/note/pics/introduction/flink_jobManager_worker.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/introduction/flink_jobManager_worker.jpg
--------------------------------------------------------------------------------
/note/pics/introduction/flink_state_checkpoint_introducation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/introduction/flink_state_checkpoint_introducation.png
--------------------------------------------------------------------------------
/note/pics/introduction/flink_storm_throughput_contrast.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/introduction/flink_storm_throughput_contrast.png
--------------------------------------------------------------------------------
/note/pics/introduction/flink_strom_delayed_contrast.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/introduction/flink_strom_delayed_contrast.png
--------------------------------------------------------------------------------
/note/pics/introduction/levels_of_abstraction.svg:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/note/pics/introduction/system_architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/introduction/system_architecture.png
--------------------------------------------------------------------------------
/note/pics/time/flink_time_introduction.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/time/flink_time_introduction.png
--------------------------------------------------------------------------------
/note/pics/time/stream_watermark_in_order.svg:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/note/pics/time/stream_watermark_out_of_order.svg:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/note/pics/transformation/flink_transformation_demo_methods.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/transformation/flink_transformation_demo_methods.png
--------------------------------------------------------------------------------
/note/pics/transformation/flink_transformation_min.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/transformation/flink_transformation_min.png
--------------------------------------------------------------------------------
/note/pics/transformation/flink_transformation_minBy&min_diff.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/transformation/flink_transformation_minBy&min_diff.png
--------------------------------------------------------------------------------
/note/pics/transformation/flink_transformation_minBy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/transformation/flink_transformation_minBy.png
--------------------------------------------------------------------------------
/note/pics/transformation/flink_transformation_official_desc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/transformation/flink_transformation_official_desc.png
--------------------------------------------------------------------------------
/note/pics/window/evictor_impls.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/window/evictor_impls.png
--------------------------------------------------------------------------------
/note/pics/window/flink_window_mechanism.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/window/flink_window_mechanism.png
--------------------------------------------------------------------------------
/note/pics/window/flink_window_method.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/window/flink_window_method.png
--------------------------------------------------------------------------------
/note/pics/window/flink_window_sensor_intro.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/window/flink_window_sensor_intro.png
--------------------------------------------------------------------------------
/note/pics/window/flink_window_xmind.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/window/flink_window_xmind.png
--------------------------------------------------------------------------------
/note/pics/window/jark_window_intro.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/window/jark_window_intro.png
--------------------------------------------------------------------------------
/note/pics/window/trigger_impls.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/window/trigger_impls.png
--------------------------------------------------------------------------------
/note/pics/window/window_assigner_impls.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vip-Augus/flink-learning-note/d787dca3d432a4f1f654a86d7bc5671c5e08e4fb/note/pics/window/window_assigner_impls.png
--------------------------------------------------------------------------------
/pom.xml:
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>cn.sevenyuan</groupId>
	<artifactId>flink-quick-start</artifactId>
	<version>1.0-SNAPSHOT</version>
	<packaging>jar</packaging>

	<name>Flink Quickstart Job</name>
	<url>http://www.myorganization.org</url>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<flink.version>1.9.0</flink.version>
		<java.version>1.8</java.version>
		<scala.binary.version>2.11</scala.binary.version>
		<maven.compiler.source>${java.version}</maven.compiler.source>
		<maven.compiler.target>${java.version}</maven.compiler.target>
	</properties>

	<repositories>
		<repository>
			<id>apache.snapshots</id>
			<name>Apache Development Snapshot Repository</name>
			<url>https://repository.apache.org/content/repositories/snapshots/</url>
			<releases>
				<enabled>false</enabled>
			</releases>
			<snapshots>
				<enabled>true</enabled>
			</snapshots>
		</repository>
	</repositories>

	<dependencies>
		<!-- Apache Flink dependencies -->
		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-java</artifactId>
			<version>${flink.version}</version>
			<scope>provided</scope>
		</dependency>
		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
			<version>${flink.version}</version>
			<scope>provided</scope>
		</dependency>

		<dependency>
			<groupId>org.projectlombok</groupId>
			<artifactId>lombok</artifactId>
			<version>1.16.4</version>
		</dependency>

		<!-- Logging dependencies -->
		<dependency>
			<groupId>org.slf4j</groupId>
			<artifactId>slf4j-log4j12</artifactId>
			<version>1.7.7</version>
			<scope>runtime</scope>
		</dependency>
		<dependency>
			<groupId>log4j</groupId>
			<artifactId>log4j</artifactId>
			<version>1.2.17</version>
			<scope>runtime</scope>
		</dependency>

		<dependency>
			<groupId>com.google.guava</groupId>
			<artifactId>guava</artifactId>
			<version>28.1-jre</version>
		</dependency>

		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-connector-kafka_2.11</artifactId>
			<version>1.9.0</version>
		</dependency>

		<dependency>
			<groupId>com.alibaba</groupId>
			<artifactId>fastjson</artifactId>
			<version>1.2.61</version>
		</dependency>

		<dependency>
			<groupId>redis.clients</groupId>
			<artifactId>jedis</artifactId>
			<version>3.1.0</version>
		</dependency>

		<dependency>
			<groupId>mysql</groupId>
			<artifactId>mysql-connector-java</artifactId>
			<version>8.0.16</version>
		</dependency>

		<dependency>
			<groupId>com.alibaba</groupId>
			<artifactId>druid</artifactId>
			<version>1.1.20</version>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<version>3.1</version>
				<configuration>
					<source>${java.version}</source>
					<target>${java.version}</target>
				</configuration>
			</plugin>

			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-shade-plugin</artifactId>
				<version>3.0.0</version>
				<executions>
					<execution>
						<phase>package</phase>
						<goals>
							<goal>shade</goal>
						</goals>
						<configuration>
							<artifactSet>
								<excludes>
									<exclude>org.apache.flink:force-shading</exclude>
									<exclude>com.google.code.findbugs:jsr305</exclude>
									<exclude>org.slf4j:*</exclude>
									<exclude>log4j:*</exclude>
								</excludes>
							</artifactSet>
							<filters>
								<filter>
									<artifact>*:*</artifact>
									<excludes>
										<exclude>META-INF/*.SF</exclude>
										<exclude>META-INF/*.DSA</exclude>
										<exclude>META-INF/*.RSA</exclude>
									</excludes>
								</filter>
							</filters>
							<transformers>
								<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
									<mainClass>cn.sevenyuan.watermark.UnOrderEventTimeWindowDemo</mainClass>
								</transformer>
							</transformers>
						</configuration>
					</execution>
				</executions>
			</plugin>
		</plugins>
	</build>

	<profiles>
		<profile>
			<id>add-dependencies-for-IDEA</id>
			<activation>
				<property>
					<name>idea.version</name>
				</property>
			</activation>
			<dependencies>
				<dependency>
					<groupId>org.apache.flink</groupId>
					<artifactId>flink-java</artifactId>
					<version>${flink.version}</version>
					<scope>compile</scope>
				</dependency>
				<dependency>
					<groupId>org.apache.flink</groupId>
					<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
					<version>${flink.version}</version>
					<scope>compile</scope>
				</dependency>
			</dependencies>
		</profile>
	</profiles>
</project>
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/BatchJob.java:
--------------------------------------------------------------------------------
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package cn.sevenyuan;

import org.apache.flink.api.java.ExecutionEnvironment;

/**
 * Skeleton for a Flink Batch Job.
 *
 * <p>For a tutorial how to write a Flink batch application, check the
 * tutorials and examples on the Flink Website.
 *
 * <p>To package your application into a JAR file for execution,
 * change the main class in the POM.xml file to this class (simply search for 'mainClass')
 * and run 'mvn clean package' on the command line.
 */
public class BatchJob {

	public static void main(String[] args) throws Exception {
		// set up the batch execution environment
		final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

		/*
		 * Here, you can start creating your execution plan for Flink.
		 *
		 * Start with getting some data from the environment, like
		 * 	env.readTextFile(textPath);
		 *
		 * then, transform the resulting DataSet<String> using operations
		 * like
		 * 	.filter()
		 * 	.flatMap()
		 * 	.join()
		 * 	.coGroup()
		 *
		 * and many more.
		 * Have a look at the programming guide for the Java API:
		 *
		 * http://flink.apache.org/docs/latest/apis/batch/index.html
		 *
		 * and the examples
		 *
		 * http://flink.apache.org/docs/latest/apis/batch/examples.html
		 *
		 */

		// execute program
		env.execute("Flink Batch Java API Skeleton");
	}
}
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/StreamingJob.java:
--------------------------------------------------------------------------------
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package cn.sevenyuan;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * Skeleton for a Flink Streaming Job.
 *
 * <p>For a tutorial how to write a Flink streaming application, check the
 * tutorials and examples on the Flink Website.
 *
 * <p>To package your application into a JAR file for execution, run
 * 'mvn clean package' on the command line.
 *
 * <p>If you change the name of the main class (with the public static void main(String[] args))
 * method, change the respective entry in the POM.xml file (simply search for 'mainClass').
 */
public class StreamingJob {

	public static void main(String[] args) throws Exception {
		// set up the streaming execution environment
		final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		/*
		 * Here, you can start creating your execution plan for Flink.
		 *
		 * Start with getting some data from the environment, like
		 * 	env.readTextFile(textPath);
		 *
		 * then, transform the resulting DataStream<String> using operations
		 * like
		 * 	.filter()
		 * 	.flatMap()
		 * 	.join()
		 * 	.coGroup()
		 *
		 * and many more.
		 * Have a look at the programming guide for the Java API:
		 *
		 * http://flink.apache.org/docs/latest/apis/streaming/index.html
		 *
		 */

		// execute program
		env.execute("Flink Streaming Java API Skeleton");
	}
}
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/datasource/DataSourceFromCollection.java:
--------------------------------------------------------------------------------
package cn.sevenyuan.datasource;

import cn.sevenyuan.domain.Student;
import com.google.common.collect.Lists;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

import java.util.List;

/**
 * Data sources backed by collections
 * @author JingQ at 2019-09-22
 */
public class DataSourceFromCollection {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

//        DataStreamSource<Student> source1 = collection1(env);
//        SingleOutputStreamOperator<Student> operator1 = source1.map(new MapFunction<Student, Student>() {
//            @Override
//            public Student map(Student student) throws Exception {
//                Student result = new Student();
//                result.setId(student.getId());
//                result.setName(student.getName());
//                result.setAge(student.getAge());
//                result.setAddress("加密地址");
//                return result;
//            }
//        });
//        operator1.print();


        DataStreamSource<Long> source2 = collection2(env);
        SingleOutputStreamOperator<Student> operator2 = source2.flatMap(new FlatMapFunction<Long, Student>() {
            @Override
            public void flatMap(Long aLong, Collector<Student> collector) throws Exception {
                if (aLong % 2 == 0) {
                    collector.collect(new Student(aLong.intValue(), "name" + aLong, aLong.intValue(), "加密地址"));
                }
            }
        });
        operator2.print();


        env.execute("test collection source");
    }

    private static DataStreamSource<Student> collection1(StreamExecutionEnvironment env) {
        List<Student> studentList = Lists.newArrayList(
                new Student(1, "name1", 23, "address1"),
                new Student(2, "name2", 23, "address2"),
                new Student(3, "name3", 23, "address3")
        );
        return env.fromCollection(studentList);
    }

    /**
     * Generate a number sequence
     * @param env execution environment
     * @return sequence source stream
     */
    private static DataStreamSource<Long> collection2(StreamExecutionEnvironment env) {
        return env.generateSequence(1, 20);
    }
}
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/datasource/DataSourceFromFile.java:
--------------------------------------------------------------------------------
package cn.sevenyuan.datasource;

import cn.sevenyuan.domain.Student;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;


import org.apache.flink.core.fs.Path;
import org.apache.flink.util.Collector;

import java.net.URL;

/**
 * File data source
 * @author JingQ at 2019-09-22
 */
public class DataSourceFromFile {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // resolve the file path
        URL fileUrl = DataSourceFromFile.class.getClassLoader().getResource("datasource/student.txt");
        String filePath = fileUrl.getPath();

        // simple text-file source
//        DataStreamSource<String> textFileSource =
//                env.readTextFile(filePath);
//        SingleOutputStreamOperator<Student> textFileOperator = textFileSource.map(new MapFunction<String, Student>() {
//            @Override
//            public Student map(String s) throws Exception {
//                String[] tokens = s.split("\\W+");
//                return new Student(Integer.valueOf(tokens[0]), tokens[1], Integer.valueOf(tokens[2]), "加密地址");
//            }
//        });
//        textFileOperator.print();


        // specify the input format and the processing (watch) mode
        Path pa = new Path(filePath);
        TextInputFormat inputFormat = new TextInputFormat(pa);
        DataStreamSource<String> complexFileSource =
                env.readFile(inputFormat, filePath, FileProcessingMode.PROCESS_CONTINUOUSLY, 100L,
                        TypeExtractor.getInputFormatTypes(inputFormat));
        SingleOutputStreamOperator<Student> complexFileOperator = complexFileSource.flatMap(new FlatMapFunction<String, Student>() {
            @Override
            public void flatMap(String value, Collector<Student> out) throws Exception {
                String[] tokens = value.split("\\W+");
                if (tokens.length > 1) {
                    out.collect(new Student(Integer.valueOf(tokens[0]), tokens[1], Integer.valueOf(tokens[2]), "加密地址"));
                }
            }
        });
        complexFileOperator.print();


        env.execute("test file source");
    }


}
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/datasource/DataSourceFromSocket.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.datasource;
2 |
3 | import org.apache.flink.api.common.functions.FlatMapFunction;
4 | import org.apache.flink.api.java.tuple.Tuple2;
5 | import org.apache.flink.streaming.api.datastream.DataStreamSource;
6 | import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
7 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
8 | import org.apache.flink.util.Collector;
9 |
10 | /**
11 |  * Socket data source
12 |  * @author JingQ at 2019-09-22
13 |  */
14 | public class DataSourceFromSocket {
15 |
16 | public static void main(String[] args) throws Exception {
17 |
18 | StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
19 |
20 | DataStreamSource<String> source = env
21 | .socketTextStream("localhost", 9000);
22 |
23 | SingleOutputStreamOperator<Tuple2<String, Integer>> operator = source
24 | .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
25 | @Override
26 | public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
27 | String[] tokens = value.split("\\W+");
28 | for (String token : tokens) {
29 | out.collect(Tuple2.of(token, 1));
30 | }
31 | }
32 | })
33 | .keyBy(0)
34 | .sum(1);
35 | operator.print();
36 |
37 | env.execute("test socket datasource");
38 | }
39 | }
40 |
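The core of the job above — tokenize each line on non-word characters, then count per word — can be sketched without Flink in plain Java. The class name `WordCountSketch` is ours; unlike the Flink job, empty tokens produced by leading separators are skipped here:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the flatMap -> keyBy(0) -> sum(1) pipeline above.
public class WordCountSketch {

    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            // same tokenizer as the Flink job: split on runs of non-word characters
            for (String token : line.split("\\W+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(Arrays.asList("hello world", "hello flink"));
        System.out.println(counts.get("hello")); // 2
    }
}
```

The difference from the streaming job is that `keyBy(0).sum(1)` emits a running count after every element, while this batch sketch emits only the final totals.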
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/datasource/custom/DataSourceFromKafka.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.datasource.custom;
2 |
3 | import cn.sevenyuan.domain.Student;
4 | import cn.sevenyuan.sink.SinkToMySQL;
5 | import cn.sevenyuan.util.KafkaUtils;
6 | import com.alibaba.fastjson.JSONObject;
7 | import com.google.common.collect.Lists;
8 | import org.apache.flink.api.common.serialization.SimpleStringSchema;
9 | import org.apache.flink.streaming.api.datastream.DataStream;
10 | import org.apache.flink.streaming.api.datastream.DataStreamSource;
11 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
12 | import org.apache.flink.streaming.api.functions.windowing.AllWindowFunction;
13 | import org.apache.flink.streaming.api.windowing.time.Time;
14 | import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
15 | import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
16 | import org.apache.flink.util.Collector;
17 |
18 | import java.util.List;
19 | import java.util.Properties;
20 |
21 | /**
22 |  * Kafka data source from the official connector library
23 |  * @author JingQ at 2019-09-22
24 |  */
25 | public class DataSourceFromKafka {
26 |
27 | public static void main(String[] args) throws Exception {
28 | StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
29 |
30 | Properties props = new Properties();
31 | props.put("bootstrap.servers", KafkaUtils.BROKER_LIST);
32 | props.put("zookeeper.connect", "localhost:2181");
33 | props.put("group.id", KafkaUtils.TOPIC_STUDENT);
34 | props.put("key.deserializer", KafkaUtils.KEY_SERIALIZER);
35 | props.put("value.deserializer", KafkaUtils.VALUE_SERIALIZER);
36 | props.put("auto.offset.reset", "latest");
37 |
38 | DataStreamSource<String> dataStreamSource = env.addSource(new FlinkKafkaConsumer<>(
39 | KafkaUtils.TOPIC_STUDENT,
40 | new SimpleStringSchema(),
41 | props
42 | )).setParallelism(1);
43 |
44 |
45 | // sink the data to MySQL
46 | addMySQLSink(dataStreamSource);
47 | env.execute("test custom kafka datasource");
48 | }
49 |
50 | private static void addMySQLSink(DataStreamSource<String> dataStreamSource) {
51 | // read from kafka, then map each JSON string into a Student
52 | DataStream<Student> dataStream = dataStreamSource.map(value -> JSONObject.parseObject(value, Student.class));
53 | // no keyBy needed, so use windowAll: collect the data every 10s and batch-insert it into the database
54 | dataStream
55 | .timeWindowAll(Time.seconds(10))
56 | .apply(new AllWindowFunction<Student, List<Student>, TimeWindow>() {
57 | @Override
58 | public void apply(TimeWindow window, Iterable<Student> values, Collector<List<Student>> out) throws Exception {
59 | List<Student> students = Lists.newArrayList(values);
60 | if (students.size() > 0) {
61 | System.out.println("最近 10 秒汇集到 " + students.size() + " 条数据");
62 | out.collect(students);
63 | }
64 | }
65 | })
66 | .addSink(new SinkToMySQL());
67 | // sample output:
68 | // 最近 10 秒汇集到 3 条数据
69 | // success insert number : 3
70 | }
71 | }
72 |
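`timeWindowAll(Time.seconds(10))` above batches elements into fixed 10-second buckets. For non-negative timestamps and a zero offset, the bucket an element falls into can be sketched as follows (the class name `TumblingWindowSketch` is ours; Flink's own `TimeWindow.getWindowStartWithOffset` also handles offsets and negative timestamps):

```java
public class TumblingWindowSketch {

    // Start of the tumbling window that contains the given epoch-millis
    // timestamp, assuming a zero offset and a non-negative timestamp.
    public static long windowStart(long timestampMillis, long windowSizeMillis) {
        return timestampMillis - (timestampMillis % windowSizeMillis);
    }

    public static void main(String[] args) {
        long size = 10_000L; // 10 seconds, as in timeWindowAll(Time.seconds(10))
        System.out.println(windowStart(12_345L, size)); // 12345 ms falls in [10000, 20000)
        System.out.println(windowStart(9_999L, size));  // 9999 ms falls in [0, 10000)
    }
}
```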
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/datasource/custom/DataSourceFromRedis.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.datasource.custom;
2 |
3 | import org.apache.flink.api.common.functions.MapFunction;
4 | import org.apache.flink.streaming.api.datastream.DataStreamSource;
5 | import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
6 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
7 |
8 | /**
9 |  * Custom DataSource that reads data from redis
10 |  * @author JingQ at 2019-09-22
11 |  */
12 | public class DataSourceFromRedis {
13 |
14 | public static void main(String[] args) throws Exception {
15 | StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
16 |
17 | DataStreamSource<String> customSource = env.addSource(new MyRedisDataSourceFunction());
18 | SingleOutputStreamOperator<String> operator = customSource
19 | .map((MapFunction<String, String>) value -> "当前最大值为 : " + value);
20 | operator.print();
21 | env.execute("test custom redis datasource function");
22 | }
23 | }
24 |
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/datasource/custom/MyRedisDataSourceFunction.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.datasource.custom;
2 |
3 | import cn.sevenyuan.util.RedisUtils;
4 | import org.apache.commons.lang3.StringUtils;
5 | import org.apache.flink.configuration.Configuration;
6 | import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
7 |
8 | /**
9 | * @author JingQ at 2019-09-22
10 | */
11 | public class MyRedisDataSourceFunction extends RichSourceFunction<String> {
12 |
13 | /** volatile flag flipped by cancel() so the polling loop can stop */
14 | private volatile boolean isRunning = true;
15 |
16 | @Override
17 | public void open(Configuration parameters) throws Exception {
18 | super.open(parameters);
19 | // noop
20 | }
21 |
22 | @Override
23 | public void run(SourceContext<String> ctx) throws Exception {
24 | while (isRunning) {
25 | String maxNumber = RedisUtils.get("maxNumber", String.class);
26 | ctx.collect(StringUtils.isBlank(maxNumber) ? "0" : maxNumber);
27 | // poll redis once per second
28 | Thread.sleep(1000);
29 | }
30 | }
31 |
32 | @Override
33 | public void cancel() {
34 | isRunning = false;
35 | }
36 |
37 | @Override
38 | public void close() throws Exception {
39 | super.close();
40 | RedisUtils.close();
41 | }
42 | }
43 |
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/domain/Student.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.domain;
2 |
3 | import lombok.Data;
4 |
5 | import java.util.Date;
6 |
7 | /**
8 | * @author JingQ at 2019-09-22
9 | */
10 | @Data
11 | public class Student {
12 |
13 | private int id;
14 |
15 | private String name;
16 |
17 | private int age;
18 |
19 | private String address;
20 |
21 | private Date checkInTime;
22 |
23 | private long successTimeStamp;
24 |
25 | public Student() {
26 | }
27 |
28 | public Student(int id, String name, int age, String address) {
29 | this.id = id;
30 | this.name = name;
31 | this.age = age;
32 | this.address = address;
33 | }
34 |
35 | public static Student of(int id, String name, int age, String address, long timeStamp) {
36 | Student student = new Student(id, name, age, address);
37 | student.setSuccessTimeStamp(timeStamp);
38 | return student;
39 | }
40 | }
41 |
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/domain/WordWithCount.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.domain;
2 |
3 | import lombok.Data;
4 |
5 | /**
6 | * @author JingQ at 2019-09-18
7 | */
8 | @Data
9 | public class WordWithCount {
10 |
11 | private String word;
12 |
13 | private long count;
14 |
15 | public WordWithCount() {}
16 |
17 | public WordWithCount(String word, long count) {
18 | this.word = word;
19 | this.count = count;
20 | }
21 |
22 | @Override
23 | public String toString() {
24 | return word + " : " + count;
25 | }
26 |
27 | }
28 |
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/hotest/itemcount/CountAgg.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.hotest.itemcount;
2 |
3 | import org.apache.flink.api.common.functions.AggregateFunction;
4 |
5 | /**
6 |  * COUNT aggregate function: add one for every incoming record
7 |  * @author JingQ at 2019-09-28
8 |  */
9 | public class CountAgg implements AggregateFunction<UserBehavior, Long, Long> {
10 |
11 | @Override
12 | public Long createAccumulator() {
13 | return 0L;
14 | }
15 |
16 | @Override
17 | public Long add(UserBehavior value, Long accumulator) {
18 | return accumulator + 1;
19 | }
20 |
21 | @Override
22 | public Long getResult(Long accumulator) {
23 | return accumulator;
24 | }
25 |
26 | @Override
27 | public Long merge(Long a, Long b) {
28 | // merge partial results; pre-aggregating here reduces the state size
29 | return a + b;
30 | }
31 | }
32 |
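The window operator drives `CountAgg` through its four callbacks: one accumulator is created per window, `add` is called for every element, `getResult` when the window fires, and `merge` when window panes are combined. That lifecycle can be sketched in plain Java (the class name `CountAggSketch` and the driver method `countWindow` are ours, not Flink API):

```java
import java.util.Arrays;
import java.util.List;

public class CountAggSketch {

    // Same contract as CountAgg above: Long accumulator, +1 per element.
    public static long createAccumulator() { return 0L; }
    public static long add(Object value, long acc) { return acc + 1; }
    public static long getResult(long acc) { return acc; }
    public static long merge(long a, long b) { return a + b; }

    // How the window operator uses the callbacks when a window fires.
    public static long countWindow(List<?> windowElements) {
        long acc = createAccumulator();
        for (Object element : windowElements) {
            acc = add(element, acc);
        }
        return getResult(acc);
    }

    public static void main(String[] args) {
        System.out.println(countWindow(Arrays.asList("a", "b", "c"))); // 3
        System.out.println(merge(2L, 3L)); // 5
    }
}
```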
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/hotest/itemcount/HotItems.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.hotest.itemcount;
2 |
3 | import cn.sevenyuan.datasource.DataSourceFromFile;
4 | import org.apache.flink.api.common.functions.FilterFunction;
5 | import org.apache.flink.api.java.io.PojoCsvInputFormat;
6 | import org.apache.flink.api.java.typeutils.PojoTypeInfo;
7 | import org.apache.flink.api.java.typeutils.TypeExtractor;
8 | import org.apache.flink.streaming.api.TimeCharacteristic;
9 | import org.apache.flink.streaming.api.datastream.DataStream;
10 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
11 |
12 | import java.io.File;
13 | import java.net.URL;
14 | import org.apache.flink.core.fs.Path;
15 | import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
16 | import org.apache.flink.streaming.api.windowing.time.Time;
17 |
18 | /**
19 | * reference http://wuchong.me/blog/2018/11/07/use-flink-calculate-hot-items/
20 | * @author JingQ at 2019-09-28
21 | */
22 | public class HotItems {
23 |
24 | public static void main(String[] args) throws Exception {
25 |
26 | StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
27 | // set global parallelism to 1 so the printed results are not interleaved (no impact on correctness)
28 | env.setParallelism(1);
29 | // the following steps use PojoCsvInputFormat to read the csv file into POJOs
30 | URL fileUrl = HotItems.class.getClassLoader().getResource("UserBehavior.csv");
31 | Path filePath = Path.fromLocalFile(new File(fileUrl.getFile()));
32 | PojoTypeInfo<UserBehavior> pojoTypeInfo = (PojoTypeInfo<UserBehavior>) TypeExtractor.createTypeInfo(UserBehavior.class);
33 | // the field order extracted via Java reflection is not deterministic, so the csv column order must be given explicitly
34 | String[] fieldOrder = new String[]{"userId", "itemId", "categoryId", "behavior", "timestamp"};
35 | // create the PojoCsvInputFormat
36 | PojoCsvInputFormat<UserBehavior> csvInputFormat = new PojoCsvInputFormat<>(filePath, pojoTypeInfo, fieldOrder);
37 |
38 | // create the input source
39 | DataStream<UserBehavior> dataSource = env.createInput(csvInputFormat, pojoTypeInfo);
40 |
41 | // 1. switch to EventTime and use the timestamps carried in the data (the default is processing time)
42 | env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
43 | // 2. extract the business timestamp and generate Watermarks (which track how far event time has progressed)
44 | DataStream<UserBehavior> timeData = dataSource
45 | .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<UserBehavior>() {
46 | @Override
47 | public long extractAscendingTimestamp(UserBehavior element) {
48 | // the raw unit is seconds, multiply by 1000 to get milliseconds
49 | return element.getTimestamp() * 1000;
50 | }
51 | });
52 |
53 | // use the filter operator to keep only the "pv" behavior records
54 | DataStream<UserBehavior> pvData = timeData
55 | .filter(new FilterFunction<UserBehavior>() {
56 | @Override
57 | public boolean filter(UserBehavior value) throws Exception {
58 | return "pv".equals(value.getBehavior());
59 | }
60 | });
61 |
62 | // sliding window: every five minutes, count each item's clicks over the last hour
63 | // pipeline: dataStream -> keyedStream -> dataStream
64 | DataStream<ItemViewCount> windowData = pvData
65 | .keyBy("itemId")
66 | .timeWindow(Time.minutes(60), Time.minutes(5))
67 | .aggregate(new CountAgg(), new WindowResultFunction());
68 |
69 | // compute the hottest items
70 | DataStream<String> topItems = windowData
71 | .keyBy("windowEnd")
72 | .process(new TopNHotItems(3));
73 |
74 | topItems.print();
75 | env.execute("Test Hot Items Job");
76 | }
77 |
78 |
79 | }
80 |
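With `timeWindow(Time.minutes(60), Time.minutes(5))`, every pv event is assigned to 60 / 5 = 12 overlapping windows. The assignment can be sketched like this (the class name `SlidingWindowSketch` is ours; it mirrors the logic of Flink's `SlidingEventTimeWindows` for a zero offset and non-negative timestamps):

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindowSketch {

    // Start timestamps of every sliding window that contains the event.
    public static List<Long> windowStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        // start of the latest window containing ts
        long lastStart = ts - (ts % slide);
        // walk backwards in slide-sized steps while ts is still inside the window
        for (long start = lastStart; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long size = 60 * 60 * 1000L; // 1 hour
        long slide = 5 * 60 * 1000L; // 5 minutes
        // every event belongs to size / slide = 12 windows
        System.out.println(windowStarts(7_200_000L, size, slide).size()); // 12
    }
}
```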
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/hotest/itemcount/ItemViewCount.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.hotest.itemcount;
2 |
3 | import lombok.Data;
4 |
5 | /**
6 |  * Item click count (output type of the window operation)
7 |  *
8 |  * @author JingQ at 2019-09-28
9 |  */
10 | @Data
11 | public class ItemViewCount {
12 |
13 | /**
14 |  * item ID
15 |  */
16 | private long itemId;
17 |
18 | /**
19 |  * end timestamp of the window
20 |  */
21 | private long windowEnd;
22 |
23 | /**
24 |  * number of clicks on the item
25 |  */
26 | private long viewCount;
27 |
28 | public static ItemViewCount of(long itemId, long windowEnd, long viewCount) {
29 | ItemViewCount result = new ItemViewCount();
30 | result.setItemId(itemId);
31 | result.setWindowEnd(windowEnd);
32 | result.setViewCount(viewCount);
33 | return result;
34 | }
35 | }
36 |
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/hotest/itemcount/TopNHotItems.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.hotest.itemcount;
2 |
3 | import com.google.common.collect.Lists;
4 | import org.apache.flink.api.common.state.ListState;
5 | import org.apache.flink.api.common.state.ListStateDescriptor;
6 | import org.apache.flink.api.java.tuple.Tuple;
7 | import org.apache.flink.configuration.Configuration;
8 | import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
9 | import org.apache.flink.util.Collector;
10 |
11 | import java.sql.Timestamp;
12 | import java.util.ArrayList;
13 | import java.util.Comparator;
14 | import java.util.List;
15 |
16 | /**
17 |  * Computes the top-N hot clicked items within a window; the key is the window end time, the output is the formatted TopN string
18 |  * @author JingQ at 2019-09-29
19 |  */
20 | public class TopNHotItems extends KeyedProcessFunction<Tuple, ItemViewCount, String> {
21 |
22 | private final int topSize;
23 |
24 | public TopNHotItems(int topSize) {
25 | this.topSize = topSize;
26 | }
27 |
28 | /**
29 |  * state holding the item click counts; TopN is computed only after all data of a window has been collected
30 |  */
31 | private ListState<ItemViewCount> itemState;
32 |
33 | @Override
34 | public void open(Configuration parameters) throws Exception {
35 | super.open(parameters);
36 | // override open to register the state
37 | ListStateDescriptor<ItemViewCount> itemViewCountListStateDescriptor =
38 | new ListStateDescriptor<>(
39 | "itemState-state",
40 | ItemViewCount.class
41 | );
42 | // stores every received ItemViewCount so the state survives failures without loss or inconsistency
43 | itemState = getRuntimeContext().getListState(itemViewCountListStateDescriptor);
44 | }
45 |
46 | @Override
47 | public void processElement(ItemViewCount value, Context ctx, Collector out) throws Exception {
48 | // save every record into the state
49 | itemState.add(value);
50 | // register an EventTime timer for windowEnd + 1; when it fires, all data belonging to the windowEnd window has arrived
51 | ctx.timerService().registerEventTimeTimer(value.getWindowEnd() + 1);
52 | }
53 |
54 | @Override
55 | public void onTimer(long timestamp, OnTimerContext ctx, Collector out) throws Exception {
56 | // fetch all received item click counts
57 | List<ItemViewCount> allItems = Lists.newArrayList(itemState.get());
58 | // clear the state early to free space
59 | itemState.clear();
60 | // sort by click count in descending order (sort ascending by the field, then reverse)
61 | allItems.sort(Comparator.comparing(ItemViewCount::getViewCount).reversed());
62 | // format the ranking into a String
63 | StringBuilder result = new StringBuilder();
64 | result.append("================================== TEST =================================\n");
65 | result.append("时间: ").append(new Timestamp(timestamp - 1)).append("\n");
66 | // append each ranked item to the result
67 | int realSize = allItems.size() < topSize ? allItems.size() : topSize;
68 | for (int i = 0; i < realSize; i++) {
69 | ItemViewCount item = allItems.get(i);
70 | if (item == null) {
71 | continue;
72 | }
73 | result.append("No ").append(i).append(":")
74 | .append(" 商品 ID=").append(item.getItemId())
75 | .append(" 浏览量=").append(item.getViewCount())
76 | .append("\n");
77 | }
78 | result.append("================================== END =================================\n\n");
79 | out.collect(result.toString());
80 | }
81 | }
82 |
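The core of `onTimer` — sort descending by view count, then keep the first `topSize` entries — can be sketched independently of Flink state (the `Count` pair class and `TopNSketch` are our stand-ins for `ItemViewCount`):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TopNSketch {

    // (itemId, viewCount) pair standing in for ItemViewCount.
    public static final class Count {
        final long itemId;
        final long viewCount;
        public Count(long itemId, long viewCount) {
            this.itemId = itemId;
            this.viewCount = viewCount;
        }
    }

    // Same ordering as onTimer: descending viewCount, truncated to topSize.
    public static List<Long> topItemIds(List<Count> all, int topSize) {
        List<Count> sorted = new ArrayList<>(all);
        sorted.sort(Comparator.comparingLong((Count c) -> c.viewCount).reversed());
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(topSize, sorted.size()); i++) {
            ids.add(sorted.get(i).itemId);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Count> counts = Arrays.asList(new Count(1, 5), new Count(2, 9), new Count(3, 7));
        System.out.println(topItemIds(counts, 2)); // [2, 3]
    }
}
```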
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/hotest/itemcount/UserBehavior.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.hotest.itemcount;
2 |
3 | import lombok.Data;
4 |
5 | /**
6 |  * A simple user behavior record
7 |  * @author JingQ at 2019-09-28
8 |  */
9 | @Data
10 | public class UserBehavior {
11 |
12 | /**
13 |  * user ID
14 |  */
15 | private long userId;
16 |
17 | /**
18 |  * item ID
19 |  */
20 | private long itemId;
21 |
22 | /**
23 |  * item category ID
24 |  */
25 | private int categoryId;
26 |
27 | /**
28 |  * user behavior, one of ("pv", "buy", "cart", "fav")
29 |  */
30 | private String behavior;
31 |
32 | /**
33 |  * timestamp of the behavior; the mock data carries this time attribute
34 |  */
35 | private long timestamp;
36 |
37 | }
38 |
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/hotest/itemcount/WindowResultFunction.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.hotest.itemcount;
2 |
3 | import org.apache.flink.api.java.tuple.Tuple;
4 | import org.apache.flink.api.java.tuple.Tuple1;
5 | import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
6 | import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
7 | import org.apache.flink.util.Collector;
8 |
9 | /**
10 |  * Emits the result of a window
11 |  *
12 |  * signature: WindowFunction<IN, OUT, KEY, W extends Window>
13 |  * @author JingQ at 2019-09-28
14 |  */
15 | public class WindowResultFunction implements WindowFunction<Long, ItemViewCount, Tuple, TimeWindow> {
16 |
17 | @Override
18 | public void apply(
19 | Tuple tuple, // window key, i.e. itemId
20 | TimeWindow window, // the window
21 | Iterable<Long> input, // aggregated result, i.e. the count value
22 | Collector<ItemViewCount> out) // output type is ItemViewCount
23 | throws Exception {
24 | Long itemId = ((Tuple1<Long>) tuple).f0;
25 | Long count = input.iterator().next();
26 | out.collect(ItemViewCount.of(itemId, window.getEnd(), count));
27 | }
28 | }
29 |
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/sink/SinkToMySQL.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.sink;
2 |
3 | import cn.sevenyuan.domain.Student;
4 | import cn.sevenyuan.util.MyDruidUtils;
5 | import org.apache.flink.configuration.Configuration;
6 | import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
7 |
8 | import java.sql.Connection;
9 | import java.sql.PreparedStatement;
10 | import java.util.List;
11 |
12 | /**
13 |  * Sinks the data into a MySQL database
14 |  * @author JingQ at 2019-09-30
15 |  */
16 | public class SinkToMySQL extends RichSinkFunction<List<Student>> {
17 |
18 | private PreparedStatement ps;
19 |
20 | private Connection connection;
21 |
22 | @Override
23 | public void open(Configuration parameters) throws Exception {
24 | super.open(parameters);
25 | // obtain a connection
26 | connection = MyDruidUtils.getConnection();
27 | String sql = "insert into student(name, age, address) values (?, ?, ?);";
28 | ps = connection.prepareStatement(sql);
29 | }
30 |
31 | @Override
32 | public void close() throws Exception {
33 | super.close();
34 | // release the statement first, then the connection
35 | if (ps != null) {
36 | ps.close();
37 | }
38 | if (connection != null) {
39 | connection.close();
40 | }
41 | }
42 |
43 | @Override
44 | public void invoke(List<Student> value, Context context) throws Exception {
45 | // iterate over the batch
46 | for (Student student : value) {
47 | ps.setString(1, student.getName());
48 | ps.setInt(2, student.getAge());
49 | ps.setString(3, student.getAddress());
50 | ps.addBatch();
51 | }
52 | // insert the whole time window's aggregated data in one batch
53 | int[] count = ps.executeBatch();
54 | System.out.println("success insert number : " + count.length);
55 | }
56 | }
57 |
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/transformation/MyExtendTimestampExtractor.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.transformation;
2 |
3 | import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
4 | import org.apache.flink.streaming.api.watermark.Watermark;
5 |
6 | import javax.annotation.Nullable;
7 |
8 | /**
9 |  * Simple timestamp extractor that just returns the current time, with no type restriction
10 |  * @author JingQ at 2019-10-19
11 |  */
12 | public class MyExtendTimestampExtractor<T> implements AssignerWithPeriodicWatermarks<T> {
13 | @Nullable
14 | @Override
15 | public Watermark getCurrentWatermark() {
16 | return new Watermark(System.currentTimeMillis());
17 | }
18 |
19 | @Override
20 | public long extractTimestamp(T element, long previousElementTimestamp) {
21 | return System.currentTimeMillis();
22 | }
23 | }
24 |
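A periodic watermark like the one emitted above is a promise that no more elements with a timestamp at or below the watermark are expected; elements that break the promise are treated as late. That basic check (ignoring allowed lateness) can be sketched as (`WatermarkSketch` is our illustrative class, not Flink API):

```java
public class WatermarkSketch {

    // An element is considered late once the current watermark has reached or
    // passed its timestamp.
    public static boolean isLate(long elementTimestamp, long currentWatermark) {
        return elementTimestamp <= currentWatermark;
    }

    public static void main(String[] args) {
        System.out.println(isLate(1_000L, 999L));   // false: watermark has not reached it yet
        System.out.println(isLate(1_000L, 1_000L)); // true
    }
}
```

Because both extractors here return `System.currentTimeMillis()`, the watermark always tracks the wall clock, which effectively degenerates event time into processing time.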
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/transformation/MyTimestampExtractor.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.transformation;
2 |
3 | import cn.sevenyuan.domain.Student;
4 | import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
5 | import org.apache.flink.streaming.api.watermark.Watermark;
6 |
7 | import javax.annotation.Nullable;
8 |
9 | /**
10 | * @author JingQ at 2019-09-24
11 | */
12 | public class MyTimestampExtractor implements AssignerWithPeriodicWatermarks<Student> {
13 |
14 |
15 | @Nullable
16 | @Override
17 | public Watermark getCurrentWatermark() {
18 | return new Watermark(System.currentTimeMillis());
19 | }
20 |
21 | @Override
22 | public long extractTimestamp(Student element, long previousElementTimestamp) {
23 | return System.currentTimeMillis();
24 | }
25 | }
26 |
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/transformation/StudentTimeAndWindowTransformation.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.transformation;
2 |
3 | import cn.sevenyuan.domain.Student;
4 | import cn.sevenyuan.transformation.studentwindow.CountStudentAgg;
5 | import cn.sevenyuan.transformation.studentwindow.StudentViewCount;
6 | import cn.sevenyuan.transformation.studentwindow.WindowStudentResultFunction;
7 | import com.google.common.collect.Lists;
8 | import org.apache.flink.streaming.api.TimeCharacteristic;
9 | import org.apache.flink.streaming.api.datastream.DataStream;
10 | import org.apache.flink.streaming.api.datastream.DataStreamSource;
11 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
12 | import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
13 | import org.apache.flink.streaming.api.windowing.time.Time;
14 |
15 | import java.util.List;
16 |
17 |
18 | /**
19 |  *
20 |  * Combining Time and Window
21 |  * @author JingQ at 2019-09-26
22 |  */
23 | public class StudentTimeAndWindowTransformation {
24 |
25 | public static void main(String[] args) throws Exception{
26 | StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
27 | // process by the time the event occurred (EventTime)
28 | env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
29 | // data source backed by a collection
30 | DataStreamSource<Student> collectionSource = env.fromCollection(getCollection());
31 | // assign timestamps and watermarks, key the stream by id, then aggregate
32 | DataStream<StudentViewCount> windowData = collectionSource
33 | .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Student>() {
34 | @Override
35 | public long extractAscendingTimestamp(Student element) {
36 | return element.getSuccessTimeStamp();
37 | }
38 | })
39 | .keyBy("id")
40 | .timeWindow(Time.milliseconds(100), Time.milliseconds(10))
41 | .aggregate(new CountStudentAgg(), new WindowStudentResultFunction());
42 | windowData.print();
43 | env.execute("test Tumbling window");
44 | }
45 |
46 |
47 |
48 |
49 |
50 |
51 | private static List<Student> getCollection() {
52 | return Lists.newArrayList(
53 | Student.of(1, "第一种商品名字 1", 0, "test", 1569640890385L),
54 | Student.of(2, "第一种商品名字 2",0, "test", 1569640890386L),
55 | Student.of(3, "第一种商品名字 3",0, "test", 1569640890387L),
56 | Student.of(4, "第一种商品名字 4",0, "test", 1569640890388L),
57 | Student.of(5, "第一种商品名字 5",0, "test", 1569640890389L),
58 | Student.of(6, "第一种商品名字 6",0, "test", 1569640890390L),
59 | Student.of(7, "第一种商品名字 7",0, "test", 1569640890391L),
60 | Student.of(8, "第一种商品名字 8",0, "test", 1569640890392L),
61 | Student.of(9, "第一种商品名字 9",0, "test", 1569640890393L),
62 | Student.of(10, "第一种商品名字 10", 0, "test", 1569640890394L),
63 | Student.of(11, "第一种商品名字 11", 0, "test", 1569640890395L),
64 | Student.of(12, "第一种商品名字 12", 0, "test", 1569640890396L),
65 | Student.of(13, "第一种商品名字 13", 0, "test", 1569640890397L),
66 | Student.of(14, "第一种商品名字 14", 0, "test", 1569640890398L),
67 | Student.of(15, "第一种商品名字 15", 0, "test", 1569640890399L),
68 | Student.of(16, "第一种商品名字 16", 0, "test", 1569640890400L),
69 | Student.of(17, "第一种商品名字 17", 0, "test", 1569640890401L),
70 |
71 | Student.of(1, "第二种商品名字 1", 0, "test", 1569640890401L),
72 | Student.of(2, "第二种商品名字 2", 0, "test",1569640890402L),
73 | Student.of(3, "第二种商品名字 3", 0, "test",1569640890403L),
74 | Student.of(4, "第二种商品名字 4", 0, "test",1569640890404L),
75 | Student.of(5, "第二种商品名字 5", 0, "test",1569640890405L),
76 | Student.of(6, "第二种商品名字 6", 0, "test",1569640890406L),
77 | Student.of(7, "第二种商品名字 7", 0, "test",1569640890407L),
78 | Student.of(8, "第二种商品名字 8", 0, "test",1569640890408L),
79 | Student.of(9, "第二种商品名字 9", 0, "test",1569640890409L),
80 | Student.of(10, "第二种商品名字 10", 0, "test", 1569640890410L),
81 | Student.of(11, "第二种商品名字 11", 0, "test", 1569640890411L),
82 | Student.of(12, "第二种商品名字 12", 0, "test", 1569640890412L),
83 | Student.of(13, "第二种商品名字 13", 0, "test", 1569640890413L),
84 | Student.of(14, "第二种商品名字 14", 0, "test", 1569640890414L),
85 | Student.of(15, "第二种商品名字 15", 0, "test", 1569640890415L),
86 | Student.of(16, "第二种商品名字 16", 0, "test", 1569640890416L),
87 | Student.of(17, "第二种商品名字 17", 0, "test", 1569640890417L),
88 |
89 | Student.of(1, "第三种商品名字 1", 0, "test", 1569640890418L),
90 | Student.of(2, "第三种商品名字 2", 0, "test",1569640890419L),
91 | Student.of(3, "第三种商品名字 3", 0, "test",1569640890420L),
92 | Student.of(4, "第三种商品名字 4", 0, "test",1569640890421L),
93 | Student.of(5, "第三种商品名字 5", 0, "test",1569640890422L),
94 | Student.of(6, "第三种商品名字 6", 0, "test",1569640890423L),
95 | Student.of(7, "第三种商品名字 7", 0, "test",1569640890424L),
96 | Student.of(8, "第三种商品名字 8", 0, "test",1569640890425L),
97 | Student.of(9, "第三种商品名字 9", 0, "test",1569640890426L),
98 | Student.of(10, "第三种商品名字 10", 0, "test", 1569640890427L),
99 | Student.of(11, "第三种商品名字 11", 0, "test", 1569640890428L),
100 | Student.of(12, "第三种商品名字 12", 0, "test", 1569640890429L),
101 | Student.of(13, "第三种商品名字 13", 0, "test", 1569640890430L),
102 | Student.of(14, "第三种商品名字 14", 0, "test", 1569640890431L),
103 | Student.of(15, "第三种商品名字 15", 0, "test", 1569640890432L),
104 | Student.of(16, "第三种商品名字 16", 0, "test", 1569640890433L),
105 | Student.of(17, "第三种商品名字 17", 0, "test", 1569640890434L)
106 | );
107 | }
108 |
109 | }
110 |
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/transformation/studentwindow/CountStudentAgg.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.transformation.studentwindow;
2 |
3 | import cn.sevenyuan.domain.Student;
4 | import org.apache.flink.api.common.functions.AggregateFunction;
5 |
6 | /**
7 |  * COUNT aggregate function: accumulate the result, adding one for every record
8 |  *
9 |  * reference http://wuchong.me/blog/2018/11/07/use-flink-calculate-hot-items/
10 |  *
11 |  * @author JingQ at 2019-09-28
12 |  */
13 | public class CountStudentAgg implements AggregateFunction<Student, Long, Long> {
14 |
15 | @Override
16 | public Long createAccumulator() {
17 | return 0L;
18 | }
19 |
20 | @Override
21 | public Long add(Student value, Long accumulator) {
22 | return accumulator + 1;
23 | }
24 |
25 | @Override
26 | public Long getResult(Long accumulator) {
27 | return accumulator;
28 | }
29 |
30 | @Override
31 | public Long merge(Long a, Long b) {
32 | return a + b;
33 | }
34 | }
35 |
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/transformation/studentwindow/StudentViewCount.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.transformation.studentwindow;
2 |
3 | import lombok.Data;
4 |
5 | /**
6 |  * Aggregation result type for students
7 |  *
8 |  * @author JingQ at 2019-09-28
9 |  */
10 | @Data
11 | public class StudentViewCount {
12 |
13 | private int id;
14 |
15 | /**
16 |  * end time of the window
17 |  */
18 | private long windowEnd;
19 |
20 | /**
21 |  * number of records under the same id
22 |  */
23 | private long viewCount;
24 |
25 | public static StudentViewCount of(int id, long windowEnd, long count) {
26 | StudentViewCount result = new StudentViewCount();
27 | result.setId(id);
28 | result.setWindowEnd(windowEnd);
29 | result.setViewCount(count);
30 | return result;
31 | }
32 | }
33 |
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/transformation/studentwindow/WindowStudentResultFunction.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.transformation.studentwindow;
2 |
3 | import org.apache.flink.api.java.tuple.Tuple;
4 | import org.apache.flink.api.java.tuple.Tuple1;
5 | import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
6 | import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
7 | import org.apache.flink.util.Collector;
8 |
9 | /**
10 |  * Emits the student count result of a window
11 |  *
12 |  * @author JingQ at 2019-09-28
13 |  */
14 | public class WindowStudentResultFunction implements WindowFunction<Long, StudentViewCount, Tuple, TimeWindow> {
15 |
16 |
17 | @Override
18 | public void apply(Tuple tuple, TimeWindow window, Iterable<Long> input, Collector<StudentViewCount> out) throws Exception {
19 | int id = ((Tuple1<Integer>) tuple).f0;
20 | long count = input.iterator().next();
21 | out.collect(StudentViewCount.of(id, window.getEnd(), count));
22 | }
23 | }
24 |
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/util/KafkaUtils.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.util;
2 |
3 | import cn.sevenyuan.domain.Student;
4 | import com.alibaba.fastjson.JSON;
5 | import org.apache.commons.lang3.RandomUtils;
6 | import org.apache.kafka.clients.producer.KafkaProducer;
7 | import org.apache.kafka.clients.producer.ProducerRecord;
8 |
9 | import java.util.Date;
10 | import java.util.Properties;
11 |
12 | /**
13 |  * Kafka utility class
14 |  * @author JingQ at 2019-09-22
15 |  */
16 | public class KafkaUtils {
17 |
18 | public static final String BROKER_LIST = "localhost:9092";
19 |
20 | public static final String TOPIC_STUDENT = "student";
21 |
22 | public static final String KEY_SERIALIZER = "org.apache.kafka.common.serialization.StringSerializer";
23 |
24 | public static final String VALUE_SERIALIZER = "org.apache.kafka.common.serialization.StringSerializer";
25 |
26 | public static void writeToKafka() throws Exception {
27 | Properties props = new Properties();
28 | props.put("bootstrap.servers", BROKER_LIST);
29 | props.put("key.serializer", KEY_SERIALIZER);
30 | props.put("value.serializer", VALUE_SERIALIZER);
31 | KafkaProducer<String, String> producer = new KafkaProducer<>(props);
32 |
33 | // build the object to send
34 | int randomInt = RandomUtils.nextInt(1, 100);
35 | Student stu = new Student(randomInt, "name" + randomInt, randomInt, "=-=");
36 | stu.setCheckInTime(new Date());
37 | // send the record
38 | ProducerRecord<String, String> record = new ProducerRecord<>(TOPIC_STUDENT, null, null, JSON.toJSONString(stu));
39 | producer.send(record);
40 | System.out.println("Kafka message sent : " + JSON.toJSONString(stu));
41 | producer.close(); // close() also flushes, and avoids leaking a producer on every call
42 | }
43 |
44 | public static void main(String[] args) throws Exception {
45 | while (true) {
46 | Thread.sleep(3000);
47 | writeToKafka();
48 | }
49 | }
50 | }
51 |
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/util/MyDruidUtils.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.util;
2 |
3 | import com.alibaba.druid.pool.DruidDataSource;
4 |
5 | import java.sql.Connection;
6 |
7 | /**
8 | * Database connection pool utility class
9 | * @author JingQ at 2019-09-30
10 | */
11 | public class MyDruidUtils {
12 |
13 | private static DruidDataSource dataSource;
14 |
15 | public static Connection getConnection() throws Exception {
16 | // lazily create the Druid pool once; building a new pool on every call would defeat pooling
17 | if (dataSource == null) {
18 | dataSource = new DruidDataSource();
19 | dataSource.setDriverClassName("com.mysql.jdbc.Driver");
20 | dataSource.setUrl("jdbc:mysql://localhost:3306/test");
21 | dataSource.setUsername("root");
22 | dataSource.setPassword("12345678");
23 | // initial size, max active, min idle
24 | dataSource.setInitialSize(10);
25 | dataSource.setMaxActive(50);
26 | dataSource.setMinIdle(2);
27 | }
28 | return dataSource.getConnection();
29 | }
30 | }
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/util/RedisUtils.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.util;
2 |
3 | import com.alibaba.fastjson.JSON;
4 | import lombok.extern.java.Log;
5 | import redis.clients.jedis.Jedis;
6 | import redis.clients.jedis.JedisPool;
7 | import redis.clients.jedis.JedisPoolConfig;
8 |
9 | /**
10 | * Redis cache utility class
11 | * @author JingQ at 2019-09-22
12 | */
13 | @Log
14 | public class RedisUtils {
15 |
16 | public static final String HOST = "localhost";
17 |
18 | public static final int PORT = 6379;
19 |
20 | public static final int TIMEOUT = 100000;
21 |
22 | public static final JedisPool JEDIS_POOL = new JedisPool(new JedisPoolConfig(), HOST, PORT, TIMEOUT, null);
23 |
24 | public static final Jedis JEDIS = JEDIS_POOL.getResource();
25 |
26 | /**
27 | * Get the shared Jedis instance (note: a single Jedis instance is not thread-safe; fetch from JEDIS_POOL for concurrent use)
28 | * @return jedis
29 | */
30 | public static Jedis getJedis() {
31 | return JEDIS;
32 | }
33 |
34 |
35 | public static void close() {
36 | JEDIS_POOL.close();
37 | JEDIS.close();
38 | }
39 |
40 | public static void set(String key, Object value) {
41 | String realValue = JSON.toJSONString(value);
42 | getJedis().set(key, realValue);
43 | }
44 |
45 | public static <T> T get(String key, Class<T> classType) {
46 | String value = getJedis().get(key);
47 | try {
48 | return JSON.parseObject(value, classType);
49 | } catch (Exception e) {
50 | log.info("Failed to parse JSON; make sure classType matches the stored value");
51 | return null;
52 | }
53 | }
54 | }
55 |
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/watermark/EventTimeData.txt:
--------------------------------------------------------------------------------
1 | 2 3 1 7 3 5 9 6 12 17 10 16 19 11 18
2 |
3 | 001,1575129602000
4 | 001,1575129603000
5 | 001,1575129601000
6 | 001,1575129607000
7 | 001,1575129603000
8 | 001,1575129605000
9 | 001,1575129609000
10 | 001,1575129606000
11 | 001,1575129612000
12 | 001,1575129617000
13 | 001,1575129610000
14 | 001,1575129616000
15 | 001,1575129619000
16 | 001,1575129611000
17 | 001,1575129618000
18 |
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/watermark/UnOrderEventTimeWindowDemo.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.watermark;
2 |
3 | import com.google.common.base.Joiner;
4 | import org.apache.flink.api.common.functions.MapFunction;
5 | import org.apache.flink.api.java.tuple.Tuple;
6 | import org.apache.flink.api.java.tuple.Tuple2;
7 | import org.apache.flink.streaming.api.TimeCharacteristic;
8 | import org.apache.flink.streaming.api.datastream.DataStream;
9 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
10 | import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
11 | import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
12 | import org.apache.flink.streaming.api.windowing.time.Time;
13 | import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
14 | import org.apache.flink.util.Collector;
15 |
16 | import java.text.SimpleDateFormat;
17 | import java.util.ArrayList;
18 | import java.util.Collections;
19 | import java.util.Iterator;
20 | import java.util.List;
21 |
22 | /**
23 | * Out-of-order event-time window demo: verifies that with Watermarks, events arriving after the window's time has passed still join the right window and trigger its computation
24 | *
25 | * Reference: https://juejin.im/post/5bf95810e51d452d705fef33
26 | *
27 | * @author JingQ at 2019-11-26
28 | */
29 | public class UnOrderEventTimeWindowDemo {
30 |
31 | public static void main(String[] args) throws Exception {
32 | // socket port
33 | int port = 9010;
34 | // get the execution environment
35 | StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
36 | // use event time (the default is processing time)
37 | env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
38 | // set parallelism to 1 (the default is the number of CPU cores)
39 | env.setParallelism(1);
40 | // read the input from the socket
41 | DataStream<String> text = env.socketTextStream("127.0.0.1", port, "\n");
42 |
43 | // parse the input
44 | DataStream<Tuple2<String, Long>> inputMap = text.map(new MapFunction<String, Tuple2<String, Long>>() {
45 | @Override
46 | public Tuple2<String, Long> map(String value) throws Exception {
47 | String[] arr = value.split(",");
48 | return new Tuple2<>(arr[0], Long.parseLong(arr[1]));
49 | }
50 | });
51 |
52 | // extract timestamps and generate watermarks with the assigner class
53 | DataStream<Tuple2<String, Long>> waterMarkStream = inputMap.assignTimestampsAndWatermarks(new WordCountPeriodicWatermarks());
54 |
55 | DataStream<String> window = waterMarkStream.keyBy(0)
56 | // assign windows by event time; equivalent to calling timeWindow
57 | .window(TumblingEventTimeWindows.of(Time.seconds(4)))
58 | .apply(new WindowFunction<Tuple2<String, Long>, String, Tuple, TimeWindow>() {
59 | /**
60 | * Sort the data inside the window to restore its order
61 | * @param tuple
62 | * @param window
63 | * @param input
64 | * @param out
65 | * @throws Exception
66 | */
67 | @Override
68 | public void apply(Tuple tuple, TimeWindow window, Iterable<Tuple2<String, Long>> input, Collector<String> out) throws Exception {
69 | String key = tuple.toString();
70 | List<Long> arrayList = new ArrayList<>();
71 | List<String> eventTimeList = new ArrayList<>();
72 | Iterator<Tuple2<String, Long>> it = input.iterator();
73 | while (it.hasNext()) {
74 | Tuple2<String, Long> next = it.next();
75 | arrayList.add(next.f1);
76 | eventTimeList.add(String.valueOf(next.f1).substring(8, 10));
77 | }
78 | Collections.sort(arrayList);
79 | SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
80 | String result = "\n key : " + key + "\n " +
81 | "elements in fired window : " + arrayList.size() + "\n " +
82 | "earliest element : " + sdf.format(arrayList.get(0)) + "\n " +
83 | "latest (possibly late) element : " + sdf.format(arrayList.get(arrayList.size() - 1))
84 | + "\n " +
85 | "event seconds in window : " + Joiner.on(",").join(eventTimeList) + "\n" +
86 | "actual window start and end : " + sdf.format(window.getStart()) + " <----> " + sdf.format(window.getEnd()) + " \n \n ";
87 |
88 | out.collect(result);
89 | }
90 | });
91 | // for testing, just print the result to the console
92 | window.print();
93 |
94 | // note: Flink evaluates lazily, so execute() must be called for the pipeline above to run
95 | env.execute("eventtime-watermark");
96 |
97 | }
98 | }
99 |
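For reference, the 4-second tumbling windows used in the demo above are aligned to the epoch, not to the first event. The standalone sketch below (class and method names are my own, not part of this repo) reproduces the zero-offset window-start formula, so you can work out which window an event from `EventTimeData.txt` lands in:

```java
// Standalone sketch of the zero-offset tumbling-window assignment rule:
// an element with event timestamp t belongs to [start, start + size).
public class TumblingWindowMath {

    public static long windowStart(long timestamp, long windowSize) {
        // align the window start to a multiple of windowSize
        return timestamp - (timestamp % windowSize);
    }

    public static void main(String[] args) {
        long size = 4000L; // 4s windows, as in the demo above
        // event 001,1575129602000 falls into [1575129600000, 1575129604000)
        System.out.println(windowStart(1575129602000L, size)); // prints 1575129600000
    }
}
```

Flink's real assigner additionally supports a window offset; with offset 0 it reduces to this formula.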
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/watermark/WordCountPeriodicWatermarks.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.watermark;
2 |
3 | import org.apache.flink.api.java.tuple.Tuple2;
4 | import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
5 | import org.apache.flink.streaming.api.watermark.Watermark;
6 |
7 | import java.text.SimpleDateFormat;
8 |
9 | /**
10 | * Periodically generates watermarks, tolerating out-of-order events
11 | *
12 | * @author JingQ at 2019-12-01
13 | */
14 | public class WordCountPeriodicWatermarks implements AssignerWithPeriodicWatermarks<Tuple2<String, Long>> {
15 |
16 | private Long currentMaxTimestamp = 0L;
17 |
18 | // maximum allowed out-of-orderness: 3 seconds
19 | private final Long maxOutOfOrderness = 3000L;
20 |
21 | private SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
22 |
23 | @Override
24 | public Watermark getCurrentWatermark() {
25 | return new Watermark(currentMaxTimestamp - maxOutOfOrderness);
26 | }
27 |
28 |
29 | @Override
30 | public long extractTimestamp(Tuple2<String, Long> element, long previousElementTimestamp) {
31 | // how to extract the timestamp from an element
32 | long timestamp = element.f1;
33 | currentMaxTimestamp = Math.max(timestamp, currentMaxTimestamp);
34 | long id = Thread.currentThread().getId();
35 | System.out.println("thread id : " + id +
36 | " key : " + element.f0 +
37 | ", event time : [ " + sdf.format(element.f1) +
38 | " ], currentMaxTimestamp : [ " +
39 | sdf.format(currentMaxTimestamp) + " ], watermark : [ " +
40 | sdf.format(getCurrentWatermark().getTimestamp()) + " ]");
41 | return timestamp;
42 | }
43 | }
44 |
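The assigner above keeps only two numbers: the largest timestamp seen so far and a fixed 3s tolerance. The standalone sketch below (names are illustrative, not repo code) captures the resulting rule: the watermark lags the max timestamp by the tolerance, and an event-time window `[start, end)` fires once the watermark reaches `end - 1`.

```java
// Standalone sketch of the periodic-watermark rule used by the assigner above.
public class WatermarkMath {

    // the watermark lags the largest seen timestamp by the allowed out-of-orderness
    public static long watermark(long currentMaxTimestamp, long maxOutOfOrderness) {
        return currentMaxTimestamp - maxOutOfOrderness;
    }

    // an event-time window [start, end) fires once watermark >= end - 1
    public static boolean windowFires(long watermark, long windowEnd) {
        return watermark >= windowEnd - 1;
    }

    public static void main(String[] args) {
        long maxOutOfOrderness = 3000L; // 3s, matching the assigner above
        // after event 001,1575129607000 arrives, the watermark reaches 1575129604000,
        // which closes the 4s window ending at 1575129604000
        long wm = watermark(1575129607000L, maxOutOfOrderness);
        System.out.println(windowFires(wm, 1575129604000L)); // prints true
    }
}
```

This is why, in the demo, a late event such as `001,1575129601000` is still counted: it arrives before the watermark has passed the end of its window.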
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/wordcount/SocketTextStreamWordCount.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.wordcount;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
import org.apache.flink.api.java.tuple.Tuple2;
/**
* Socket data source
* Counts how many times each input word occurs; the input is split on non-word characters
* No time window applied
* @author JingQ at 2019-09-21
*/
public class SocketTextStreamWordCount {
public static void main(String[] args) throws Exception {
String hostName = "127.0.0.1";
int port = 9000;
// set up the execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// get the data source
DataStreamSource<String> stream = env.socketTextStream(hostName, port);
// count the words
SingleOutputStreamOperator<Tuple2<String, Integer>> sum = stream.flatMap(new LineSplitter())
.keyBy(0)
.sum(1);
// print the result
sum.print();
env.execute("Java Word from SocketTextStream Example");
}
public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
String[] tokens = s.toLowerCase().split("\\W+");
for (String token : tokens) {
if (token.length() > 0) {
collector.collect(new Tuple2<>(token, 1));
}
}
}
}
}
}
--------------------------------------------------------------------------------
/src/main/java/cn/sevenyuan/wordcount/SocketWindowWordCount.java:
--------------------------------------------------------------------------------
1 | package cn.sevenyuan.wordcount;
2 |
3 | import cn.sevenyuan.domain.WordWithCount;
4 | import org.apache.flink.api.common.functions.FlatMapFunction;
5 | import org.apache.flink.api.common.functions.ReduceFunction;
6 | import org.apache.flink.api.java.utils.ParameterTool;
7 | import org.apache.flink.streaming.api.datastream.DataStream;
8 | import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
9 | import org.apache.flink.streaming.api.windowing.time.Time;
10 | import org.apache.flink.util.Collector;
11 |
12 | /**
13 | * Socket data source
14 | * Counts how many times each input word occurs; the input is split on whitespace
15 | * Time-limited: counts are aggregated over a sliding 5s time window
16 | * @author JingQ at 2019-09-18
17 | */
18 | public class SocketWindowWordCount {
19 |
20 | public static void main(String[] args) throws Exception {
21 |
22 | // the port to connect to
23 | String hostName = "127.0.0.1";
24 | int port = 9000;
25 |
26 | // get the execution environment
27 | final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
28 |
29 | // get input data by connecting to the socket
30 | DataStream<String> text = env.socketTextStream(hostName, port, "\n");
31 |
32 | // parse the data, group it, window it, and aggregate the counts
33 | DataStream<WordWithCount> windowCounts = text
34 | .flatMap(new FlatMapFunction<String, WordWithCount>() {
35 | @Override
36 | public void flatMap(String value, Collector<WordWithCount> out) {
37 | for (String word : value.split("\\s")) {
38 | out.collect(new WordWithCount(word, 1L));
39 | }
40 | }
41 | })
42 | .keyBy("word")
43 | .timeWindow(Time.seconds(5), Time.seconds(1))
44 | .reduce(new ReduceFunction<WordWithCount>() {
45 | @Override
46 | public WordWithCount reduce(WordWithCount a, WordWithCount b) {
47 | return new WordWithCount(a.getWord(), a.getCount() + b.getCount());
48 | }
49 | });
50 |
51 | // print the results with a single thread, rather than in parallel
52 | windowCounts.print().setParallelism(1);
53 |
54 | env.execute("Socket Window WordCount");
55 | }
56 | }
57 |
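Because the job uses a 5s window sliding every 1s, each element is counted in five overlapping windows. The standalone sketch below (illustrative names, not repo code) enumerates the zero-offset sliding-window starts for a given element:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch: the start timestamps of every sliding window
// [start, start + size) that contains an element with event timestamp t.
public class SlidingWindowMath {

    public static List<Long> windowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        // the most recent window start at or before the timestamp
        long lastStart = timestamp - (timestamp % slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // size 5s, slide 1s, as in timeWindow(Time.seconds(5), Time.seconds(1))
        List<Long> starts = windowStarts(12_500L, 5_000L, 1_000L);
        System.out.println(starts.size()); // prints 5: starts 12000, 11000, ..., 8000
    }
}
```

With size equal to slide this degenerates to a single window per element, i.e. the tumbling case.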
--------------------------------------------------------------------------------
/src/main/resources/datasource/student.txt:
--------------------------------------------------------------------------------
1 | 1 name1 21 a
2 | 2 name2 22 b
3 | 3 name3 23 c
4 | 4 name4 24 d
5 | 5 name5 25 e
6 | 1 name6 21 f
7 | 2 name7 22 g
8 | 3 name8 23 h
9 | 4 name9 24 i
10 | 5 name10 25 j
11 | 1 name11 26 k
12 | 2 name12 27 l
13 | 3 name13 28 m
14 | 4 name14 29 n
15 | 5 name15 30 o
--------------------------------------------------------------------------------
/src/main/resources/datasource/student_min&minBy.txt:
--------------------------------------------------------------------------------
1 | 1 name19 21 a
2 | 1 name18 21 b
3 | 1 name17 21 c
4 | 1 name16 21 d
5 | 1 name15 21 e
6 | 1 name14 21 f
7 | 1 name13 21 g
8 | 1 name12 21 h
9 | 1 name11 21 i
10 | 1 name10 21 j
11 | 1 name9 20 k
12 | 1 name8 21 l
13 | 1 name7 21 m
14 | 1 name6 21 n
15 | 1 name5 21 o
16 | 1 name4 21 p
17 | 1 name3 21 q
18 | 1 name2 21 r
19 | 1 name1 10 s
--------------------------------------------------------------------------------
/src/main/resources/log4j.properties:
--------------------------------------------------------------------------------
1 | ################################################################################
2 | # Licensed to the Apache Software Foundation (ASF) under one
3 | # or more contributor license agreements. See the NOTICE file
4 | # distributed with this work for additional information
5 | # regarding copyright ownership. The ASF licenses this file
6 | # to you under the Apache License, Version 2.0 (the
7 | # "License"); you may not use this file except in compliance
8 | # with the License. You may obtain a copy of the License at
9 | #
10 | # http://www.apache.org/licenses/LICENSE-2.0
11 | #
12 | # Unless required by applicable law or agreed to in writing, software
13 | # distributed under the License is distributed on an "AS IS" BASIS,
14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15 | # See the License for the specific language governing permissions and
16 | # limitations under the License.
17 | ################################################################################
18 |
19 | log4j.rootLogger=INFO, console
20 |
21 | log4j.appender.console=org.apache.log4j.ConsoleAppender
22 | log4j.appender.console.layout=org.apache.log4j.PatternLayout
23 | log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n
24 |
--------------------------------------------------------------------------------