├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── PythonKafkaSink
│   └── main.py
├── README.md
└── lambda-functions
    ├── kfpLambdaConsumerSNS.py
    ├── kfpLambdaCustomMSKConfig.py
    ├── kfpLambdaStreamProducer.py
    └── requirements.txt
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
## Code of Conduct
This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
opensource-codeofconduct@amazon.com with any additional questions or comments.

--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
# Contributing Guidelines

Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
documentation, we greatly value feedback and contributions from our community.

Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
information to effectively respond to your bug report or contribution.


## Reporting Bugs/Feature Requests

We welcome you to use the GitHub issue tracker to report bugs or suggest features.

When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already
reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:

* A reproducible test case or series of steps
* The version of our code being used
* Any modifications you've made relevant to the bug
* Anything unusual about your environment or deployment


## Contributing via Pull Requests
Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:

1. You are working against the latest source on the *main* branch.
2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
3. You open an issue to discuss any significant work - we would hate for your time to be wasted.

To send us a pull request, please:

1. Fork the repository.
2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
3. Ensure local tests pass.
4. Commit to your fork using clear commit messages.
5. Send us a pull request, answering any default questions in the pull request interface.
6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.

GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
[creating a pull request](https://help.github.com/articles/creating-a-pull-request/).


## Finding contributions to work on
Looking at the existing issues is a great way to find something to contribute to. As our projects use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.

## Code of Conduct
This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
opensource-codeofconduct@amazon.com with any additional questions or comments.


## Security issue notifications
If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue.


## Licensing

See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this
software and associated documentation files (the "Software"), to deal in the Software
without restriction, including without limitation the rights to use, copy, modify,
merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/PythonKafkaSink/main.py:
--------------------------------------------------------------------------------
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

from pyflink.table import EnvironmentSettings, StreamTableEnvironment
import os
import json

env_settings = EnvironmentSettings.new_instance().in_streaming_mode().use_blink_planner().build()
table_env = StreamTableEnvironment.create(environment_settings=env_settings)
statement_set = table_env.create_statement_set()


def create_table_input(table_name, stream_name, broker):
    """DDL for the Kafka source table of raw sensor readings, with a 5-second watermark on event_time."""
    return """ CREATE TABLE {0} (
                 `sensor_id` VARCHAR(64) NOT NULL,
                 `temperature` BIGINT NOT NULL,
                 `event_time` TIMESTAMP(3),
                 WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
               )
               WITH (
                 'connector' = 'kafka',
                 'topic' = '{1}',
                 'properties.bootstrap.servers' = '{2}',
                 'properties.group.id' = 'testGroup',
                 'format' = 'json',
                 'json.timestamp-format.standard' = 'ISO-8601',
                 'scan.startup.mode' = 'earliest-offset'
               ) """.format(table_name, stream_name, broker)


def create_table_output_kafka(table_name, stream_name, broker):
    """DDL for the Kafka sink table whose topic feeds the SNS notification Lambda."""
    return """ CREATE TABLE {0} (
                 `sensor_id` VARCHAR(64) NOT NULL,
                 `count_temp` BIGINT NOT NULL,
                 `start_event_time` TIMESTAMP(3)
               )
               WITH (
                 'connector' = 'kafka',
                 'topic' = '{1}',
                 'properties.bootstrap.servers' = '{2}',
                 'properties.group.id' = 'testGroup',
                 'format' = 'json',
                 'json.timestamp-format.standard' = 'ISO-8601',
                 'scan.startup.mode' = 'earliest-offset'
               ) """.format(table_name, stream_name, broker)


def create_table_output_s3(table_name, stream_name):
    """DDL for the S3 filesystem sink table, partitioned by year/month/day/hour."""
    return """ CREATE TABLE {0} (
                 `sensor_id` VARCHAR(64) NOT NULL,
                 `avg_temp` BIGINT NOT NULL,
                 `start_event_time` TIMESTAMP(3),
                 `year` BIGINT,
                 `month` BIGINT,
                 `day` BIGINT,
                 `hour` BIGINT
               )
               PARTITIONED BY (`year`, `month`, `day`, `hour`)
               WITH (
                 'connector' = 'filesystem',
                 'path' = 's3a://{1}/',
                 'format' = 'json',
                 'sink.partition-commit.policy.kind' = 'success-file',
                 'sink.partition-commit.delay' = '1 min'
               ) """.format(table_name, stream_name)


def insert_stream_sns(insert_from, insert_into):
    """Counts readings above 30 per sensor in 30-second tumbling windows and keeps only windows with more than 3 such readings."""
    return """ INSERT INTO {1}
               SELECT sensor_id, COUNT(*),
                      TUMBLE_START(event_time, INTERVAL '30' SECOND)
               FROM {0}
               WHERE temperature > 30
               GROUP BY TUMBLE(event_time, INTERVAL '30' SECOND), sensor_id
               HAVING COUNT(*) > 3 """.format(insert_from, insert_into)


def insert_stream_s3(insert_from, insert_into):
    """Computes the average temperature per sensor in 60-second tumbling windows and adds the partition columns for the S3 sink."""
    return """ INSERT INTO {1}
               SELECT *, YEAR(start_event_time), MONTH(start_event_time), DAYOFMONTH(start_event_time), HOUR(start_event_time)
               FROM
                 (SELECT sensor_id, AVG(temperature) AS avg_temp, TUMBLE_START(event_time, INTERVAL '60' SECOND) AS start_event_time
                  FROM {0}
                  GROUP BY TUMBLE(event_time, INTERVAL '60' SECOND), sensor_id) """.format(insert_from, insert_into)


def app_properties():
    """Loads the runtime properties that Kinesis Data Analytics provides at /etc/flink/application_properties.json."""
    file_path = '/etc/flink/application_properties.json'
    if os.path.isfile(file_path):
        with open(file_path, 'r') as file:
            contents = file.read()
            print('Contents of ' + file_path)
            print(contents)
            properties = json.loads(contents)
            return properties
    else:
        print('A file at "{}" was not found'.format(file_path))


def property_map(props, property_group_id):
    """Returns the PropertyMap of the property group with the given PropertyGroupId."""
    for prop in props:
        if prop["PropertyGroupId"] == property_group_id:
            return prop["PropertyMap"]


def main():
    # Property group IDs and keys as configured on the Kinesis Data Analytics application
    INPUT_PROPERTY_GROUP_KEY = "producer.config.0"
    CONSUMER_PROPERTY_GROUP_KEY = "consumer.config.0"

    INPUT_TOPIC_KEY = "input.topic.name"
    OUTPUT_TOPIC_KEY = "output.topic.name"
    OUTPUT_BUCKET_KEY = "output.s3.bucket"
    BROKER_KEY = "bootstrap.servers"

    props = app_properties()

    input_property_map = property_map(props, INPUT_PROPERTY_GROUP_KEY)
    output_property_map = property_map(props, CONSUMER_PROPERTY_GROUP_KEY)

    input_stream = input_property_map[INPUT_TOPIC_KEY]
    broker = input_property_map[BROKER_KEY]

    output_stream_sns = output_property_map[OUTPUT_TOPIC_KEY]
    output_s3_bucket = output_property_map[OUTPUT_BUCKET_KEY]

    input_table = "input_table"
    output_table_sns = "output_table_sns"
    output_table_s3 = "output_table_s3"

    # Register the Kafka source table, the Kafka sink table, and the S3 sink table
    table_env.execute_sql(create_table_input(input_table, input_stream, broker))
    table_env.execute_sql(create_table_output_kafka(output_table_sns, output_stream_sns, broker))
    table_env.execute_sql(create_table_output_s3(output_table_s3, output_s3_bucket))

    # Both INSERT statements run together as a single Flink job
    statement_set.add_insert_sql(insert_stream_sns(input_table, output_table_sns))
    statement_set.add_insert_sql(insert_stream_s3(input_table, output_table_s3))

    statement_set.execute()


if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## Build a real-time streaming application using the Apache Flink Python API with Kinesis Data Analytics

--------
> #### 🚨 August 30, 2023: Amazon Kinesis Data Analytics has been renamed to [Amazon Managed Service for Apache Flink](https://aws.amazon.com/managed-service-apache-flink).

--------


This repository contains sample code for building a Python application for Apache Flink on Kinesis Data Analytics.

This project demonstrates how to use the Apache Flink Python API on Kinesis Data Analytics using two working examples. Follow the [blog post](https://aws.amazon.com/blogs/big-data/build-a-real-time-streaming-application-using-apache-flink-python-api-with-amazon-kinesis-data-analytics/) for a step-by-step guide to creating a Flink Python application on Kinesis Data Analytics.
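For reference, `PythonKafkaSink/main.py` reads its runtime configuration from `/etc/flink/application_properties.json`, which Kinesis Data Analytics populates from the application's runtime properties. Below is a minimal sketch of what those property groups could look like. The group IDs and keys are the ones `main.py` expects; the topic names, broker string, and bucket name are placeholders to replace with your own values:

```json
[
  {
    "PropertyGroupId": "producer.config.0",
    "PropertyMap": {
      "input.topic.name": "example-sensor-readings",
      "bootstrap.servers": "b-1.example-cluster.abc123.kafka.us-east-1.amazonaws.com:9092"
    }
  },
  {
    "PropertyGroupId": "consumer.config.0",
    "PropertyMap": {
      "output.topic.name": "example-sensor-alerts",
      "output.s3.bucket": "example-sensor-output-bucket"
    }
  }
]
```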
## License

This library is licensed under the MIT-0 License. See the LICENSE file.

--------------------------------------------------------------------------------
/lambda-functions/kfpLambdaConsumerSNS.py:
--------------------------------------------------------------------------------
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0


import base64
import boto3
import json
import os

sns = boto3.client('sns')


def lambda_handler(event, context):
    """Consumes records delivered by the MSK trigger and publishes an SNS notification for each one."""
    topic_arn = os.environ["SNSTopicArn"]
    # The MSK event payload groups records by topic-partition; each record value is base64-encoded JSON
    for partition_key, partition_value in event['records'].items():
        for record_value in partition_value:
            data = json.loads(base64.b64decode(record_value['value']))
            subject = "The sensor reading has exceeded the threshold"
            message = f"Sensor Id: {data['sensor_id']} has exceeded the set threshold at the window start time: {data['start_event_time']}"
            sns.publish(
                TargetArn=topic_arn,
                Message=message,
                Subject=subject
            )

--------------------------------------------------------------------------------
/lambda-functions/kfpLambdaCustomMSKConfig.py:
--------------------------------------------------------------------------------
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

from __future__ import print_function
import boto3
import json
import random
import string
import urllib3

http = urllib3.PoolManager()

SUCCESS = "SUCCESS"
FAILED = "FAILED"
SERVER_PROPERTIES = b"""
auto.create.topics.enable=true
default.replication.factor=2
"""


def lambda_handler(event, context):
    """CloudFormation custom resource handler that creates or deletes an MSK cluster configuration."""
    kafka = boto3.client("kafka")
    # Reuse the existing physical ID on Update/Delete so CloudFormation does not treat the response as a replacement
    physical_id = event.get("PhysicalResourceId", "None")
    random_id = ''.join(random.choices(string.ascii_uppercase + string.digits, k=5))
    revision = 1
    try:
        if event["RequestType"] == "Create":
            config = kafka.create_configuration(Name=event["LogicalResourceId"] + "-" + random_id,
                                                ServerProperties=SERVER_PROPERTIES)
            physical_id = config["Arn"]
            revision = config["LatestRevision"]["Revision"]
        elif event["RequestType"] == "Delete":
            kafka.delete_configuration(Arn=event["PhysicalResourceId"])

        send(event, context, SUCCESS, {
            "Revision": revision,
            "Arn": physical_id
        }, physical_id)
    except Exception as e:
        # Report the failure so the stack does not hang waiting for a response
        print("Request failed:", e)
        send(event, context, FAILED, {
            "Revision": revision,
            "Arn": physical_id
        }, physical_id)


def send(event, context, response_status, response_data, physical_resource_id=None):
    """Sends the custom resource response back to the CloudFormation pre-signed URL."""
    response_url = event['ResponseURL']
    response_body = {
        'Status': response_status,
        'Reason': "See the details in CloudWatch Log Stream: {}".format(context.log_stream_name),
        'PhysicalResourceId': physical_resource_id,
        'StackId': event['StackId'],
        'RequestId': event['RequestId'],
        'LogicalResourceId': event['LogicalResourceId'],
        'NoEcho': False,
        'Data': response_data
    }

    json_response_body = json.dumps(response_body)

    headers = {
        'content-type': '',
        'content-length': str(len(json_response_body))
    }

    try:
        response = http.request('PUT', response_url, headers=headers, body=json_response_body)
        print("Status code:", response.status)
    except Exception as e:
        print("send(..) failed executing http.request(..):", e)

--------------------------------------------------------------------------------
/lambda-functions/kfpLambdaStreamProducer.py:
--------------------------------------------------------------------------------
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

import boto3
import datetime
import json
import os
import random
import time

from kafka import KafkaProducer

msk = boto3.client("kafka")


def lambda_handler(event, context):
    """Generates sample sensor readings and publishes them to the MSK topic, one per second."""
    cluster_arn = os.environ["mskClusterArn"]
    response = msk.get_bootstrap_brokers(
        ClusterArn=cluster_arn
    )
    producer = KafkaProducer(security_protocol="PLAINTEXT",
                             bootstrap_servers=response["BootstrapBrokerString"],
                             value_serializer=lambda x: x.encode("utf-8"))
    for _ in range(1, 100):
        data = json.dumps({
            "sensor_id": str(random.randint(1, 5)),
            "temperature": random.randint(27, 32),
            "event_time": datetime.datetime.now().isoformat()
        })
        producer.send(os.environ["topicName"], value=data)
        time.sleep(1)
    # Flush buffered records so everything is delivered before the invocation ends
    producer.flush()

--------------------------------------------------------------------------------
/lambda-functions/requirements.txt:
--------------------------------------------------------------------------------
kafka_python==2.0.1
--------------------------------------------------------------------------------