├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── PythonKafkaSink
│   └── main.py
├── README.md
└── lambda-functions
    ├── kfpLambdaConsumerSNS.py
    ├── kfpLambdaCustomMSKConfig.py
    ├── kfpLambdaStreamProducer.py
    └── requirements.txt
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
## Code of Conduct
This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
opensource-codeofconduct@amazon.com with any additional questions or comments.

--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
# Contributing Guidelines

Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
documentation, we greatly value feedback and contributions from our community.

Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
information to effectively respond to your bug report or contribution.


## Reporting Bugs/Feature Requests

We welcome you to use the GitHub issue tracker to report bugs or suggest features.

When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already
reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:

* A reproducible test case or series of steps
* The version of our code being used
* Any modifications you've made relevant to the bug
* Anything unusual about your environment or deployment


## Contributing via Pull Requests
Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:

1. You are working against the latest source on the *main* branch.
2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
3. You open an issue to discuss any significant work - we would hate for your time to be wasted.

To send us a pull request, please:

1. Fork the repository.
2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
3. Ensure local tests pass.
4. Commit to your fork using clear commit messages.
5. Send us a pull request, answering any default questions in the pull request interface.
6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.

GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
[creating a pull request](https://help.github.com/articles/creating-a-pull-request/).


## Finding contributions to work on
Looking at the existing issues is a great way to find something to contribute to. As our projects use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.

## Code of Conduct
This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
opensource-codeofconduct@amazon.com with any additional questions or comments.


## Security issue notifications
If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue.


## Licensing

See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this
software and associated documentation files (the "Software"), to deal in the Software
without restriction, including without limitation the rights to use, copy, modify,
merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/PythonKafkaSink/main.py:
--------------------------------------------------------------------------------
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

from pyflink.table import EnvironmentSettings, StreamTableEnvironment
import os
import json

env_settings = EnvironmentSettings.new_instance().in_streaming_mode().use_blink_planner().build()
table_env = StreamTableEnvironment.create(environment_settings=env_settings)
statement_set = table_env.create_statement_set()


def create_table_input(table_name, stream_name, broker):
    """DDL for the Kafka source table of raw sensor readings, with a 5-second watermark on event_time."""
    return """ CREATE TABLE {0} (
                 `sensor_id` VARCHAR(64) NOT NULL,
                 `temperature` BIGINT NOT NULL,
                 `event_time` TIMESTAMP(3),
                 WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
               )
               WITH (
                 'connector' = 'kafka',
                 'topic' = '{1}',
                 'properties.bootstrap.servers' = '{2}',
                 'properties.group.id' = 'testGroup',
                 'format' = 'json',
                 'json.timestamp-format.standard' = 'ISO-8601',
                 'scan.startup.mode' = 'earliest-offset'
               ) """.format(table_name, stream_name, broker)


def create_table_output_kafka(table_name, stream_name, broker):
    """DDL for the Kafka sink table whose topic feeds the SNS notification Lambda."""
    return """ CREATE TABLE {0} (
                 `sensor_id` VARCHAR(64) NOT NULL,
                 `count_temp` BIGINT NOT NULL,
                 `start_event_time` TIMESTAMP(3)
               )
               WITH (
                 'connector' = 'kafka',
                 'topic' = '{1}',
                 'properties.bootstrap.servers' = '{2}',
                 'properties.group.id' = 'testGroup',
                 'format' = 'json',
                 'json.timestamp-format.standard' = 'ISO-8601',
                 'scan.startup.mode' = 'earliest-offset'
               ) """.format(table_name, stream_name, broker)


def create_table_output_s3(table_name, stream_name):
    """DDL for the S3 filesystem sink table, partitioned by year/month/day/hour."""
    return """ CREATE TABLE {0} (
                 `sensor_id` VARCHAR(64) NOT NULL,
                 `avg_temp` BIGINT NOT NULL,
                 `start_event_time` TIMESTAMP(3),
                 `year` BIGINT,
                 `month` BIGINT,
                 `day` BIGINT,
                 `hour` BIGINT
               )
               PARTITIONED BY (`year`, `month`, `day`, `hour`)
               WITH (
                 'connector' = 'filesystem',
                 'path' = 's3a://{1}/',
                 'format' = 'json',
                 'sink.partition-commit.policy.kind' = 'success-file',
                 'sink.partition-commit.delay' = '1 min'
               ) """.format(table_name, stream_name)


def insert_stream_sns(insert_from, insert_into):
    """Counts readings above 30 per sensor in 30-second tumbling windows and keeps only windows with more than 3 such readings."""
    return """ INSERT INTO {1}
               SELECT sensor_id, COUNT(*),
                      TUMBLE_START(event_time, INTERVAL '30' SECOND)
               FROM {0}
               WHERE temperature > 30
               GROUP BY TUMBLE(event_time, INTERVAL '30' SECOND), sensor_id
               HAVING COUNT(*) > 3 """.format(insert_from, insert_into)


def insert_stream_s3(insert_from, insert_into):
    """Computes the average temperature per sensor in 60-second tumbling windows and adds the partition columns for the S3 sink."""
    return """ INSERT INTO {1}
               SELECT *, YEAR(start_event_time), MONTH(start_event_time), DAYOFMONTH(start_event_time), HOUR(start_event_time)
               FROM
                 (SELECT sensor_id, AVG(temperature) AS avg_temp, TUMBLE_START(event_time, INTERVAL '60' SECOND) AS start_event_time
                  FROM {0}
                  GROUP BY TUMBLE(event_time, INTERVAL '60' SECOND), sensor_id) """.format(insert_from, insert_into)


def app_properties():
    """Loads the runtime properties that Kinesis Data Analytics provides at /etc/flink/application_properties.json."""
    file_path = '/etc/flink/application_properties.json'
    if os.path.isfile(file_path):
        with open(file_path, 'r') as file:
            contents = file.read()
            print('Contents of ' + file_path)
            print(contents)
            properties = json.loads(contents)
            return properties
    else:
        print('A file at "{}" was not found'.format(file_path))


def property_map(props, property_group_id):
    """Returns the PropertyMap of the property group with the given PropertyGroupId."""
    for prop in props:
        if prop["PropertyGroupId"] == property_group_id:
            return prop["PropertyMap"]


def main():
    # Property group IDs and keys as configured on the Kinesis Data Analytics application
    INPUT_PROPERTY_GROUP_KEY = "producer.config.0"
    CONSUMER_PROPERTY_GROUP_KEY = "consumer.config.0"

    INPUT_TOPIC_KEY = "input.topic.name"
    OUTPUT_TOPIC_KEY = "output.topic.name"
    OUTPUT_BUCKET_KEY = "output.s3.bucket"
    BROKER_KEY = "bootstrap.servers"

    props = app_properties()

    input_property_map = property_map(props, INPUT_PROPERTY_GROUP_KEY)
    output_property_map = property_map(props, CONSUMER_PROPERTY_GROUP_KEY)

    input_stream = input_property_map[INPUT_TOPIC_KEY]
    broker = input_property_map[BROKER_KEY]

    output_stream_sns = output_property_map[OUTPUT_TOPIC_KEY]
    output_s3_bucket = output_property_map[OUTPUT_BUCKET_KEY]

    input_table = "input_table"
    output_table_sns = "output_table_sns"
    output_table_s3 = "output_table_s3"

    # Register the Kafka source table, the Kafka sink table, and the S3 sink table
    table_env.execute_sql(create_table_input(input_table, input_stream, broker))
    table_env.execute_sql(create_table_output_kafka(output_table_sns, output_stream_sns, broker))
    table_env.execute_sql(create_table_output_s3(output_table_s3, output_s3_bucket))

    # Both INSERT statements run together as a single Flink job
    statement_set.add_insert_sql(insert_stream_sns(input_table, output_table_sns))
    statement_set.add_insert_sql(insert_stream_s3(input_table, output_table_s3))

    statement_set.execute()


if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## Build a real-time streaming application using the Apache Flink Python API with Kinesis Data Analytics

--------
> #### 🚨 August 30, 2023: Amazon Kinesis Data Analytics has been renamed to [Amazon Managed Service for Apache Flink](https://aws.amazon.com/managed-service-apache-flink).

--------


This repository contains sample code for building a Python application for Apache Flink on Kinesis Data Analytics.

This project demonstrates how to use the Apache Flink Python API on Kinesis Data Analytics using two working examples. Follow the [blog post](https://aws.amazon.com/blogs/big-data/build-a-real-time-streaming-application-using-apache-flink-python-api-with-amazon-kinesis-data-analytics/) for a step-by-step guide to creating a Flink Python application on Kinesis Data Analytics.
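For reference, `PythonKafkaSink/main.py` reads its runtime configuration from `/etc/flink/application_properties.json`, which Kinesis Data Analytics populates from the application's runtime properties. Below is a minimal sketch of what those property groups could look like. The group IDs and keys are the ones `main.py` expects; the topic names, broker string, and bucket name are placeholders to replace with your own values:

```json
[
  {
    "PropertyGroupId": "producer.config.0",
    "PropertyMap": {
      "input.topic.name": "example-sensor-readings",
      "bootstrap.servers": "b-1.example-cluster.abc123.kafka.us-east-1.amazonaws.com:9092"
    }
  },
  {
    "PropertyGroupId": "consumer.config.0",
    "PropertyMap": {
      "output.topic.name": "example-sensor-alerts",
      "output.s3.bucket": "example-sensor-output-bucket"
    }
  }
]
```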
## License

This library is licensed under the MIT-0 License. See the LICENSE file.

--------------------------------------------------------------------------------
/lambda-functions/kfpLambdaConsumerSNS.py:
--------------------------------------------------------------------------------
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0


import base64
import boto3
import json
import os

sns = boto3.client('sns')


def lambda_handler(event, context):
    """Consumes records delivered by the MSK trigger and publishes an SNS notification for each one."""
    topic_arn = os.environ["SNSTopicArn"]
    # The MSK event payload groups records by topic-partition; each record value is base64-encoded JSON
    for partition_key, partition_value in event['records'].items():
        for record_value in partition_value:
            data = json.loads(base64.b64decode(record_value['value']))
            subject = "The sensor reading has exceeded the threshold"
            message = f"Sensor Id: {data['sensor_id']} has exceeded the set threshold at the window start time: {data['start_event_time']}"
            sns.publish(
                TargetArn=topic_arn,
                Message=message,
                Subject=subject
            )

--------------------------------------------------------------------------------
/lambda-functions/kfpLambdaCustomMSKConfig.py:
--------------------------------------------------------------------------------
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

from __future__ import print_function
import boto3
import json
import random
import string
import urllib3

http = urllib3.PoolManager()

SUCCESS = "SUCCESS"
FAILED = "FAILED"
SERVER_PROPERTIES = b"""
auto.create.topics.enable=true
default.replication.factor=2
"""


def lambda_handler(event, context):
    """CloudFormation custom resource handler that creates or deletes an MSK cluster configuration."""
    kafka = boto3.client("kafka")
    # Reuse the existing physical ID on Update/Delete so CloudFormation does not treat the response as a replacement
    physical_id = event.get("PhysicalResourceId", "None")
    random_id = ''.join(random.choices(string.ascii_uppercase + string.digits, k=5))
    revision = 1
    try:
        if event["RequestType"] == "Create":
            config = kafka.create_configuration(Name=event["LogicalResourceId"] + "-" + random_id,
                                                ServerProperties=SERVER_PROPERTIES)
            physical_id = config["Arn"]
            revision = config["LatestRevision"]["Revision"]
        elif event["RequestType"] == "Delete":
            kafka.delete_configuration(Arn=event["PhysicalResourceId"])

        send(event, context, SUCCESS, {
            "Revision": revision,
            "Arn": physical_id
        }, physical_id)
    except Exception as e:
        # Report the failure so the stack does not hang waiting for a response
        print("Request failed:", e)
        send(event, context, FAILED, {
            "Revision": revision,
            "Arn": physical_id
        }, physical_id)


def send(event, context, response_status, response_data, physical_resource_id=None):
    """Sends the custom resource response back to the CloudFormation pre-signed URL."""
    response_url = event['ResponseURL']
    response_body = {
        'Status': response_status,
        'Reason': "See the details in CloudWatch Log Stream: {}".format(context.log_stream_name),
        'PhysicalResourceId': physical_resource_id,
        'StackId': event['StackId'],
        'RequestId': event['RequestId'],
        'LogicalResourceId': event['LogicalResourceId'],
        'NoEcho': False,
        'Data': response_data
    }

    json_response_body = json.dumps(response_body)

    headers = {
        'content-type': '',
        'content-length': str(len(json_response_body))
    }

    try:
        response = http.request('PUT', response_url, headers=headers, body=json_response_body)
        print("Status code:", response.status)
    except Exception as e:
        print("send(..) failed executing http.request(..):", e)

--------------------------------------------------------------------------------
/lambda-functions/kfpLambdaStreamProducer.py:
--------------------------------------------------------------------------------
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

import boto3
import datetime
import json
import os
import random
import time

from kafka import KafkaProducer

msk = boto3.client("kafka")


def lambda_handler(event, context):
    """Generates sample sensor readings and publishes them to the MSK topic, one per second."""
    cluster_arn = os.environ["mskClusterArn"]
    response = msk.get_bootstrap_brokers(
        ClusterArn=cluster_arn
    )
    producer = KafkaProducer(security_protocol="PLAINTEXT",
                             bootstrap_servers=response["BootstrapBrokerString"],
                             value_serializer=lambda x: x.encode("utf-8"))
    for _ in range(1, 100):
        data = json.dumps({
            "sensor_id": str(random.randint(1, 5)),
            "temperature": random.randint(27, 32),
            "event_time": datetime.datetime.now().isoformat()
        })
        producer.send(os.environ["topicName"], value=data)
        time.sleep(1)
    # Flush buffered records so everything is delivered before the invocation ends
    producer.flush()

--------------------------------------------------------------------------------
/lambda-functions/requirements.txt:
--------------------------------------------------------------------------------
kafka_python==2.0.1
--------------------------------------------------------------------------------