├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── LambdaRedshiftDataApiETL.py ├── LambdaRedshiftDataApiETLRole.json ├── LambdaSetupRedshiftObjects.py ├── README.md ├── event-driven-redshift-pipeline.yaml ├── images ├── architecture.png ├── cfn_params.png ├── output-email-from-sns.png ├── output-logs.png └── output-redshift-console.png └── resources ├── EventBridgeRedshiftEventRule.txt ├── EventBridgeScheduledEventRule.txt └── RedshiftNotificationTopicSNS.txt /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. 
Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *master* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 
49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software is furnished to do so. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
15 | 16 | -------------------------------------------------------------------------------- /LambdaRedshiftDataApiETL.py: -------------------------------------------------------------------------------- 1 | import json 2 | import time 3 | import unicodedata 4 | import traceback 5 | import sys 6 | from pip._internal import main 7 | 8 | # install latest version of boto3 9 | main(['install', '-I', '-q', 'boto3', '--target', '/tmp/', '--no-cache-dir', '--disable-pip-version-check']) 10 | sys.path.insert(0,'/tmp/') 11 | import boto3 12 | 13 | # initiate redshift-data client in boto3 14 | client = boto3.client("redshift-data") 15 | 16 | def handler(event, context): 17 | print(event) 18 | 19 | # input parameters passed from the caller event 20 | # cluster identifier for the Amazon Redshift cluster 21 | redshift_cluster_id = event['Input'].get('redshift_cluster_id') 22 | # database name for the Amazon Redshift cluster 23 | redshift_database = event['Input'].get('redshift_database') 24 | # database user in the Amazon Redshift cluster with access to execute relevant SQL queries 25 | redshift_user = event['Input'].get('redshift_user') 26 | # Amazon SNS topic to be used to publish notifications to end-users 27 | sns_topic_arn = event['Input'].get('sns_topic_arn') 28 | # action to be taken by the lambda function. Allowed values: [execute_sql, status_check, notify] 29 | action = event['Input'].get('action') 30 | # sql text to be executed. e.g. 
call my_stored_procedure(input_params) 31 | sql_text = event['Input'].get('sql_text') 32 | # subject line to be used while publishing message through Amazon SNS 33 | subject = event['Input'].get('subject') 34 | # detailed body to be used while publishing message through Amazon SNS 35 | body = event['Input'].get('body') 36 | # query_id to input for action status_check 37 | query_id = event['Input'].get('query_id') 38 | 39 | try: 40 | if action == 'execute_sql': 41 | # execute the input SQL statement in the specified Amazon Redshift cluster 42 | res = execute_sql(client, sql_text, redshift_database, redshift_user, redshift_cluster_id) 43 | elif action == "status_check": 44 | # check status of a previously executed query 45 | res = status_check(client, query_id) 46 | elif action == "notify": 47 | # notify to end-users by publishing message in Amazon SNS service 48 | res = notify(sns_topic_arn, subject, body) 49 | else: 50 | raise ValueError("Invalid Action: " + action) 51 | except Exception as e: 52 | subject = "Error:" + action + ":" + str(e) 53 | body = traceback.format_exc() 54 | notify(sns_topic_arn, subject, body) 55 | raise 56 | 57 | return {'statusCode': 200, 'body': json.dumps(res)} 58 | 59 | def execute_sql(client, sql_text, redshift_database, redshift_user, redshift_cluster_id, with_event=True): 60 | print("Executing: {}".format(sql_text)) 61 | res = client.execute_statement(Database=redshift_database, DbUser=redshift_user, Sql=sql_text, 62 | ClusterIdentifier=redshift_cluster_id, WithEvent=with_event) 63 | print(res) 64 | query_id = res["Id"] 65 | done = False 66 | while not done: 67 | time.sleep(1) 68 | status = status_check(client, query_id) 69 | if status in ("STARTED", "FAILED", "FINISHED"): 70 | print("status is: {}".format(status)) 71 | break 72 | return query_id 73 | 74 | def status_check(client, query_id): 75 | desc = client.describe_statement(Id=query_id) 76 | status = desc["Status"] 77 | if status == "FAILED": 78 | raise Exception('SQL query 
failed:' + query_id + ": " + desc["Error"]) 79 | return status.strip('"') 80 | 81 | def notify(sns_topic_arn, subject, body): 82 | subject = ("".join(ch for ch in subject if unicodedata.category(ch)[0] != "C"))[0:99] 83 | body = str(body) 84 | sns_client = boto3.client('sns') 85 | response = sns_client.publish( 86 | TargetArn=sns_topic_arn, 87 | Message=json.dumps({'default': json.dumps("{}"), 88 | 'sms': subject, 89 | 'email': body}), 90 | Subject=subject, 91 | MessageStructure='json' 92 | ) 93 | return "message sent" 94 | -------------------------------------------------------------------------------- /LambdaRedshiftDataApiETLRole.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": "2012-10-17", 3 | "Statement": [ 4 | { 5 | "Effect": "Allow", 6 | "Action": "redshift:GetClusterCredentials", 7 | "Resource": [ 8 | "arn:aws:redshift:us-west-2:123456789012:cluster:redshift-cluster-1", 9 | "arn:aws:redshift:us-west-2:123456789012:dbname:redshift-cluster-1/dev", 10 | "arn:aws:redshift:us-west-2:123456789012:dbuser:redshift-cluster-1/awsuser" 11 | ] 12 | }, 13 | { 14 | "Action": [ 15 | "logs:CreateLogGroup", 16 | "logs:CreateLogStream", 17 | "logs:PutLogEvents", 18 | "redshift-data:ExecuteStatement", 19 | "redshift-data:ListStatements", 20 | "redshift-data:GetStatementResult", 21 | "redshift-data:DescribeStatement" 22 | ], 23 | "Resource": "*", 24 | "Effect": "Allow" 25 | }, 26 | { 27 | "Action": [ 28 | "sns:Publish" 29 | ], 30 | "Resource": [ 31 | "arn:aws:sns:us-west-2:123456789012:redshift-notification-sns-topic" 32 | ], 33 | "Effect": "Allow" 34 | } 35 | ] 36 | } 37 | -------------------------------------------------------------------------------- /LambdaSetupRedshiftObjects.py: -------------------------------------------------------------------------------- 1 | import json 2 | import boto3 3 | import cfnresponse 4 | import logging 5 | 6 | logging.basicConfig() 7 | logger = logging.getLogger(__name__) 8 | 
logger.setLevel(logging.INFO) 9 | 10 | def handler(event, context): 11 | logger.info(json.dumps(event)) 12 | lambda_arn = event['ResourceProperties']['lambda_arn'] 13 | redshift_cluster_id = event['ResourceProperties']['redshift_cluster_id'] 14 | redshift_database = event['ResourceProperties']['redshift_database'] 15 | redshift_user = event['ResourceProperties']['redshift_user'] 16 | redshift_cluster_iam_role = event['ResourceProperties']['redshift_cluster_iam_role'] 17 | sns_topic_arn = event['ResourceProperties']['sns_topic_arn'] 18 | 19 | if event['RequestType'] == 'Delete': 20 | sql_text = ''' 21 | DROP PROCEDURE execute_elt_process(); 22 | DROP MATERIALIZED VIEW IF EXISTS nyc_yellow_taxi_volume_analysis; 23 | DROP TABLE IF EXISTS nyc_yellow_taxi; 24 | ''' 25 | response = invoke_redshift_data_api_lambda(lambda_arn, redshift_cluster_id, redshift_database, redshift_user, sql_text, sns_topic_arn) 26 | logger.info(response) 27 | cfnresponse.send(event, context, cfnresponse.SUCCESS, {'Data': 'Delete complete'}) 28 | else: 29 | sql_text = ''' 30 | CREATE TABLE IF NOT EXISTS nyc_yellow_taxi 31 | (pickup_date DATE 32 | , pickup_datetime TIMESTAMP 33 | , dropoff_datetime TIMESTAMP 34 | , ratecode SMALLINT 35 | , passenger_count SMALLINT 36 | , trip_distance FLOAT4 37 | , fare_amount FLOAT4 38 | , total_amount FLOAT4 39 | , payment_type SMALLINT 40 | , vendorid VARCHAR(20)) 41 | SORTKEY(pickup_date); 42 | 43 | DROP MATERIALIZED VIEW IF EXISTS nyc_yellow_taxi_volume_analysis; 44 | 45 | CREATE MATERIALIZED VIEW nyc_yellow_taxi_volume_analysis 46 | AS 47 | SELECT 48 | DATE_TRUNC('mon',pickup_date) pickup_month 49 | , ROUND(AVG(trip_distance),2) avg_distance 50 | , ROUND(AVG(fare_amount),2) avg_fare 51 | , COUNT(1) total_trips 52 | , SUM(trip_distance) total_distance_per_month 53 | , SUM(fare_amount) total_fare_per_month 54 | FROM nyc_yellow_taxi 55 | GROUP BY 1; 56 | 57 | CREATE OR REPLACE PROCEDURE execute_elt_process() 58 | AS $$ 59 | BEGIN 60 | truncate table 
nyc_yellow_taxi; 61 | COPY nyc_yellow_taxi FROM 's3://event-driven-app-with-lambda-redshift/nyc_yellow_taxi_raw/' 62 | IAM_ROLE '{}' 63 | region 'us-west-2' delimiter '|'; 64 | REFRESH MATERIALIZED VIEW nyc_yellow_taxi_volume_analysis; 65 | END; 66 | $$ LANGUAGE plpgsql; 67 | ''' 68 | sql_text = sql_text.format(redshift_cluster_iam_role) 69 | response = invoke_redshift_data_api_lambda(lambda_arn, redshift_cluster_id, redshift_database, redshift_user, sql_text, sns_topic_arn) 70 | logger.info(response) 71 | cfnresponse.send(event, context, cfnresponse.SUCCESS, {'Data': 'Create complete'}) 72 | 73 | def invoke_redshift_data_api_lambda(lambda_arn, redshift_cluster_id, redshift_database, redshift_user, sql_text, sns_topic_arn): 74 | client = boto3.client('lambda') 75 | 76 | lambda_payload = { 77 | "Input": { 78 | "redshift_cluster_id": redshift_cluster_id, 79 | "redshift_database": redshift_database, 80 | "redshift_user": redshift_user, 81 | "action": "execute_sql", 82 | "sql_text": sql_text, 83 | "sns_topic_arn": sns_topic_arn 84 | } 85 | } 86 | response = client.invoke( 87 | FunctionName=lambda_arn, 88 | InvocationType='Event', 89 | Payload=json.dumps(lambda_payload) 90 | ) 91 | return response 92 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Building Event Driven Application with AWS Lambda and Amazon Redshift Data API 2 | 3 | ## Introduction 4 | Event driven applications are becoming popular with many customers, where application execution happens in response to events. A primary benefit of this architecture is the decoupling of producer and consumer processes, allowing greater flexibility in application design and building decoupled processes. 
An example of an event driven application implemented here is an automated workflow triggered by an event, which executes a series of transformations in the data warehouse, leveraging [Amazon Redshift](https://aws.amazon.com/redshift/), [AWS Lambda](https://aws.amazon.com/lambda/), [Amazon EventBridge](https://aws.amazon.com/eventbridge/) and [Amazon Simple Notification Service (SNS)](https://aws.amazon.com/sns/). 5 | 6 | In response to a scheduled event defined in Amazon EventBridge, this application automatically triggers an AWS Lambda function to execute a stored procedure performing extract, load and transform (ELT) operations in the Amazon Redshift data warehouse, leveraging the [Amazon Redshift Data API](https://docs.aws.amazon.com/redshift/latest/mgmt/data-api.html). This stored procedure copies the source data from Amazon Simple Storage Service (Amazon S3) to Amazon Redshift and also aggregates the results. Once complete, it sends an event to Amazon EventBridge, which then triggers a Lambda function to send a notification to end-users through Amazon SNS, informing them about the availability of updated data in Amazon Redshift. 7 | 8 | This event driven serverless architecture offers greater extensibility and simplicity, making the application easier to maintain, faster to extend with new features, and less impacted by changes. It also simplifies adding other components or third-party products to the application without many changes. 9 | 10 | 11 | ## Pre-requisites 12 | 13 | As a pre-requisite for creating the application explained in this blog, you need to set up an Amazon Redshift cluster and associate it with an [AWS Identity and Access Management (IAM) role](https://docs.aws.amazon.com/redshift/latest/mgmt/authorizing-redshift-service.html). If you don’t have one provisioned in your AWS account, please follow the [Amazon Redshift getting started guide](https://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html) to set it up.
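Once the cluster and its IAM role are in place, it can help to sanity-check Amazon Redshift Data API access before deploying the stack. The sketch below is illustrative only: the cluster identifier, database and user are the defaults used elsewhere in this repo, and `wait_for_statement`/`run_sanity_check` are hypothetical helper names, not part of the solution code.

```python
import time

def wait_for_statement(client, statement_id, delay=1.0):
    """Poll describe_statement until the asynchronous query reaches a terminal state."""
    while True:
        desc = client.describe_statement(Id=statement_id)
        if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
            return desc
        time.sleep(delay)

def run_sanity_check():
    import boto3  # deferred import; available in the Lambda runtime, `pip install boto3` locally
    client = boto3.client("redshift-data")
    # Passing DbUser makes the Data API fetch temporary credentials through
    # redshift:GetClusterCredentials, so no password is required.
    res = client.execute_statement(
        ClusterIdentifier="redshift-cluster-1",  # template default; adjust to your cluster
        Database="dev",
        DbUser="awsuser",
        Sql="SELECT 1;",
    )
    return wait_for_statement(client, res["Id"])
```

If connectivity and permissions are correct, `run_sanity_check()` returns a statement description whose `Status` is `FINISHED`.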
14 | 15 | ## Solution architecture 16 | 17 | We used the [NYC Yellow Taxi public dataset](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) for the year 2015. We pre-populated this dataset in the Amazon S3 bucket folder “[event-driven-app-with-lambda-redshift/nyc_yellow_taxi_raw/](https://s3.console.aws.amazon.com/s3/buckets/event-driven-app-with-lambda-redshift/nyc_yellow_taxi_raw/?region=us-west-2&tab=overview)”. 18 | 19 | The following architecture diagram highlights the end-to-end solution: 20 | ![Architecture Diagram](images/architecture.png) 21 | 22 | Below is the execution flow for this solution, which you can deploy with the [CloudFormation template](event-driven-redshift-pipeline.yaml): 23 | 24 | 1. Database objects in the Amazon Redshift cluster: 25 | - Table "nyc_yellow_taxi", used to copy the above New York taxi dataset from Amazon S3. 26 | - Materialized view "nyc_yellow_taxi_volume_analysis", providing an aggregated view of the above table. 27 | - Stored procedure "execute_elt_process", to take care of the data transformations. 28 | 29 | 2. Amazon EventBridge rule, [EventBridgeScheduledEventRule](resources/EventBridgeScheduledEventRule.txt), triggered periodically based on a cron expression. 30 | 31 | 3. AWS IAM role, “[LambdaRedshiftDataApiETLRole](LambdaRedshiftDataApiETLRole.json)”, granting AWS Lambda the permissions below: 32 | - Federate to the Amazon Redshift cluster through the GetClusterCredentials permission, avoiding password credentials. 33 | - Execute queries in the Amazon Redshift cluster through redshift-data API calls. 34 | - Log to Amazon CloudWatch for troubleshooting purposes. 35 | - Send notifications through Amazon Simple Notification Service (SNS). 36 | 37 | 4. AWS Lambda function, “[LambdaRedshiftDataApiETL](LambdaRedshiftDataApiETL.py)”, which is triggered automatically with action “execute_sql” as soon as the above scheduled event fires.
It performs an asynchronous call to the stored procedure "execute_elt_process" in Amazon Redshift, which carries out the extract, load and transform (ELT) operations, leveraging Amazon Redshift Data API functionality. This AWS Lambda function executes queries in Amazon Redshift using the “redshift-data” client. Based on the input parameter “action”, this Lambda function can asynchronously execute Structured Query Language (SQL) statements in Amazon Redshift and thus avoid timing out on long-running SQL statements. It can also publish custom notifications through Amazon Simple Notification Service (SNS). Also, it uses the [Amazon Redshift Data API](https://docs.aws.amazon.com/redshift/latest/mgmt/data-api.html) temporary credentials functionality, which allows it to communicate with Amazon Redshift using AWS Identity and Access Management (IAM) permissions, without the need for any password-based authentication. With the Data API, there is also no need to configure drivers and connections for your Amazon Redshift cluster; that is handled automatically. 38 | 39 | 5. Amazon EventBridge rule, “[EventBridgeRedshiftEventRule](resources/EventBridgeRedshiftEventRule.txt)”, to automatically capture the completion event generated by the above stored procedure call. This triggers the above AWS Lambda function again with action "notify". 40 | 41 | 6. Amazon Simple Notification Service (SNS) topic, [RedshiftNotificationTopicSNS](resources/RedshiftNotificationTopicSNS.txt), and a subscription to your email ID, to send an automated email notification denoting completion of the ELT process as triggered by the AWS Lambda function. 42 | 43 | 7.
The database objects mentioned in Step 1 above are provisioned automatically by a Lambda function, [LambdaSetupRedshiftObjects](LambdaSetupRedshiftObjects.py), as part of the [CloudFormation template](event-driven-redshift-pipeline.yaml), through an invocation of the Lambda function [LambdaRedshiftDataApiETL](LambdaRedshiftDataApiETL.py) created in Step 4 above. 44 | 45 | 46 | ## Testing the code: 47 | 1. After setting up the above solution, you should have an automated pipeline that triggers based on the schedule you defined in the Amazon EventBridge scheduled rule’s cron expression. You may view the Amazon CloudWatch logs and troubleshoot any issues in the Lambda function. Below is an example of the execution logs for reference: 48 | ![Amazon CloudWatch logs](images/output-logs.png) 49 | 50 | 2. You can also view the query execution status in the Amazon Redshift console, which also lets you view the detailed execution plan for the queries you executed. One key thing to note: though the stored procedure may take around six minutes to complete, both executions of the AWS Lambda function finish within just a few seconds. This is because AWS Lambda invokes Amazon Redshift asynchronously; the Lambda function completes after initiating the process in Amazon Redshift, without waiting for query completion. 51 | ![Amazon Redshift Console Output](images/output-redshift-console.png) 52 | 53 | 3. After this process is complete, you will receive the notification email shown below denoting completion of the ELT process: 54 | ![Email notification from Amazon SNS](images/output-email-from-sns.png) 55 | 56 | 57 | ## Conclusion 58 | 59 | Amazon Redshift Data API lets you painlessly interact with Amazon Redshift and build event-driven and cloud native applications. We demonstrated how to build an event driven application with Amazon Redshift, AWS Lambda and Amazon EventBridge.
To learn more about Amazon Redshift Data API, please visit this [blog](https://aws.amazon.com/blogs/big-data/using-the-amazon-redshift-data-api-to-interact-with-amazon-redshift-clusters/) and the [documentation](https://docs.aws.amazon.com/redshift/latest/mgmt/data-api.html). 60 | 61 | 62 | ## Security 63 | 64 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. 65 | 66 | ## License 67 | 68 | This library is licensed under the MIT-0 License. See the LICENSE file. 69 | -------------------------------------------------------------------------------- /event-driven-redshift-pipeline.yaml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: "2010-09-09" 2 | Description: "CloudFormation Template to deploy an Event Driven Application with AWS Lambda and Amazon Redshift Data API" 3 | Parameters: 4 | RedshiftClusterIdentifier: 5 | Description: Cluster Identifier for your redshift cluster 6 | Type: String 7 | Default: 'redshift-cluster-1' 8 | DbUsername: 9 | Description: Redshift database user name which has access to run SQL Script. 10 | Type: String 11 | AllowedPattern: "([a-z])([a-z]|[0-9])*" 12 | Default: 'awsuser' 13 | DatabaseName: 14 | Description: Name of the Redshift database where SQL Script would be run. 15 | Type: String 16 | Default: 'dev' 17 | RedshiftIAMRoleName: 18 | Description: AWS IAM Role Name associated with the Redshift cluster 19 | Type: String 20 | Default: 'myRedshiftRole' 21 | NotificationEmailId: 22 | Type: String 23 | Description: EmailId to send event notifications through Amazon SNS 24 | AllowedPattern: '^(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$' 25 | ConstraintDescription: provide a valid email address. 
26 | Default: 'yourname@company.com' 27 | ExecutionSchedule: 28 | Type: String 29 | Description: Cron expression to schedule extract-load-transform (ELT) process through EventBridge rule 30 | Default: 'cron(0 11 ? * MON-FRI *)' 31 | SqlText: 32 | Type: String 33 | Description: SQL Text to be run as part of the extract-load-transform (ELT) process 34 | Default: 'call run_elt_process();' 35 | Metadata: 36 | AWS::CloudFormation::Interface: 37 | ParameterGroups: 38 | - 39 | Label: 40 | default: "Input Parameters" 41 | Parameters: 42 | - RedshiftClusterIdentifier 43 | - DbUsername 44 | - DatabaseName 45 | - RedshiftIAMRoleName 46 | - NotificationEmailId 47 | - ExecutionSchedule 48 | - SqlText 49 | Resources: 50 | RedshiftNotificationTopicSNS: 51 | Type: AWS::SNS::Topic 52 | Properties: 53 | KmsMasterKeyId: alias/aws/sns 54 | Subscription: 55 | - Endpoint: !Ref NotificationEmailId 56 | Protocol: email 57 | LambdaRedshiftDataApiETLRole: 58 | Type: AWS::IAM::Role 59 | Properties: 60 | Description : IAM Role for lambda to access Redshift and SNS topic 61 | ManagedPolicyArns: 62 | - arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole 63 | AssumeRolePolicyDocument: 64 | Version: 2012-10-17 65 | Statement: 66 | - 67 | Effect: Allow 68 | Principal: 69 | Service: 70 | - lambda.amazonaws.com 71 | Action: 72 | - sts:AssumeRole 73 | Path: / 74 | Policies: 75 | - 76 | PolicyName: RedshiftAccessPolicy 77 | PolicyDocument : 78 | Version: 2012-10-17 79 | Statement: 80 | - 81 | Effect: Allow 82 | Action: redshift:GetClusterCredentials 83 | Resource: 84 | - !Sub arn:aws:redshift:${AWS::Region}:${AWS::AccountId}:cluster:${RedshiftClusterIdentifier} 85 | - !Sub arn:aws:redshift:${AWS::Region}:${AWS::AccountId}:dbname:${RedshiftClusterIdentifier}/${DatabaseName} 86 | - !Sub arn:aws:redshift:${AWS::Region}:${AWS::AccountId}:dbuser:${RedshiftClusterIdentifier}/${DbUsername} 87 | - 88 | Effect: "Allow" 89 | Action: 90 | - redshift-data:ExecuteStatement 91 | - 
redshift-data:ListStatements 92 | - redshift-data:GetStatementResult 93 | - redshift-data:DescribeStatement 94 | Resource: "*" 95 | - 96 | PolicyName: SNSPublishPolicy 97 | PolicyDocument : 98 | Version: 2012-10-17 99 | Statement: 100 | - 101 | Effect: Allow 102 | Action: sns:Publish 103 | Resource: !Ref RedshiftNotificationTopicSNS 104 | EventBridgeScheduledEventRule: 105 | Type: "AWS::Events::Rule" 106 | Properties: 107 | Description: Scheduled Event Rule to be triggered periodically based on cron expression. 108 | ScheduleExpression: !Ref ExecutionSchedule 109 | State: "ENABLED" 110 | Targets: 111 | - 112 | Arn: 113 | Fn::GetAtt: 114 | - "LambdaRedshiftDataApiETL" 115 | - "Arn" 116 | Id: ScheduledEventRedshiftELT 117 | Input: !Sub "{\"Input\":{\"redshift_cluster_id\":\"${RedshiftClusterIdentifier}\",\"redshift_database\":\"${DatabaseName}\",\"redshift_user\":\"${DbUsername}\",\"action\":\"run_sql\",\"sql_text\":\"${SqlText}\",\"sns_topic_arn\":\"${RedshiftNotificationTopicSNS}\"}}" 118 | PermissionForScheduledEventToInvokeLambda: 119 | Type: AWS::Lambda::Permission 120 | Properties: 121 | FunctionName: 122 | Ref: "LambdaRedshiftDataApiETL" 123 | Action: "lambda:InvokeFunction" 124 | Principal: "events.amazonaws.com" 125 | SourceArn: 126 | Fn::GetAtt: 127 | - "EventBridgeScheduledEventRule" 128 | - "Arn" 129 | EventBridgeRedshiftEventRule: 130 | Type: "AWS::Events::Rule" 131 | Description: Redshift Event Rule to automatically capture Redshift stored procedure completion event and trigger Lambda function to notify users 132 | Properties: 133 | EventPattern: !Sub "{\"source\": [\"aws.redshift-data\"],\"detail\": {\"principal\": [\"arn:aws:sts::${AWS::AccountId}:assumed-role/${LambdaRedshiftDataApiETLRole}/${LambdaRedshiftDataApiETL}\"]}}" 134 | Description: Respond to Redshift-data events 135 | State: "ENABLED" 136 | Targets: 137 | - 138 | Arn: !GetAtt 'LambdaRedshiftDataApiETL.Arn' 139 | Id: EventBridgeRedshiftEventRule 140 | InputTransformer: 141 | InputPathsMap: 
{"body":"$.detail"} 142 | InputTemplate: !Sub "{\"Input\":{\"redshift_cluster_id\":\"${RedshiftClusterIdentifier}\",\"redshift_database\":\"${DatabaseName}\",\"redshift_user\":\"${DbUsername}\",\"action\":\"notify\",\"subject\":\"Extract Load Transform process completed in Amazon Redshift\",\"body\":<body>,\"sns_topic_arn\":\"${RedshiftNotificationTopicSNS}\"}}" 143 | PermissionForRedshiftEventToInvokeLambda: 144 | Type: AWS::Lambda::Permission 145 | Properties: 146 | FunctionName: 147 | Ref: "LambdaRedshiftDataApiETL" 148 | Action: "lambda:InvokeFunction" 149 | Principal: "events.amazonaws.com" 150 | SourceArn: 151 | Fn::GetAtt: 152 | - "EventBridgeRedshiftEventRule" 153 | - "Arn" 154 | LambdaRedshiftDataApiETL: 155 | DependsOn: 156 | - LambdaRedshiftDataApiETLRole 157 | Type: AWS::Lambda::Function 158 | Properties: 159 | Description: Lambda function to asynchronously call a stored procedure with Amazon Redshift Data API and publish custom notifications through SNS 160 | Handler: index.handler 161 | Runtime: python3.7 162 | Role: !GetAtt 'LambdaRedshiftDataApiETLRole.Arn' 163 | Timeout: 30 164 | Code: 165 | ZipFile: | 166 | import json 167 | import time 168 | import unicodedata 169 | import traceback 170 | import sys 171 | from pip._internal import main 172 | 173 | # install latest version of boto3 174 | main(['install', '-I', '-q', 'boto3', '--target', '/tmp/', '--no-cache-dir', '--disable-pip-version-check']) 175 | sys.path.insert(0,'/tmp/') 176 | import boto3 177 | 178 | # initiate redshift-data client in boto3 179 | client = boto3.client("redshift-data") 180 | 181 | def handler(event, context): 182 | print(event) 183 | 184 | # input parameters passed from the caller event 185 | # cluster identifier for the Amazon Redshift cluster 186 | redshift_cluster_id = event['Input'].get('redshift_cluster_id') 187 | # database name for the Amazon Redshift cluster 188 | redshift_database = event['Input'].get('redshift_database') 189 | # database user in the Amazon Redshift cluster 
with access to run relevant SQL queries 190 | redshift_user = event['Input'].get('redshift_user') 191 | # Amazon SNS topic to be used to publish notifications to end-users 192 | sns_topic_arn = event['Input'].get('sns_topic_arn') 193 | # action to be taken by the lambda function. Allowed values: [run_sql, status_check, notify] 194 | action = event['Input'].get('action') 195 | # sql text to be run. e.g. call my_stored_procedure(input_params) 196 | sql_text = event['Input'].get('sql_text') 197 | # subject line to be used while publishing message through Amazon SNS 198 | subject = event['Input'].get('subject') 199 | # detailed body to be used while publishing message through Amazon SNS 200 | body = event['Input'].get('body') 201 | # query_id to input for action status_check 202 | query_id = event['Input'].get('query_id') 203 | 204 | try: 205 | if action == 'run_sql': 206 | # run the input SQL statement in the specified Amazon Redshift cluster 207 | res = run_sql(client, sql_text, redshift_database, redshift_user, redshift_cluster_id) 208 | elif action == "status_check": 209 | # check status of a previously run query 210 | res = status_check(client, query_id) 211 | elif action == "notify": 212 | # notify to end-users by publishing message in Amazon SNS service 213 | res = notify(sns_topic_arn, subject, body) 214 | else: 215 | raise ValueError("Invalid Action: " + action) 216 | except Exception as e: 217 | subject = "Error:" + action + ":" + str(e) 218 | body = traceback.format_exc() 219 | notify(sns_topic_arn, subject, body) 220 | raise 221 | 222 | return {'statusCode': 200, 'body': json.dumps(res)} 223 | 224 | def run_sql(client, sql_text, redshift_database, redshift_user, redshift_cluster_id, with_event=True): 225 | print("Executing: {}".format(sql_text)) 226 | 227 | res = client.execute_statement(Database=redshift_database, DbUser=redshift_user, Sql=sql_text, 228 | ClusterIdentifier=redshift_cluster_id, WithEvent=with_event) 229 | print(res) 230 | query_id = 
res["Id"] 231 | # poll until the query has started or terminated; since WithEvent=True, completion itself is reported asynchronously through an EventBridge event 232 | while True: 233 | time.sleep(1) 234 | status = status_check(client, query_id) 235 | if status in ("STARTED", "FAILED", "FINISHED"): 236 | print("status is: {}".format(status)) 237 | break 238 | return query_id 239 | 240 | def status_check(client, query_id): 241 | desc = client.describe_statement(Id=query_id) 242 | status = desc["Status"] 243 | if status == "FAILED": 244 | raise Exception('SQL query failed:' + query_id + ": " + desc["Error"]) 245 | return status.strip('"') 246 | 247 | def notify(sns_topic_arn, subject, body): 248 | subject = ("".join(ch for ch in subject if unicodedata.category(ch)[0] != "C"))[0:99] 249 | body = str(body) 250 | sns_client = boto3.client('sns') 251 | response = sns_client.publish( 252 | TargetArn=sns_topic_arn, 253 | Message=json.dumps({'default': json.dumps("{}"), 254 | 'sms': subject, 255 | 'email': body}), 256 | Subject=subject, 257 | MessageStructure='json' 258 | ) 259 | return "message sent" 260 | RedshiftS3AccessForEventAppBucket: 261 | Type: 'AWS::IAM::ManagedPolicy' 262 | Properties: 263 | PolicyDocument: 264 | Version: '2012-10-17' 265 | Statement: 266 | - Sid: AllowAllUsersToGetS3Objects 267 | Effect: Allow 268 | Action: 269 | - 's3:GetObject' 270 | Resource: 'arn:aws:s3:::event-driven-app-with-lambda-redshift/*' 271 | - Sid: AllowAllUsersToListS3Bucket 272 | Effect: Allow 273 | Action: 274 | - 's3:ListBucket' 275 | Resource: 'arn:aws:s3:::event-driven-app-with-lambda-redshift' 276 | Roles: [!Ref RedshiftIAMRoleName] 277 | SetupRedshiftObjectsLambdaRole: 278 | DependsOn: LambdaRedshiftDataApiETL 279 | Type: AWS::IAM::Role 280 | Properties: 281 | Description: IAM role for Lambda to invoke the LambdaRedshiftDataApiETL Lambda function 282 | AssumeRolePolicyDocument: 283 | Version: 2012-10-17 284 | Statement: 285 | - 286 | Effect: Allow 287 | Principal: 288 | Service: 289 | - lambda.amazonaws.com 290 | Action: 291 | - sts:AssumeRole 292 | Path: / 293 | Policies: 294 | - 295 | PolicyName:
LambdaInvokePolicy 296 | PolicyDocument: 297 | Version: 2012-10-17 298 | Statement: 299 | - 300 | Effect: "Allow" 301 | Action: 302 | - lambda:InvokeFunction 303 | - lambda:InvokeAsync 304 | Resource: !GetAtt [LambdaRedshiftDataApiETL, Arn] 305 | - 306 | Effect: "Allow" 307 | Action: 308 | - logs:CreateLogGroup 309 | - logs:CreateLogStream 310 | - logs:PutLogEvents 311 | Resource: "*" 312 | LambdaSetupRedshiftObjects: 313 | Type: "AWS::Lambda::Function" 314 | DependsOn: SetupRedshiftObjectsLambdaRole 315 | Properties: 316 | Description: Lambda function to provision Redshift objects by invoking the LambdaRedshiftDataApiETL Lambda function during CloudFormation stack creation 317 | Handler: index.handler 318 | Role: !GetAtt 'SetupRedshiftObjectsLambdaRole.Arn' 319 | Runtime: python3.7 320 | Timeout: 30 321 | Code: 322 | ZipFile: | 323 | import json 324 | import boto3 325 | import cfnresponse 326 | import logging 327 | 328 | logging.basicConfig() 329 | logger = logging.getLogger(__name__) 330 | logger.setLevel(logging.INFO) 331 | 332 | def handler(event, context): 333 | logger.info(json.dumps(event)) 334 | lambda_arn = event['ResourceProperties']['lambda_arn'] 335 | redshift_cluster_id = event['ResourceProperties']['redshift_cluster_id'] 336 | redshift_database = event['ResourceProperties']['redshift_database'] 337 | redshift_user = event['ResourceProperties']['redshift_user'] 338 | redshift_cluster_iam_role = event['ResourceProperties']['redshift_cluster_iam_role'] 339 | sns_topic_arn = event['ResourceProperties']['sns_topic_arn'] 340 | 341 | if event['RequestType'] == 'Delete': 342 | sql_text = ''' 343 | DROP PROCEDURE run_elt_process(); 344 | DROP MATERIALIZED VIEW IF EXISTS nyc_yellow_taxi_volume_analysis; 345 | DROP TABLE IF EXISTS nyc_yellow_taxi; 346 | ''' 347 | response = invoke_redshift_data_api_lambda(lambda_arn, redshift_cluster_id, redshift_database, redshift_user, sql_text, sns_topic_arn) 348 | logger.info(response) 349 | cfnresponse.send(event, context,
cfnresponse.SUCCESS, {'Data': 'Delete complete'}) 350 | else: 351 | sql_text = ''' 352 | CREATE TABLE IF NOT EXISTS nyc_yellow_taxi 353 | (pickup_date DATE 354 | , pickup_datetime TIMESTAMP 355 | , dropoff_datetime TIMESTAMP 356 | , ratecode SMALLINT 357 | , passenger_count SMALLINT 358 | , trip_distance FLOAT4 359 | , fare_amount FLOAT4 360 | , total_amount FLOAT4 361 | , payment_type SMALLINT 362 | , vendorid VARCHAR(20)) 363 | SORTKEY(pickup_date); 364 | 365 | DROP MATERIALIZED VIEW IF EXISTS nyc_yellow_taxi_volume_analysis; 366 | 367 | CREATE MATERIALIZED VIEW nyc_yellow_taxi_volume_analysis 368 | AS 369 | SELECT 370 | DATE_TRUNC('mon',pickup_date) pickup_month 371 | , ROUND(AVG(trip_distance),2) avg_distance 372 | , ROUND(AVG(fare_amount),2) avg_fare 373 | , COUNT(1) total_trips 374 | , SUM(trip_distance) total_distance_per_month 375 | , SUM(fare_amount) total_fare_per_month 376 | FROM nyc_yellow_taxi 377 | GROUP BY 1; 378 | 379 | CREATE OR REPLACE PROCEDURE run_elt_process() 380 | AS $$ 381 | BEGIN 382 | truncate table nyc_yellow_taxi; 383 | COPY nyc_yellow_taxi FROM 's3://event-driven-app-with-lambda-redshift/nyc_yellow_taxi_raw/' 384 | IAM_ROLE '{}' 385 | region 'us-west-2' delimiter '|'; 386 | REFRESH MATERIALIZED VIEW nyc_yellow_taxi_volume_analysis; 387 | END; 388 | $$ LANGUAGE plpgsql; 389 | ''' 390 | sql_text = sql_text.format(redshift_cluster_iam_role) 391 | response = invoke_redshift_data_api_lambda(lambda_arn, redshift_cluster_id, redshift_database, redshift_user, sql_text, sns_topic_arn) 392 | logger.info(response) 393 | cfnresponse.send(event, context, cfnresponse.SUCCESS, {'Data': 'Create complete'}) 394 | 395 | def invoke_redshift_data_api_lambda(lambda_arn, redshift_cluster_id, redshift_database, redshift_user, sql_text, sns_topic_arn): 396 | client = boto3.client('lambda') 397 | 398 | lambda_payload = { 399 | "Input": { 400 | "redshift_cluster_id": redshift_cluster_id, 401 | "redshift_database": redshift_database, 402 | "redshift_user": 
redshift_user, 403 | "action": "run_sql", 404 | "sql_text": sql_text, 405 | "sns_topic_arn": sns_topic_arn 406 | } 407 | } 408 | response = client.invoke( 409 | FunctionName=lambda_arn, 410 | InvocationType='Event', 411 | Payload=json.dumps(lambda_payload) 412 | ) 413 | return response 414 | 415 | SetupRedshiftObjects: 416 | Type: Custom::SetupRedshiftObjects 417 | DependsOn: 418 | - LambdaSetupRedshiftObjects 419 | Properties: 420 | ServiceToken: !GetAtt [LambdaSetupRedshiftObjects, Arn] 421 | lambda_arn: !GetAtt [LambdaRedshiftDataApiETL, Arn] 422 | redshift_cluster_id: !Ref RedshiftClusterIdentifier 423 | redshift_database: !Ref DatabaseName 424 | redshift_user: !Ref DbUsername 425 | redshift_cluster_iam_role: !Sub arn:aws:iam::${AWS::AccountId}:role/${RedshiftIAMRoleName} 426 | sns_topic_arn: !Ref RedshiftNotificationTopicSNS 427 | -------------------------------------------------------------------------------- /images/architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-lambda-redshift-event-driven-app/c7b977b8267104eaebe73ed8e5b9ab63466557b5/images/architecture.png -------------------------------------------------------------------------------- /images/cfn_params.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-lambda-redshift-event-driven-app/c7b977b8267104eaebe73ed8e5b9ab63466557b5/images/cfn_params.png -------------------------------------------------------------------------------- /images/output-email-from-sns.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-lambda-redshift-event-driven-app/c7b977b8267104eaebe73ed8e5b9ab63466557b5/images/output-email-from-sns.png -------------------------------------------------------------------------------- /images/output-logs.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-lambda-redshift-event-driven-app/c7b977b8267104eaebe73ed8e5b9ab63466557b5/images/output-logs.png -------------------------------------------------------------------------------- /images/output-redshift-console.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-lambda-redshift-event-driven-app/c7b977b8267104eaebe73ed8e5b9ab63466557b5/images/output-redshift-console.png -------------------------------------------------------------------------------- /resources/EventBridgeRedshiftEventRule.txt: -------------------------------------------------------------------------------- 1 | EventBridgeRedshiftEventRule: 2 | Type: "AWS::Events::Rule" 3 | # redshift event capture rule 4 | Properties: 5 | EventPattern: !Sub "{\"source\": [\"aws.redshift-data\"],\"detail\": {\"principal\": [\"arn:aws:sts::${AWS::AccountId}:assumed-role/${LambdaRedshiftDataApiETLRole}/${LambdaRedshiftDataApiETL}\"]}}" 6 | Description: Respond to Redshift-data events 7 | State: "ENABLED" 8 | Targets: 9 | - 10 | Arn: !GetAtt 'LambdaRedshiftDataApiETL.Arn' 11 | Id: EventBridgeRedshiftEventRule 12 | InputTransformer: 13 | InputPathsMap: {"body":"$.detail"} 14 | InputTemplate: !Sub "{\"Input\":{\"redshift_cluster_id\":\"${RedshiftClusterIdentifier}\",\"redshift_database\":\"${DatabaseName}\",\"redshift_user\":\"${DbUsername}\",\"action\":\"notify\",\"subject\":\"Extract Load Transform process completed in Amazon Redshift\",\"body\":<body>,\"sns_topic_arn\":\"${RedshiftNotificationTopicSNS}\"}}" 15 | PermissionForRedshiftEventToInvokeLambda: 16 | Type: AWS::Lambda::Permission 17 | Properties: 18 | FunctionName: 19 | Ref: "LambdaRedshiftDataApiETL" 20 | Action: "lambda:InvokeFunction" 21 | Principal: "events.amazonaws.com" 22 | SourceArn: 23 | Fn::GetAtt: 24 | - "EventBridgeRedshiftEventRule"
25 | - "Arn" 26 | -------------------------------------------------------------------------------- /resources/EventBridgeScheduledEventRule.txt: -------------------------------------------------------------------------------- 1 | EventBridgeScheduledEventRule: 2 | Type: "AWS::Events::Rule" 3 | Properties: 4 | Description: scheduled ELT event rule 5 | ScheduleExpression: !Ref ExecutionSchedule 6 | State: "ENABLED" 7 | Targets: 8 | - 9 | Arn: 10 | Fn::GetAtt: 11 | - "LambdaRedshiftDataApiETL" 12 | - "Arn" 13 | Id: ScheduledEventRedshiftELT 14 | Input: !Sub "{\"Input\":{\"redshift_cluster_id\":\"${RedshiftClusterIdentifier}\",\"redshift_database\":\"${DatabaseName}\",\"redshift_user\":\"${DbUsername}\",\"action\":\"run_sql\",\"sql_text\":\"${SqlText}\",\"sns_topic_arn\":\"${RedshiftNotificationTopicSNS}\"}}" 15 | PermissionForScheduledEventToInvokeLambda: 16 | Type: AWS::Lambda::Permission 17 | Properties: 18 | FunctionName: 19 | Ref: "LambdaRedshiftDataApiETL" 20 | Action: "lambda:InvokeFunction" 21 | Principal: "events.amazonaws.com" 22 | SourceArn: 23 | Fn::GetAtt: 24 | - "EventBridgeScheduledEventRule" 25 | - "Arn" 26 | -------------------------------------------------------------------------------- /resources/RedshiftNotificationTopicSNS.txt: -------------------------------------------------------------------------------- 1 | RedshiftNotificationTopicSNS: 2 | Type: AWS::SNS::Topic 3 | Properties: 4 | Subscription: 5 | - Endpoint: !Ref NotificationEmailId 6 | Protocol: email 7 | --------------------------------------------------------------------------------
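All of the triggers above (the scheduled rule, the custom resource, and the EventBridge notification rule) ultimately send the same `Input` payload shape to `LambdaRedshiftDataApiETL`. The following minimal sketch builds such a payload for the `run_sql` action; the cluster, database, user, and topic values are placeholders, not values from this template:

```python
import json

# Allowed actions, per the handler in LambdaRedshiftDataApiETL
ALLOWED_ACTIONS = {"run_sql", "status_check", "notify"}

def build_etl_event(redshift_cluster_id, redshift_database, redshift_user,
                    action, sns_topic_arn, sql_text=None, query_id=None):
    """Build the Input payload that the LambdaRedshiftDataApiETL handler expects."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError("Invalid Action: {}".format(action))
    return {"Input": {
        "redshift_cluster_id": redshift_cluster_id,
        "redshift_database": redshift_database,
        "redshift_user": redshift_user,
        "action": action,
        "sql_text": sql_text,
        "sns_topic_arn": sns_topic_arn,
        "query_id": query_id,
    }}

# Placeholder values -- substitute your own stack's parameter values.
event = build_etl_event("my-redshift-cluster", "dev", "awsuser",
                        "run_sql",
                        "arn:aws:sns:us-west-2:123456789012:my-topic",
                        sql_text="call run_elt_process();")
payload = json.dumps(event)

# To actually fire it, an asynchronous invoke mirrors what EventBridge does:
# boto3.client("lambda").invoke(FunctionName=lambda_arn,
#                               InvocationType="Event", Payload=payload)
```

This is the same contract the scheduled rule serializes into its `Input` string; the boto3 call is shown commented out since it requires AWS credentials and a deployed stack.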