├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── RedshiftCommands.py ├── RedshiftCommands.yaml └── cloudformation-launch-stack.png /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *master* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute to. As our projects use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 
49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | 61 | We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes. 62 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software is furnished to do so. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 15 | 16 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Execute Amazon Redshift Commands using AWS Glue 2 | 3 | This project demonstrates how to use an **AWS Glue Python Shell Job** to connect to your **Amazon Redshift** cluster and execute a SQL script stored in Amazon S3. Amazon Redshift SQL scripts can contain commands such as bulk loading using the COPY statement or data transformation using DDL & DML SQL statements. Leveraging this strategy, customers can migrate from their existing ETL and ELT infrastructure to a more cost-effective serverless framework. 4 | 5 | ## CloudFormation 6 | 7 | The **AWS CloudFormation Template** below deploys the components needed to build your first AWS Glue Job, along with the components needed to keep the connections between them secure. The template will either load a sample script that loads TPC-DS data, or you can provide your own SQL script located in the same AWS Region. Once deployed, create additional AWS Glue triggers and/or workflows to invoke the **RedshiftCommands** job, passing in any additional script you'd like. 
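For example, once the stack is deployed, an ad-hoc run can be kicked off from any script or workflow step that can call the AWS Glue API. The snippet below is a minimal sketch; the job name, bucket, secret ARN, and parameter values are placeholders you would replace with the ones created in your account:

```Python
import boto3

glue = boto3.client('glue')

# Hypothetical values: substitute the Glue job name created by the stack, your own
# script location, the Secrets Manager secret ARN, and any script parameters.
response = glue.start_job_run(
    JobName='GlueJobRedshiftCommands-EXAMPLE',
    Arguments={
        '--SQLScript': 's3://my-bucket/sql/my-script.sql',
        '--Secret': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:my-redshift-secret',
        '--Params': 'arn:aws:iam::123456789012:role/myRedshiftRole'
    })
print('Started job run: %s' % response['JobRunId'])
```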
8 | 9 | [![Launch](cloudformation-launch-stack.png)](https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=RedshiftCommands&templateURL=https://s3-us-west-2.amazonaws.com/redshift-immersionday-labs/RedshiftCommands.yaml) 10 | 11 | ## Solution Components 12 | 13 | The following are the re-usable components of the AWS CloudFormation Template: 14 | 1. **AWS Glue Bucket** - This bucket will hold the script which the AWS Glue Python Shell Job will execute. 15 | 1. **AWS Glue Connection** - This connection is used to ensure the AWS Glue Job will run within the same Amazon VPC as the Amazon Redshift cluster. 16 | 1. **Secrets Manager Secret** - This Secret is stored in AWS Secrets Manager and contains the credentials for the Amazon Redshift cluster. 17 | 1. **Amazon VPC Endpoints** - Two Amazon VPC Endpoints are deployed so that Secrets Manager and S3, two services which run outside the VPC, are accessible from the same Amazon VPC as the AWS Glue Job and the Amazon Redshift cluster. 18 | 1. **IAM Role** - This IAM Role is used by the AWS Glue Job and requires read access to the Secrets Manager Secret as well as the Amazon S3 locations of the Python script used in the AWS Glue Job and the Amazon Redshift SQL script. 19 | 1. **AWS Glue Job** - This AWS Glue Job is the compute engine that executes your script. AWS Glue Python Shell jobs are optimal for this type of workload because they have no timeout and a very small cost per execution second. The job takes two required parameters and one optional parameter: 20 | * *Secret* - The ARN of the Secrets Manager Secret containing the Amazon Redshift connection information. 21 | * *SQLScript* - The Amazon S3 location of the SQL script containing the Redshift commands. Note: The Role created above should have access to read from this location. 22 | * *Params* - (Optional) A comma separated list of script parameters. To use these parameters in your script, use the syntax ${n}, e.g. ${1} for the first parameter. 23 | 24 | ## Sample Job 25 | Included in the CloudFormation Template is a script containing CREATE TABLE and COPY commands to load sample TPC-DS data into your Amazon Redshift cluster. Feel free to override this sample script with your own SQL script located in the same AWS Region. Note: the script may be parameterized, and those parameters can be fed as a comma separated list into the Params field of the CloudFormation template. 26 | ``` 27 | https://redshift-demos.s3.amazonaws.com/sql/redshift-tpcds.sql 28 | ``` 29 | 30 | ## Code Walkthrough 31 | The following section describes the components of the code which make this solution possible. 32 | 33 | ### Get the Required Parameters 34 | This code will get the values for the inputs **SQLScript** and **Secret**. It will fail if either is not passed in: 35 | ```Python 36 | args = getResolvedOptions(sys.argv, [ 37 | 'SQLScript', 38 | 'Secret' 39 | ]) 40 | 41 | script = args['SQLScript'] 42 | secret = args['Secret'] 43 | ``` 44 | 45 | ### Get the Cluster Connection Information 46 | This code will first get the connection parameters from AWS Secrets Manager and use those values to make a connection to Redshift leveraging the PyGreSQL library. 
47 | ```Python 48 | secmgr = boto3.client('secretsmanager') 49 | secret = secmgr.get_secret_value(SecretId=secret) 50 | secretString = json.loads(secret["SecretString"]) 51 | user = secretString["user"] 52 | password = secretString["password"] 53 | host = secretString["host"] 54 | port = secretString["port"] 55 | database = secretString["database"] 56 | conn = pgdb.connect(database=database, host=host, user=user, password=password, port=port) 57 | ``` 58 | 59 | ### Get the Contents of the S3 Script 60 | This code will get the S3 object containing the Redshift SQL script and store it in the *statements* variable. 61 | ```Python 62 | import boto3 63 | s3 = boto3.resource('s3') 64 | o = urlparse(script) 65 | bucket = o.netloc 66 | key = o.path 67 | obj = s3.Object(bucket, key.lstrip('/')) 68 | statements = obj.get()['Body'].read().decode('utf-8') 69 | ``` 70 | 71 | ### Get the Optional Parameters 72 | This code will first determine if the **Params** input was provided; if so, it will get the value and replace any placeholders matching the pattern ${n} in the *statements* variable. 73 | ```Python 74 | params = '' 75 | if ('--{}'.format('Params') in sys.argv): 76 | params = getResolvedOptions(sys.argv, ['Params'])['Params'] 77 | paramdict = params.split(',') 78 | for i, param in enumerate(paramdict, start=1): 79 | statements = statements.replace('${'+str(i)+'}', param.strip()) 80 | ``` 81 | 82 | ### Run Each Statement 83 | This code will split and execute each statement, using the semicolon (;) as a delimiter. 84 | ```Python 85 | for statement in statements.split(';'): 86 | statement = statement.strip() 87 | if statement != '': 88 | print("Running Statement: --%s--" % statement) 89 | cursor.execute(statement) 90 | conn.commit() 91 | ``` 92 | 93 | ## License 94 | 95 | This library is licensed under the MIT-0 License. See the LICENSE file. 
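To see how the parameter substitution and statement splitting described above behave, the same logic can be exercised locally on a small inline script without connecting to a cluster. This is a hypothetical illustration; the table name, bucket, and role ARN are placeholders:

```Python
# Minimal local sketch of the ${n} substitution and semicolon splitting shown above.
statements = """
CREATE TABLE IF NOT EXISTS ${1} (id INT, name VARCHAR(50));
COPY ${1} FROM 's3://my-bucket/data/' IAM_ROLE '${2}' CSV;
"""

params = 'demo_table, arn:aws:iam::123456789012:role/myRedshiftRole'
for i, param in enumerate(params.split(','), start=1):
    statements = statements.replace('${' + str(i) + '}', param.strip())

for statement in statements.split(';'):
    statement = statement.strip()
    if statement != '':
        print("Would run: --%s--" % statement)
```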
96 | 97 | -------------------------------------------------------------------------------- /RedshiftCommands.py: -------------------------------------------------------------------------------- 1 | import json 2 | import boto3 3 | import sys 4 | from awsglue.utils import getResolvedOptions 5 | import logging 6 | import pgdb 7 | from urllib.parse import urlparse 8 | 9 | logging.basicConfig() 10 | logger = logging.getLogger(__name__) 11 | logger.setLevel(logging.DEBUG) 12 | 13 | # Required Inputs 14 | args = getResolvedOptions(sys.argv, [ 15 | 'SQLScript', 16 | 'Secret' 17 | ]) 18 | 19 | script = args['SQLScript'] 20 | secret = args['Secret'] 21 | 22 | print('Secret is: %s' % secret) 23 | print('Script is: %s' % script) 24 | 25 | # Connect to the cluster 26 | try: 27 | print('Getting Connection Info') 28 | 29 | secmgr = boto3.client('secretsmanager') 30 | secret = secmgr.get_secret_value(SecretId=secret) 31 | secretString = json.loads(secret["SecretString"]) 32 | user = secretString["user"] 33 | password = secretString["password"] 34 | host = secretString["host"] 35 | port = secretString["port"] 36 | database = secretString["database"] 37 | 38 | print('Connecting to Redshift: %s' % host) 39 | conn = pgdb.connect(database=database, host=host, user=user, password=password, port=port) 40 | print('Successfully Connected to Cluster') 41 | 42 | # create a new cursor to run the statements through 43 | cursor = conn.cursor() 44 | statement = '' 45 | try: 46 | import boto3 47 | s3 = boto3.resource('s3') 48 | o = urlparse(script) 49 | bucket = o.netloc 50 | key = o.path 51 | obj = s3.Object(bucket, key.lstrip('/')) 52 | statements = obj.get()['Body'].read().decode('utf-8') 53 | 54 | # Optional Input: Params 55 | params = '' 56 | if ('--{}'.format('Params') in sys.argv): 57 | params = getResolvedOptions(sys.argv, ['Params'])['Params'] 58 | paramdict = params.split(',') 59 | for i, param in enumerate(paramdict, start=1): 60 | statements = statements.replace('${'+str(i)+'}', param.strip()) 61 | 62 | 63 | for statement in statements.split(';'): 64 | statement = statement.strip() 65 | if statement != '': 66 | print("Running Statement: --%s--" % statement) 67 | cursor.execute(statement) 68 | conn.commit() 69 | cursor.close() 70 | conn.close() 71 | 72 | except Exception as e: 73 | print(e) 74 | cursor.close() 75 | conn.close() 76 | raise 77 | 78 | except Exception as e: 79 | print(e) 80 | raise 81 | -------------------------------------------------------------------------------- /RedshiftCommands.yaml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: "2010-09-09" 2 | Description: "Service Catalog: Amazon Redshift Reference Architecture Template. This template builds an AWS Glue Job which can connect to a user-supplied Redshift Cluster and execute either a sample script to load TPC-DS data or a user-provided script. See https://github.com/aws-samples/amazon-redshift-commands-using-aws-glue for more info. (fdp-redshift002)" 3 | Parameters: 4 | DatabaseHostName: 5 | Description: The hostname of the Amazon Redshift cluster endpoint. 6 | Type: String 7 | MasterUsername: 8 | Description: The user name which will be used to execute the SQL Script. 9 | Type: String 10 | AllowedPattern: "([a-z])([a-z]|[0-9])*" 11 | MasterUserPassword: 12 | Description: The password which will be used to execute the SQL Script. 13 | Type: String 14 | NoEcho: 'true' 15 | PortNumber: 16 | Description: The port number on which the cluster accepts incoming connections. 
17 | Type: Number 18 | Default: '5439' 19 | DatabaseName: 20 | Description: The name of the database which will be used to execute the SQL Script. 21 | 22 | Type: String 23 | Default: 'dev' 24 | AllowedPattern: "([a-z]|[0-9])+" 25 | Script: 26 | Description: Enter the S3 location of a SQL script located in your AWS Region that you'd like to execute in Redshift. The default script will load a 100GB TPC-DS dataset. 27 | Type: String 28 | Default: 's3://redshift-demos/sql/redshift-tpcds.sql' 29 | ScriptParameters: 30 | Description: A comma separated list of parameters required by the script in the form of ${n}. For the default script, enter a Role which is attached to the Redshift Cluster and which has S3 Read Access. See the following reference for more detail on creating a Role for Redshift: https://docs.aws.amazon.com/redshift/latest/gsg/rs-gsg-create-an-iam-role.html. 31 | Type: String 32 | Default: 'arn:aws:iam:::role/' 33 | Conditions: 34 | IsDefault: 35 | !Equals 36 | - !Ref Script 37 | - s3://redshift-demos/sql/redshift-tpcds.sql 38 | IsNotDefault: 39 | !Not 40 | - !Equals 41 | - !Ref Script 42 | - s3://redshift-demos/sql/redshift-tpcds.sql 43 | Metadata: 44 | AWS::CloudFormation::Interface: 45 | ParameterGroups: 46 | - 47 | Label: 48 | default: "Connection Details" 49 | Parameters: 50 | - DatabaseHostName 51 | - MasterUsername 52 | - MasterUserPassword 53 | - PortNumber 54 | - DatabaseName 55 | - Script 56 | Resources: 57 | GlueBucket: 58 | Type: AWS::S3::Bucket 59 | Properties: 60 | VersioningConfiguration: 61 | Status: Enabled 62 | BucketEncryption: 63 | ServerSideEncryptionConfiguration: 64 | - ServerSideEncryptionByDefault: 65 | SSEAlgorithm: AES256 66 | LambdaCFNCustomRole: 67 | Type: AWS::IAM::Role 68 | Properties: 69 | AssumeRolePolicyDocument: 70 | Version: 2012-10-17 71 | Statement: 72 | - 73 | Effect: Allow 74 | Principal: 75 | Service: 76 | - lambda.amazonaws.com 77 | Action: 78 | - sts:AssumeRole 79 | Path: / 80 | ManagedPolicyArns: 81 | - arn:aws:iam::aws:policy/AmazonS3FullAccess 82 | - arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole 83 | - arn:aws:iam::aws:policy/CloudWatchLogsFullAccess 84 | - arn:aws:iam::aws:policy/AmazonVPCFullAccess 85 | - arn:aws:iam::aws:policy/AmazonRedshiftReadOnlyAccess 86 | GlueLoadRedshiftRole: 87 | Type: AWS::IAM::Role 88 | Properties: 89 | AssumeRolePolicyDocument: 90 | Version: 2012-10-17 91 | Statement: 92 | - 93 | Effect: Allow 94 | Principal: 95 | Service: 96 | - glue.amazonaws.com 97 | Action: 98 | - sts:AssumeRole 99 | Path: / 100 | Policies: 101 | - 102 | PolicyName: GlueGetSecretPolicy 103 | PolicyDocument: 104 | Version: 2012-10-17 105 | Statement: 106 | - 107 | Effect: Allow 108 | Action: 109 | - secretsmanager:GetSecretValue 110 | Resource: 111 | - Ref: Secret 112 | ManagedPolicyArns: 113 | - arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess 114 | - arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole 115 | - arn:aws:iam::aws:policy/CloudWatchLogsFullAccess 116 | Secret: 117 | Type: AWS::SecretsManager::Secret 118 | Properties: 119 | Description: Secret for Redshift Command Glue Job. 
120 | SecretString: !Sub 121 | - "{\"user\": \"${user}\", \"password\": \"${pass}\", \"host\": \"${host}\", \"database\": \"${db}\", \"port\": \"${port}\"}" 122 | - {user: !Ref MasterUsername, pass: !Ref MasterUserPassword, host: !Ref DatabaseHostName, db: !Ref DatabaseName, port: !Ref PortNumber} 123 | GlueJobRedshiftCommands: 124 | Type: AWS::Glue::Job 125 | DependsOn: 126 | - InitCreateGlueConnection 127 | Properties: 128 | Role: !GetAtt 'GlueLoadRedshiftRole.Arn' 129 | ExecutionProperty: 130 | MaxConcurrentRuns: 10 131 | Connections: 132 | Connections: 133 | - Ref: InitCreateGlueConnection 134 | Command: 135 | Name: pythonshell 136 | PythonVersion: 3 137 | ScriptLocation: !Sub 138 | - s3://${bucket}/RedshiftCommands.py 139 | - {bucket: !Ref GlueBucket} 140 | DefaultArguments: 141 | "--job-bookmark-option" : "job-bookmark-disable" 142 | "--TempDir" : !Sub 143 | - s3://${bucket} 144 | - {bucket: !Ref GlueBucket} 145 | "--enable-metrics" : "" 146 | LambdaGlueJobRedshiftCommands: 147 | Type: "AWS::Lambda::Function" 148 | Properties: 149 | Role: !GetAtt 'LambdaCFNCustomRole.Arn' 150 | Timeout: 300 151 | Code: 152 | ZipFile: | 153 | import json 154 | import boto3 155 | import cfnresponse 156 | import logging 157 | 158 | logging.basicConfig() 159 | logger = logging.getLogger(__name__) 160 | logger.setLevel(logging.INFO) 161 | 162 | glue = boto3.client('glue') #start_job_run 163 | 164 | def handler(event, context): 165 | logger.info(json.dumps(event)) 166 | if event['RequestType'] != 'Create': 167 | cfnresponse.send(event, context, cfnresponse.SUCCESS, {'Data': 'NA'}) 168 | else: 169 | try: 170 | sqlScript = event['ResourceProperties']['sqlScript'] 171 | secret = event['ResourceProperties']['secret'] 172 | params = event['ResourceProperties']['params'] 173 | jobName = event['ResourceProperties']['jobName'] 174 | 175 | response = glue.start_job_run( 176 | JobName=jobName, 177 | Arguments={ 178 | '--SQLScript':sqlScript, 179 | '--Secret':secret, 180 | '--Params':params}) 181 | 182 | message = 'Glue triggered successfully.' 
183 | cfnresponse.send(event, context, cfnresponse.SUCCESS, {'Data': message}) 184 | 185 | except Exception as e: 186 | message = 'Glue Job Issue' 187 | logger.info(e) 188 | cfnresponse.send(event, context, cfnresponse.FAILED, {'Data': message}) 189 | Handler: index.handler 190 | Runtime: python3.7 191 | DependsOn: 192 | - LambdaCFNCustomRole 193 | LambdaFunctionS3Copy: 194 | Type: "AWS::Lambda::Function" 195 | Properties: 196 | Timeout: 30 197 | Code: 198 | ZipFile: | 199 | import json 200 | import boto3 201 | import cfnresponse 202 | import logging 203 | 204 | logging.basicConfig() 205 | logger = logging.getLogger(__name__) 206 | logger.setLevel(logging.INFO) 207 | 208 | def handler(event, context): 209 | logger.info(json.dumps(event)) 210 | s3 = boto3.client('s3') #delete_object, copy_object 211 | s3BucketTarget = event['ResourceProperties']['s3BucketTarget'] 212 | s3Bucket = event['ResourceProperties']['s3Bucket'] 213 | s3Object = event['ResourceProperties']['s3Object'] 214 | 215 | if event['RequestType'] == 'Delete': 216 | try: 217 | s3.delete_object(Bucket=s3BucketTarget, Key=s3Object) 218 | s3.delete_object(Bucket=s3BucketTarget, Key=s3Object+'.temp') 219 | except Exception as e: 220 | logger.info(e) 221 | 222 | cfnresponse.send(event, context, cfnresponse.SUCCESS, {'Data': 'Delete complete'}) 223 | 224 | else: 225 | try: 226 | s3.delete_object(Bucket=s3BucketTarget, Key=s3Object) 227 | except Exception as e: 228 | logger.info(e) 229 | try: 230 | s3.copy_object(Bucket=s3BucketTarget, CopySource=s3Bucket+"/"+s3Object, Key=s3Object) 231 | cfnresponse.send(event, context, cfnresponse.SUCCESS, {'Data': 'Copy complete'}) 232 | 233 | except Exception as e: 234 | logger.error(e) 235 | cfnresponse.send(event, context, cfnresponse.FAILED, {'Data': 'Copy failed'}) 236 | 237 | Handler: index.handler 238 | Role: 239 | Fn::GetAtt: [LambdaCFNCustomRole, Arn] 240 | Runtime: python3.7 241 | DependsOn: 242 | - LambdaCFNCustomRole 243 | LambdaCreateGlueConnection: 244 | Type: "AWS::Lambda::Function" 245 | Properties: 246 | Timeout: 30 247 | Code: 248 | ZipFile: | 249 | import json 250 | import boto3 251 | import cfnresponse 252 | import logging 253 | 254 | logging.basicConfig() 255 | logger = logging.getLogger(__name__) 256 | logger.setLevel(logging.INFO) 257 | 258 | def handler(event, context): 259 | logger.info(json.dumps(event)) 260 | try: 261 | glue = boto3.client('glue') #delete_connection, create_connection 262 | rs = boto3.client('redshift') #describe_clusters, describe_cluster_subnet_groups 263 | accountId = event['ResourceProperties']['Account'] 264 | if event['RequestType'] == 'Delete': 265 | try: 266 | glue.delete_connection(CatalogId=accountId, ConnectionName=event['PhysicalResourceId']) 267 | except Exception as e: 268 | logger.info(e) 269 | cfnresponse.send(event, context, cfnresponse.SUCCESS, {'Data': 'Delete complete'}) 270 | else: 271 | databaseHostName = event['ResourceProperties']['DatabaseHostName'] 272 | requestId = event['RequestId'] 273 | clusterId = databaseHostName.split('.')[0] 274 | cluster = rs.describe_clusters(ClusterIdentifier=clusterId)["Clusters"][0] 275 | availabilityzone = cluster["AvailabilityZone"] 276 | securitygroup = cluster["VpcSecurityGroups"][0]["VpcSecurityGroupId"] 277 | subnetgroupname = cluster["ClusterSubnetGroupName"] 278 | subnetgroup = rs.describe_cluster_subnet_groups(ClusterSubnetGroupName=subnetgroupname)["ClusterSubnetGroups"][0] 279 | for subnet in subnetgroup["Subnets"] : 280 | subnetid = subnet["SubnetIdentifier"] 281 | if (availabilityzone == 
subnet["SubnetAvailabilityZone"]["Name"]): 282 | break 283 | 284 | connectionInput = { 285 | 'Name':'GlueRedshiftConnection-'+requestId, 286 | 'ConnectionType':'JDBC', 287 | 'ConnectionProperties': { 288 | 'JDBC_CONNECTION_URL':'jdbc:redshift://host:9999/db', 289 | 'USERNAME':'user', 290 | 'PASSWORD':'password' 291 | }, 292 | 'PhysicalConnectionRequirements': { 293 | 'SubnetId': subnetid, 294 | 'SecurityGroupIdList': [securitygroup], 295 | 'AvailabilityZone':availabilityzone 296 | } 297 | } 298 | glue.create_connection(CatalogId=accountId, ConnectionInput=connectionInput) 299 | cfnresponse.send(event, context, cfnresponse.SUCCESS, {'Data': 'Create successful'}, 'GlueRedshiftConnection-'+requestId) 300 | except Exception as e: 301 | logger.error(e) 302 | cfnresponse.send(event, context, cfnresponse.FAILED, {'Data': 'Failed'}) 303 | 304 | Handler: index.handler 305 | Role: 306 | Fn::GetAtt: [LambdaCFNCustomRole, Arn] 307 | Runtime: python3.7 308 | DependsOn: 309 | - LambdaCFNCustomRole 310 | LambdaCreateS3Connection: 311 | Type: "AWS::Lambda::Function" 312 | Properties: 313 | Timeout: 30 314 | Code: 315 | ZipFile: | 316 | import json 317 | import boto3 318 | import cfnresponse 319 | import logging 320 | 321 | logging.basicConfig() 322 | logger = logging.getLogger(__name__) 323 | logger.setLevel(logging.INFO) 324 | 325 | def handler(event, context): 326 | logger.info(json.dumps(event)) 327 | try: 328 | ec2 = boto3.client('ec2') #delete_vpc_endpoints, create_vpc_endpoint, describe_route_tables 329 | rs = boto3.client('redshift') #describe_clusters, describe_cluster_subnet_groups 330 | if event['RequestType'] == 'Delete': 331 | try: 332 | ec2.delete_vpc_endpoints(VpcEndpointIds=[event['PhysicalResourceId']]) 333 | except Exception as e: 334 | logger.info(e) 335 | cfnresponse.send(event, context, cfnresponse.SUCCESS, {'Data': 'Delete complete'}) 336 | else: 337 | databaseHostName = event['ResourceProperties']['DatabaseHostName'] 338 | clusterId = databaseHostName.split('.')[0] 339 | cluster = rs.describe_clusters(ClusterIdentifier=clusterId)["Clusters"][0] 340 | vpc = cluster["VpcId"] 341 | availabilityzone = cluster["AvailabilityZone"] 342 | subnetgroupname = cluster["ClusterSubnetGroupName"] 343 | subnetgroup = rs.describe_cluster_subnet_groups(ClusterSubnetGroupName=subnetgroupname)["ClusterSubnetGroups"][0] 344 | for subnet in subnetgroup["Subnets"] : 345 | subnetid = subnet["SubnetIdentifier"] 346 | if (availabilityzone == subnet["SubnetAvailabilityZone"]["Name"]): 347 | break 348 | try: 349 | routetable = ec2.describe_route_tables(Filters=[{'Name': 'association.subnet-id','Values': [subnetid]}])["RouteTables"][0]["RouteTableId"] 350 | except: 351 | routetable = ec2.describe_route_tables()["RouteTables"][0]["RouteTableId"] 352 | region = event['ResourceProperties']['Region'] 353 | 354 | policyDocument = { 355 | "Version":"2012-10-17", 356 | "Statement":[{ 357 | "Effect":"Allow", 358 | "Principal": "*", 359 | "Action":"*", 360 | "Resource":"*" 361 | }] 362 | } 363 | try: 364 | response = ec2.create_vpc_endpoint( 365 | VpcEndpointType='Gateway', 366 | RouteTableIds=[routetable], 367 | VpcId=vpc, 368 | ServiceName='com.amazonaws.'+region+'.s3', 369 | PolicyDocument=json.dumps(policyDocument) 370 | ) 371 | cfnresponse.send(event, context, cfnresponse.SUCCESS, {'Data': 'Create successful'}, response['VpcEndpoint']['VpcEndpointId']) 372 | except Exception as e: 373 | logger.error(e) 374 | if e.response["Error"]["Code"] == 'RouteAlreadyExists': 375 | cfnresponse.send(event, context, 
cfnresponse.SUCCESS, {'Data': 'Create successful'}) 376 | else: 377 | cfnresponse.send(event, context, cfnresponse.FAILED, {'Data': 'Create Failed'}) 378 | except Exception as e: 379 | logger.error(e) 380 | cfnresponse.send(event, context, cfnresponse.FAILED, {'Data': 'Failed'}) 381 | 382 | Handler: index.handler 383 | Role: 384 | Fn::GetAtt: [LambdaCFNCustomRole, Arn] 385 | Runtime: python3.7 386 | DependsOn: 387 | - LambdaCFNCustomRole 388 | LambdaCreateSecretConnection: 389 | Type: "AWS::Lambda::Function" 390 | Properties: 391 | Timeout: 30 392 | Code: 393 | ZipFile: | 394 | import json 395 | import boto3 396 | import cfnresponse 397 | import logging 398 | 399 | logging.basicConfig() 400 | logger = logging.getLogger(__name__) 401 | logger.setLevel(logging.INFO) 402 | 403 | def handler(event, context): 404 | logger.info(json.dumps(event)) 405 | try: 406 | ec2 = boto3.client('ec2') #delete_vpc_endpoints, create_vpc_endpoint 407 | rs = boto3.client('redshift') #describe_clusters, describe_cluster_subnet_groups 408 | if event['RequestType'] == 'Delete': 409 | try: 410 | ec2.delete_vpc_endpoints(VpcEndpointIds=[event['PhysicalResourceId']]) 411 | except Exception as e: 412 | logger.info(e) 413 | cfnresponse.send(event, context, cfnresponse.SUCCESS, {'Data': 'Delete complete'}) 414 | else: 415 | databaseHostName = event['ResourceProperties']['DatabaseHostName'] 416 | clusterId = databaseHostName.split('.')[0] 417 | cluster = rs.describe_clusters(ClusterIdentifier=clusterId)["Clusters"][0] 418 | vpc = cluster["VpcId"] 419 | availabilityzone = cluster["AvailabilityZone"] 420 | securitygroup = cluster["VpcSecurityGroups"][0]["VpcSecurityGroupId"] 421 | subnetgroupname = cluster["ClusterSubnetGroupName"] 422 | subnetgroup = rs.describe_cluster_subnet_groups(ClusterSubnetGroupName=subnetgroupname)["ClusterSubnetGroups"][0] 423 | for subnet in subnetgroup["Subnets"] : 424 | subnetid = subnet["SubnetIdentifier"] 425 | if (availabilityzone == subnet["SubnetAvailabilityZone"]["Name"]): 426 | break 427 | region = event['ResourceProperties']['Region'] 428 | 429 | try: 430 | response = ec2.create_vpc_endpoint( 431 | VpcEndpointType='Interface', 432 | SubnetIds=[subnetid], 433 | VpcId=vpc, 434 | ServiceName='com.amazonaws.'+region+'.secretsmanager', 435 | PrivateDnsEnabled=True, 436 | SecurityGroupIds=[securitygroup] 437 | ) 438 | cfnresponse.send(event, context, cfnresponse.SUCCESS, {'Data': 'Create successful'}, response['VpcEndpoint']['VpcEndpointId']) 439 | except Exception as e: 440 | if 'there is already a conflicting DNS domain' in e.response["Error"]["Message"]: 441 | cfnresponse.send(event, context, cfnresponse.SUCCESS, {'Data': 'Create successful'}) 442 | else: 443 | cfnresponse.send(event, context, cfnresponse.FAILED, {'Data': 'Create failed'}) 444 | except Exception as e: 445 | logger.error(e) 446 | cfnresponse.send(event, context, cfnresponse.FAILED, {'Data': 'Failed'}) 447 | 448 | Handler: index.handler 449 | Role: 450 | Fn::GetAtt: [LambdaCFNCustomRole, Arn] 451 | Runtime: python3.7 452 | DependsOn: 453 | - LambdaCFNCustomRole 454 | ImportPyScript: 455 | Type: Custom::CopyScript 456 | DependsOn: 457 | - LambdaFunctionS3Copy 458 | - GlueBucket 459 | Properties: 460 | ServiceToken: 461 | Fn::GetAtt : [LambdaFunctionS3Copy, Arn] 462 | s3BucketTarget: 463 | Ref: GlueBucket 464 | s3Bucket: 'redshift-immersionday-labs' 465 | s3Object: 'RedshiftCommands.py' 466 | InitGlueJobRedshiftCommands: 467 | Condition: IsNotDefault 468 | Type: Custom::InitGlueJobRedshiftCommands 469 | DependsOn: 470 | - 
LambdaGlueJobRedshiftCommands 471 | - GlueJobRedshiftCommands 472 | - InitCreateS3Connection 473 | - InitCreateSecretConnection 474 | - ImportPyScript 475 | Properties: 476 | ServiceToken: !GetAtt 'LambdaGlueJobRedshiftCommands.Arn' 477 | sqlScript: !Ref Script 478 | secret: !Ref Secret 479 | params: !Ref ScriptParameters 480 | jobName: !Ref GlueJobRedshiftCommands 481 | InitGlueJobRedshiftCommandsDefault: 482 | Condition: IsDefault 483 | Type: Custom::InitGlueJobRedshiftCommands 484 | DependsOn: 485 | - LambdaGlueJobRedshiftCommands 486 | - GlueJobRedshiftCommands 487 | - InitCreateS3Connection 488 | - InitCreateSecretConnection 489 | - ImportPyScript 490 | Properties: 491 | ServiceToken: !GetAtt 'LambdaGlueJobRedshiftCommands.Arn' 492 | sqlScript: !Sub 493 | - s3://${bucket}/sql/redshift-tpcds.sql 494 | - {bucket: !Ref GlueBucket} 495 | secret: !Ref Secret 496 | params: !Ref ScriptParameters 497 | jobName: !Ref GlueJobRedshiftCommands 498 | ImportSqlScript: 499 | Type: Custom::CopyScript 500 | Condition: IsDefault 501 | DependsOn: 502 | - LambdaFunctionS3Copy 503 | - GlueBucket 504 | Properties: 505 | ServiceToken: 506 | Fn::GetAtt : [LambdaFunctionS3Copy, Arn] 507 | s3BucketTarget: 508 | Ref: GlueBucket 509 | s3Bucket: 'redshift-demos' 510 | s3Object: 'sql/redshift-tpcds.sql' 511 | InitCreateGlueConnection: 512 | Type: Custom::InitCreateGlueConnection 513 | DependsOn: 514 | - LambdaCreateGlueConnection 515 | Properties: 516 | ServiceToken: 517 | Fn::GetAtt : [LambdaCreateGlueConnection, Arn] 518 | DatabaseHostName: 519 | Ref: DatabaseHostName 520 | Account: !Sub "${AWS::AccountId}" 521 | InitCreateS3Connection: 522 | Type: Custom::InitCreateS3Connection 523 | DependsOn: 524 | - LambdaCreateS3Connection 525 | Properties: 526 | ServiceToken: 527 | Fn::GetAtt : [LambdaCreateS3Connection, Arn] 528 | DatabaseHostName: 529 | Ref: DatabaseHostName 530 | Region: !Sub "${AWS::Region}" 531 | InitCreateSecretConnection: 532 | Type: Custom::InitCreateSecretConnection 533 | DependsOn: 534 | - LambdaCreateSecretConnection 535 | Properties: 536 | ServiceToken: 537 | Fn::GetAtt : [LambdaCreateSecretConnection, Arn] 538 | DatabaseHostName: 539 | Ref: DatabaseHostName 540 | Region: !Sub "${AWS::Region}" 541 | -------------------------------------------------------------------------------- /cloudformation-launch-stack.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-redshift-commands-using-aws-glue/291bcbafcd19348be23c285d1913bb0bfb1c56bc/cloudformation-launch-stack.png --------------------------------------------------------------------------------