├── .gitignore
├── LICENSE
├── README.md
├── lambda-ebs-backup-cleanup.py
├── lambda-ebs-backup.py
└── lambda-ebs-copy.py

/.gitignore:
--------------------------------------------------------------------------------
*.swp
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2016 Chris Machler

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# aws-lambda-ebs-backups
Python scripts, run on AWS Lambda, that create and delete snapshots of EBS volumes.

# THIS REPOSITORY IS DEPRECATED AND NOT ACTIVELY MAINTAINED.

[I recommend using Amazon Data Lifecycle Manager instead.](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/snapshot-lifecycle.html)

[Read my blog post for more details on setting this up in Lambda if you have not used it before.](http://www.evergreenitco.com/evergreenit-blog/2016/4/19/aws-ebs-backup-job-run-by-lambda)

## Setting Up IAM Permissions

First, create an IAM policy called "ebs-backup-worker" with the following policy document:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:*"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": "ec2:Describe*",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateSnapshot",
                "ec2:CopySnapshot",
                "ec2:DeleteSnapshot",
                "ec2:CreateTags",
                "ec2:ModifySnapshotAttribute",
                "ec2:ResetSnapshotAttribute"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "sns:Publish"
            ],
            "Resource": "*"
        }
    ]
}
```
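
If you would rather script this step than click through the console, the following is a minimal boto3 sketch (not part of the original repository) that creates the managed policy from the document above. The file name `ebs-backup-policy.json` is a placeholder for wherever you saved that JSON.

```
import boto3

iam = boto3.client('iam')

# Load the policy document shown above.
# NOTE: the file name is a placeholder for this sketch.
with open('ebs-backup-policy.json') as f:
    policy_document = f.read()

# Create the managed policy the Lambda functions will use.
policy = iam.create_policy(
    PolicyName='ebs-backup-worker',
    PolicyDocument=policy_document)
print(policy['Policy']['Arn'])
```

The role described next can be created and attached in the same way with `create_role` and `attach_role_policy`.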
Next, create an IAM role, also called "ebs-backup-worker": select "AWS Lambda" as the role type, then attach the "ebs-backup-worker" policy created above.
When you are done, check the role's trust relationship through "Edit Trust Relationship"; it should look like this:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

## Add the regions you want to run the scripts against as a Base64-encoded Lambda environment variable "aws_regions"

Since Lambda does not allow commas in environment variable values, we cannot enter the regions we want to run the script against as a plain comma-separated list. To work around this, we Base64 encode the comma-separated string, then decode it in the script and split it back into a list.

Below is an example of using Python to Base64 encode our string:

```
~$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import base64
>>> encoded = base64.b64encode(b'us-west-2,us-east-2')
>>> encoded
'dXMtd2VzdC0yLHVzLWVhc3QtMg=='
>>> data = base64.b64decode(encoded)
>>> data
'us-west-2,us-east-2'
>>>
```
**Copy the encoded value and add it as the Lambda environment variable "aws_regions". When copying the encoded value, omit the single quotes in the output (i.e. dXMtd2VzdC0yLHVzLWVhc3QtMg==).**
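
If you prefer to set the variable without the console, here is a small sketch (my addition, not from the original repository) that produces the encoded value and writes it to a function's configuration with boto3. The function name `ebs-backup` is a placeholder for whatever you call your function when you create it in the sections below.

```
import base64

import boto3

# Encode the comma-separated region list; .decode() keeps the result a
# plain string on both Python 2 and Python 3.
encoded = base64.b64encode(b'us-west-2,us-east-2').decode('ascii')

# Apply it to an existing function's environment.
# NOTE: 'ebs-backup' is a placeholder function name for this sketch.
lambda_client = boto3.client('lambda')
lambda_client.update_function_configuration(
    FunctionName='ebs-backup',
    Environment={'Variables': {'aws_regions': encoded}})
```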
## Add the SNS topic ARN you want to publish to as the Lambda environment variable "aws_sns_arn"

This environment variable is optional. If you set it to an SNS topic ARN, the scripts publish a summary to that topic, so you can receive an email notification once a backup run has executed.

## Create the Lambda Functions

Create two functions in Lambda using the Python 2.7 runtime, one for the backup script and one for the cleanup script. I recommend the 128 MB memory setting, with the timeout raised to 10 seconds (longer in a larger environment). Set the event source to "CloudWatch Events - Schedule" and set the schedule expression to a cron expression of your liking, e.g. "cron(0 6 * * ? *)" if you want the job to be kicked off at 06:00 UTC. Set the cleanup job to run a few minutes later.
Optionally, a third function can be created from `lambda-ebs-copy.py` to copy snapshots to a different region for increased redundancy. The environment variable `aws_copy_region` specifies the destination region of the copy.
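
If you would rather wire up the schedule programmatically than through the console event source, below is a hedged sketch using boto3's CloudWatch Events and Lambda clients. The rule name `ebs-backup-daily` and function name `ebs-backup` are placeholders, and the cron expression matches the 06:00 UTC example above.

```
import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

# Placeholder names for this sketch.
function_name = 'ebs-backup'
rule_name = 'ebs-backup-daily'

# Create (or update) a scheduled rule that fires at 06:00 UTC daily.
rule = events.put_rule(
    Name=rule_name,
    ScheduleExpression='cron(0 6 * * ? *)',
    State='ENABLED')

# Allow CloudWatch Events to invoke the function, then point the rule at it.
function_arn = lambda_client.get_function(
    FunctionName=function_name)['Configuration']['FunctionArn']
lambda_client.add_permission(
    FunctionName=function_name,
    StatementId='%s-event' % rule_name,
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule['RuleArn'])
events.put_targets(
    Rule=rule_name,
    Targets=[{'Id': function_name, 'Arn': function_arn}])
```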
## Tagging your EC2 instances to backup

You will need to tag your instances in order for them to be backed up. Below are the tags used by the Lambda function, followed by a short tagging example:

| Tag Key             | Tag Value                            | Notes                                                                               |
| ------------------- | ------------------------------------ | ----------------------------------------------------------------------------------- |
| Backup              |                                      | Value not needed                                                                    |
| Retention           | *Number of days to retain snapshot*  | Default is 7 days                                                                   |
| Skip_Backup_Volumes | *volume ID(s) in a CSV string*       | List either a single volume ID, or multiple volume IDs as a comma-separated string  |
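
As a quick illustration of the tag scheme above, here is a minimal boto3 sketch (my addition) that tags one instance so it is backed up, keeps its snapshots for 14 days, and skips one of its volumes. The instance ID, volume ID, and region are placeholders.

```
import boto3

# Placeholders for this sketch; use your own region and instance/volume IDs.
ec2 = boto3.client('ec2', region_name='us-west-2')
ec2.create_tags(
    Resources=['i-0123456789abcdef0'],
    Tags=[
        {'Key': 'Backup', 'Value': ''},        # value not needed
        {'Key': 'Retention', 'Value': '14'},   # days to keep snapshots
        {'Key': 'Skip_Backup_Volumes', 'Value': 'vol-0123456789abcdef0'},
    ])
```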
## More Info

[Again, if you need more details on setting this up, please check my blog post.](http://www.evergreenitco.com/evergreenit-blog/2016/4/19/aws-ebs-backup-job-run-by-lambda)

--------------------------------------------------------------------------------
/lambda-ebs-backup-cleanup.py:
--------------------------------------------------------------------------------
import boto3
import re
import datetime
import base64
import os
import json

base64_region = os.environ['aws_regions']
iam = boto3.client('iam')
aws_sns_arn = os.getenv('aws_sns_arn', None)


def send_to_sns(subject, message):
    if aws_sns_arn is None:
        return

    print "Sending notification to: %s" % aws_sns_arn

    client = boto3.client('sns')

    response = client.publish(
        TargetArn=aws_sns_arn,
        Message=message,
        Subject=subject)

    if 'MessageId' in response:
        print "Notification sent with message id: %s" % response['MessageId']
    else:
        print "Sending notification failed with response: %s" % str(response)


def lambda_handler(event, context):
    """
    This function looks at *all* snapshots that have a "DeleteOn" tag containing
    the current day formatted as YYYY-MM-DD. This function should be run at least
    daily.
    """
    decoded_regions = base64.b64decode(base64_region)
    regions = decoded_regions.split(',')

    print "Cleaning up snapshots in regions: %s" % regions

    for region in regions:
        ec = boto3.client('ec2', region_name=region)
        account_ids = list()
        try:
            """
            You can replace this try/except by filling in `account_ids` yourself.
            Get your account ID with:
            > import boto3
            > iam = boto3.client('iam')
            > print iam.get_user()['User']['Arn'].split(':')[4]
            """
            iam.get_user()
        except Exception as e:
            # use the exception message to get the account ID the function executes under
            account_ids.append(re.search(r'(arn:aws:sts::)([0-9]+)', str(e)).groups()[1])

        delete_on = datetime.date.today().strftime('%Y-%m-%d')
        filters = [
            {'Name': 'tag-key', 'Values': ['DeleteOn']},
            {'Name': 'tag-value', 'Values': [delete_on]},
        ]
        snapshot_response = ec.describe_snapshots(OwnerIds=account_ids, Filters=filters)

        print "Found %d snapshots that need deleting in region %s on %s" % (
            len(snapshot_response['Snapshots']),
            region,
            delete_on)

        for snap in snapshot_response['Snapshots']:
            print "Deleting snapshot %s" % snap['SnapshotId']
            ec.delete_snapshot(SnapshotId=snap['SnapshotId'])

        message = "{} snapshots have been cleaned up in region {}".format(
            len(snapshot_response['Snapshots']), region)
        send_to_sns('EBS Backups Cleanup', message)
--------------------------------------------------------------------------------
/lambda-ebs-backup.py:
--------------------------------------------------------------------------------
import boto3
import collections
import datetime
import base64
import os
import json
import itertools

base64_region = os.environ['aws_regions']
aws_sns_arn = os.getenv('aws_sns_arn', None)


def send_to_sns(subject, message):
    if aws_sns_arn is None:
        return

    print "Sending notification to: %s" % aws_sns_arn

    client = boto3.client('sns')

    response = client.publish(
        TargetArn=aws_sns_arn,
        Message=message,
        Subject=subject)

    if 'MessageId' in response:
        print "Notification sent with message id: %s" % response['MessageId']
    else:
        print "Sending notification failed with response: %s" % str(response)


def lambda_handler(event, context):
    decoded_regions = base64.b64decode(base64_region)
    regions = decoded_regions.split(',')

    print "Backing up instances in regions: %s" % regions

    for region in regions:
        ec = boto3.client('ec2', region_name=region)
        reservations = ec.describe_instances(
            Filters=[
                {'Name': 'tag-key', 'Values': ['backup', 'Backup']},
            ]
        ).get(
            'Reservations', []
        )

        instances = sum(
            [
                [i for i in r['Instances']]
                for r in reservations
            ], [])

        print "Found %d instances that need backing up in region %s" % (len(instances), region)

        to_tag_retention = collections.defaultdict(list)
        to_tag_mount_point = collections.defaultdict(list)

        for instance in instances:
            try:
                retention_days = [
                    int(t.get('Value')) for t in instance['Tags']
                    if t['Key'] == 'Retention'][0]
            except IndexError:
                retention_days = 7

            # Default to an empty list so a missing or malformed
            # Skip_Backup_Volumes tag cannot leave this name undefined.
            skip_volumes = []
            try:
                skip_volumes = [
                    str(t.get('Value')).split(',') for t in instance['Tags']
                    if t['Key'] == 'Skip_Backup_Volumes']
            except Exception:
                pass

            skip_volumes_list = list(itertools.chain.from_iterable(skip_volumes))

            for dev in instance['BlockDeviceMappings']:
                if dev.get('Ebs', None) is None:
                    continue
                vol_id = dev['Ebs']['VolumeId']
                if vol_id in skip_volumes_list:
                    print "Volume %s is set to be skipped, not backing up" % (vol_id)
                    continue
                dev_attachment = dev['DeviceName']
                print "Found EBS volume %s on instance %s attached to %s" % (
                    vol_id, instance['InstanceId'], dev_attachment)

                instance_name = ''
                try:
                    instance_name = [x['Value'] for x in instance['Tags'] if x['Key'] == 'Name'][0]
                except IndexError:
                    pass
                snap = ec.create_snapshot(
                    VolumeId=vol_id,
                    Description='{} {}'.format(instance_name, instance['InstanceId'])
                )

                to_tag_retention[retention_days].append(snap['SnapshotId'])
                to_tag_mount_point[vol_id].append(snap['SnapshotId'])

                print "Retaining snapshot %s of volume %s from instance %s for %d days" % (
                    snap['SnapshotId'],
                    vol_id,
                    instance['InstanceId'],
                    retention_days,
                )

                ec.create_tags(
                    Resources=to_tag_mount_point[vol_id],
                    Tags=[
                        {'Key': 'Name', 'Value': dev_attachment},
                    ]
                )

        for retention_days in to_tag_retention.keys():
            delete_date = datetime.date.today() + datetime.timedelta(days=retention_days)
            delete_fmt = delete_date.strftime('%Y-%m-%d')
            print "Will delete %d snapshots on %s" % (len(to_tag_retention[retention_days]), delete_fmt)
            ec.create_tags(
                Resources=to_tag_retention[retention_days],
                Tags=[
                    {'Key': 'DeleteOn', 'Value': delete_fmt},
                ]
            )

        message = "{} instances have been backed up in region {}".format(len(instances), region)
        send_to_sns('EBS Backups', message)
--------------------------------------------------------------------------------
/lambda-ebs-copy.py:
--------------------------------------------------------------------------------
import boto3
import re
import datetime
import base64
import os
import json

COPY_LIMIT = 5
RETENTION_DAYS = 7
CROSS_COPIED_TAG = 'CrossCopied'
base64_region = os.environ['aws_regions']
copy_region = os.environ['aws_copy_region']
iam = boto3.client('iam')
aws_sns_arn = os.getenv('aws_sns_arn', None)


def snapshot_is_copied(snap):
    for kv in snap['Tags']:
        if 'Key' in kv and kv['Key'] == CROSS_COPIED_TAG:
            return True
    return False


def send_to_sns(subject, message):
    if aws_sns_arn is None:
        return

    print "Sending notification to: %s" % aws_sns_arn

    client = boto3.client('sns')

    response = client.publish(
        TargetArn=aws_sns_arn,
        Message=message,
        Subject=subject)

    if 'MessageId' in response:
        print "Notification sent with message id: %s" % response['MessageId']
    else:
        print "Sending notification failed with response: %s" % str(response)


def lambda_handler(event, context):
    """
    This function copies snapshots to a different region
    for increased redundancy.
    """
    decoded_regions = base64.b64decode(base64_region)
    regions = decoded_regions.split(',')

    print "Copying snapshots in region: %s" % copy_region

    dest_conn = boto3.client('ec2', region_name=copy_region)
    # There is a COPY_LIMIT concurrent copy operations limit
    copy_limit_counter = 0
    for region in regions:
        ec = boto3.client('ec2', region_name=region)
        account_ids = list()
        try:
            """
            You can replace this try/except by filling in `account_ids` yourself.
            Get your account ID with:
            > import boto3
            > iam = boto3.client('iam')
            > print iam.get_user()['User']['Arn'].split(':')[4]
            """
            iam.get_user()
        except Exception as e:
            # use the exception message to get the account ID the function executes under
            account_ids.append(
                re.search(r'(arn:aws:sts::)([0-9]+)', str(e)).groups()[1])

        delete_on = datetime.date.today() + datetime.timedelta(days=RETENTION_DAYS)
        filters = [
            {'Name': 'tag-key', 'Values': ['DeleteOn']},
            {'Name': 'tag-value', 'Values': [delete_on.strftime('%Y-%m-%d')]},
        ]
        snapshot_response = ec.describe_snapshots(
            OwnerIds=account_ids, Filters=filters)

        print "Analyzing %d snapshots for copying in region %s" % (
            len(snapshot_response['Snapshots']),
            region)

        for snap in snapshot_response['Snapshots']:
            if copy_limit_counter == COPY_LIMIT:
                return "{} copies limit reached".format(COPY_LIMIT)
            if snapshot_is_copied(snap):
                continue
            print "Copying snapshot %s" % snap['SnapshotId']
            copied_snap = dest_conn.copy_snapshot(
                SourceRegion=region,
                SourceSnapshotId=snap['SnapshotId'],
                Description='Cross copied from {} for {}'.format(
                    region, snap['Description'])
            )
            dest_conn.create_tags(
                Resources=[copied_snap['SnapshotId']],
                Tags=[
                    {
                        'Key': 'DeleteOn',
                        'Value': [x['Value'] for x in snap['Tags'] if x['Key'] == 'DeleteOn'][0],
                    },
                ],
            )
            ec.create_tags(
                Resources=[snap['SnapshotId']],
                Tags=[
                    {
                        'Key': CROSS_COPIED_TAG,
                        'Value': copy_region,
                    },
                ],
            )
            copy_limit_counter += 1

        message = "started copying {} snapshots in region {}".format(
            len(snapshot_response['Snapshots']), region)
        send_to_sns('EBS Copying Done', message)
--------------------------------------------------------------------------------