├── README.md
└── fargate-spot-capacity-fail-handler
    ├── lambda.py
    ├── README.md
    └── template.yaml


/README.md:
--------------------------------------------------------------------------------
# ec2-spot-mania


--------------------------------------------------------------------------------
/fargate-spot-capacity-fail-handler/lambda.py:
--------------------------------------------------------------------------------
import logging

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger('logger')
logger.setLevel(logging.INFO)


def log_error_message(e):
    # Log the AWS error code and message carried by a botocore ClientError.
    logger.error(e.response['Error']['Code'])
    logger.error(e.response['Error']['Message'])


def lambda_handler(event, context):
    logger.info('Event received %s', event)
    client = boto3.client('ecs')
    # The service ARN has the form arn:aws:ecs:<region>:<account>:service/<cluster>/<service>.
    clusterName = event['resources'][0].split('/')[1]
    serviceName = event['resources'][0].split('/')[2]

    try:
        # Switch the service to on-demand Fargate and start a new deployment.
        response = client.update_service(
            cluster=clusterName,
            service=serviceName,
            capacityProviderStrategy=[
                {
                    'capacityProvider': 'FARGATE',
                    'weight': 1
                },
            ],
            forceNewDeployment=True,
        )
        logger.debug('update service response %s', response)
    except ClientError as e:
        log_error_message(e)
        # Re-raise so a failed update is not reported as "updated".
        raise
    return "cluster: %s | service: %s | updated" % (clusterName, serviceName)


--------------------------------------------------------------------------------
/fargate-spot-capacity-fail-handler/README.md:
--------------------------------------------------------------------------------
# Fargate Spot insufficient capacity event handler
Customers use AWS Fargate Spot to run interruption-tolerant workloads. Fargate Spot runs tasks on spare capacity, so customers sometimes need a mechanism to take action when spare capacity is not available.

This solution provides that mechanism: it takes action when Fargate Spot fails to launch tasks due to a lack of spare capacity. The solution deploys an EventBridge rule that listens for the task placement failure event and a Lambda function that updates the ECS service to run 100% on Fargate.

## Deploy stack

```bash
aws cloudformation create-stack --stack-name fargate-spot-capacity-fail-handler --template-body file://template.yaml --capabilities CAPABILITY_IAM
```

## Test
* Create an ECS service with Fargate Spot as the capacity provider.
* Set the Fargate 'Platform version' to 1.4.0.
* At the time of creating this solution, Fargate Spot had no capacity for platform version 1.4, so this setting triggers a task placement failure event.
* The Lambda function should be triggered and switch the service to run 100% on Fargate.

## Delete stack

```bash
aws cloudformation delete-stack --stack-name fargate-spot-capacity-fail-handler
```

## Details

Example Service Task Placement Failure Event

Service task placement failure events are delivered in the following format. For more information about EventBridge parameters, see Events and Event Patterns in the Amazon EventBridge User Guide.

In the following example, the task was attempting to use the FARGATE_SPOT capacity provider, but the service scheduler was unable to acquire any Fargate Spot capacity.

```json
{
    "version": "0",
    "id": "ddca6449-b258-46c0-8653-e0e3a6d0468b",
    "detail-type": "ECS Service Action",
    "source": "aws.ecs",
    "account": "111122223333",
    "time": "2019-11-19T19:55:38Z",
    "region": "us-west-2",
    "resources": [
        "arn:aws:ecs:us-west-2:111122223333:service/default/servicetest"
    ],
    "detail": {
        "eventType": "ERROR",
        "eventName": "SERVICE_TASK_PLACEMENT_FAILURE",
        "clusterArn": "arn:aws:ecs:us-west-2:111122223333:cluster/default",
        "capacityProviderArns": [
            "arn:aws:ecs:us-west-2:111122223333:capacity-provider/FARGATE_SPOT"
        ],
        "reason": "RESOURCE:FARGATE",
        "createdAt": "2019-11-06T19:09:33.087Z"
    }
}
```

## Event Pattern

```json
{
    "source": [
        "aws.ecs"
    ],
    "detail-type": [
        "ECS Service Action"
    ],
    "detail": {
        "eventName": ["SERVICE_TASK_PLACEMENT_FAILURE"]
    }
}
```


--------------------------------------------------------------------------------
/fargate-spot-capacity-fail-handler/template.yaml:
--------------------------------------------------------------------------------
AWSTemplateFormatVersion: "2010-09-09"
Description: This template creates a mechanism to handle Fargate Spot task placement failure
Resources:
  LambdaRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action:
              - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
        Version: "2012-10-17"
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AWSLambdaExecute
      Path: /
  LambdaRolePolicy:
    Type: "AWS::IAM::Policy"
    Properties:
      PolicyName: "Fargate-Spot-Failure-Handler-Lambda-Role-Policy"
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Effect: "Allow"
            Action: [
              "ecs:ListAttributes",
              "ecs:DescribeTaskSets",
              "ecs:DescribeTaskDefinition",
              "ecs:DescribeClusters",
              "ecs:ListServices",
              "ecs:ListAccountSettings",
              "ecs:UpdateService",
              "ecs:ListTagsForResource",
              "ecs:ListTasks",
              "ecs:ListTaskDefinitionFamilies",
              "ecs:DescribeServices",
              "ecs:ListContainerInstances",
              "ecs:DescribeContainerInstances",
              "ecs:DescribeTasks",
              "ecs:ListTaskDefinitions",
              "ecs:ListClusters"
              ]
            Resource: "*"
      Roles:
        -
          Ref: "LambdaRole"
  LambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: "Fargate-Spot-failure-handler-function"
      Handler: index.lambda_handler
      Runtime: python3.7
      Description: 'Fargate Spot failure handler function'
      MemorySize: 128
      Timeout: 3
      Role:
        Fn::GetAtt:
          - LambdaRole
          - Arn
      Code:
        ZipFile: |
          import logging

          import boto3
          from botocore.exceptions import ClientError

          logger = logging.getLogger('logger')
          logger.setLevel(logging.INFO)


          def log_error_message(e):
              # Log the AWS error code and message carried by a botocore ClientError.
              logger.error(e.response['Error']['Code'])
              logger.error(e.response['Error']['Message'])


          def lambda_handler(event, context):
              logger.info('Event received %s', event)
              client = boto3.client('ecs')
              # The service ARN has the form arn:aws:ecs:<region>:<account>:service/<cluster>/<service>.
              clusterName = event['resources'][0].split('/')[1]
              serviceName = event['resources'][0].split('/')[2]

              try:
                  # Switch the service to on-demand Fargate and start a new deployment.
                  response = client.update_service(
                      cluster=clusterName,
                      service=serviceName,
                      capacityProviderStrategy=[
                          {
                              'capacityProvider': 'FARGATE',
                              'weight': 1
                          },
                      ],
                      forceNewDeployment=True,
                  )
                  logger.debug('update service response %s', response)
              except ClientError as e:
                  log_error_message(e)
                  # Re-raise so a failed update is not reported as "updated".
                  raise
              return "cluster: %s | service: %s | updated" % (clusterName, serviceName)
  EventRule:
    Type: AWS::Events::Rule
    Properties:
      Description: Listen to Fargate Spot failure events and trigger lambda
        function
      EventPattern:
        source:
          - "aws.ecs"
        detail-type:
          - "ECS Service Action"
        detail:
          eventName:
            - "SERVICE_TASK_PLACEMENT_FAILURE"
      State: "ENABLED"
      Targets:
        -
          Arn: !GetAtt LambdaFunction.Arn
          Id: TargetLambda
  PermissionForEventsToInvokeLambda:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName:
        Ref: LambdaFunction
      Action: "lambda:InvokeFunction"
      Principal: "events.amazonaws.com"
      SourceArn: !GetAtt EventRule.Arn
Outputs:
  roleID:
    Value: !Ref LambdaRole
  lambdaID:
    Value: !Ref LambdaFunction


--------------------------------------------------------------------------------
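
Note on the event rule: the pattern matches every `SERVICE_TASK_PLACEMENT_FAILURE` event, so the handler would also fire for services that never used Fargate Spot. The sketch below shows one possible guard, assuming the event carries `detail.capacityProviderArns` as in the README example and that the service ARN uses the long format (`service/<cluster>/<service>`). The helper names `parse_service_arn` and `is_fargate_spot_failure` are hypothetical and are not part of this repository. Whether such filtering is wanted is itself an assumption.

```python
def parse_service_arn(service_arn):
    """Split a long-format ECS service ARN into (cluster_name, service_name)."""
    # The part after the last ':' is expected to be 'service/<cluster>/<service>'.
    resource = service_arn.split(':')[-1]
    parts = resource.split('/')
    if len(parts) != 3 or parts[0] != 'service':
        # Old-format ARNs ('service/<service>') carry no cluster name.
        raise ValueError('unexpected service ARN format: %s' % service_arn)
    return parts[1], parts[2]


def is_fargate_spot_failure(event):
    """Return True only for placement failures that involve FARGATE_SPOT."""
    detail = event.get('detail', {})
    if detail.get('eventName') != 'SERVICE_TASK_PLACEMENT_FAILURE':
        return False
    return any(
        arn.endswith('capacity-provider/FARGATE_SPOT')
        for arn in detail.get('capacityProviderArns', [])
    )


# Trimmed copy of the example event from the README.
sample_event = {
    'resources': [
        'arn:aws:ecs:us-west-2:111122223333:service/default/servicetest'
    ],
    'detail': {
        'eventName': 'SERVICE_TASK_PLACEMENT_FAILURE',
        'capacityProviderArns': [
            'arn:aws:ecs:us-west-2:111122223333:capacity-provider/FARGATE_SPOT'
        ],
    },
}

if is_fargate_spot_failure(sample_event):
    cluster_name, service_name = parse_service_arn(sample_event['resources'][0])
    print('would update cluster=%s service=%s' % (cluster_name, service_name))
```

A handler could call `is_fargate_spot_failure(event)` first and return early when it is false, leaving the `update_service` call unchanged. The same restriction could probably be expressed in the EventBridge pattern instead, which would avoid invoking the function at all.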