# CloudCustodian Examples

- [Cloud Custodian Examples](#cloudcustodian-examples)
- [Filters](#filters)
  - [Account](#account)
  - [PHD](#phd-personal-health-dashboard)
  - [EC2](#ec2)
  - [ECR](#ecr)
  - [ASG](#asg)
  - [EBS](#ebs)
  - [ELB](#elb)
  - [ALB](#alb-elbv2)
  - [IAM](#iam)
  - [RDS](#rds)
  - [S3](#s3)
  - [Redshift](#redshift)
- [Actions](#actions)
- [Cleanup](#cleanup)
- [Docker](#docker)
- [Lambda As Cron](#lambda)

---

# Filters

## Account

1. Alarm on root user account usage

    ```
    - name: root-user-login-detected
      resource: account
      description: A root account login has occurred
      mode:
        type: cloudtrail
        events:
          - ConsoleLogin
        role: arn:aws:iam::123456789:role/cloud_custodian_lambda_role
      filters:
        - type: event
          key: "detail.userIdentity.type"
          value_type: swap
          op: in
          value: Root
    ```

## PHD (Personal Health Dashboard)

```
policies:
  - name: phd-alerts
    resource: account
    comment: Olay PHD alerts
    mode:
      type: phd
      role: arn:aws:iam::123456789:role/cloud_custodian_lambda_role
    description: Any PHD alert
```

## EC2

1. Find instances whose names start with `packer` that have been running for more than 3 hours:

    ```
    - name: ec2-packer
      resource: ec2
      comment: Report to find long running packer instances
      filters:
        - type: value
          key: "tag:Name"
          op: regex
          value: "^packer.*"
        - type: instance-age
          hours: 3
    ```

1. React via CloudTrail to EIP creation:

    ```
    - name: ec2-elastic-ip-notify
      resource: aws.network-addr
      comment: Notify on elastic IP creation
      mode:
        type: cloudtrail
        role: arn:aws:iam::123456789:role/cloud_custodian_lambda_role
        events:
          - source: ec2.amazonaws.com
            event: AllocateAddress
            ids: responseElements.publicIp
    ```

1. Ensure staging was torn down. Find instances whose names contain `staging` that have been running for more than 14 hours:

    ```
    - name: ec2-staging
      resource: ec2
      comment: Report to find staging instances not terminated
      filters:
        - type: value
          key: "tag:Name"
          op: glob
          value: "*staging*"
        - type: instance-age
          hours: 14
    ```

1. Ensure tag compliance. Look for non-terminated instances (tags are lost on terminate) missing any of the tags `Name`, `Managed_by`, or `Environment`:

    ```
    - name: ec2-tag-compliance
      resource: ec2
      comment: Report on total count of non compliant instances
      filters:
        - or:
            - "tag:Name": absent
            - "tag:Managed_by": absent
            - "tag:Environment": absent
        - not:
            - "State.Name": terminated
    ```

## ECR

1. Enable onPush vulnerability scanning

    ```
    - name: ecr-set-scanning
      resource: aws.ecr
      filters:
        - type: value
          key: imageScanningConfiguration.scanOnPush
          value: false
      actions:
        - set-scanning
    ```

2. Enable lifecycles

    ```
    - name: ecr-add-lifecycle
      resource: aws.ecr
      filters:
        - type: lifecycle-rule
          state: false
      actions:
        - type: set-lifecycle
          rules:
            - rulePriority: 1
              description: "remove untagged > 1 days"
              selection:
                countNumber: 1
                countType: sinceImagePushed
                countUnit: days
                tagStatus: untagged
              action:
                type: expire
            - rulePriority: 2
              description: "remove any images after we reach 4000 in repo"
              selection:
                countNumber: 4000
                countType: imageCountMoreThan
                tagStatus: any
              action:
                type: expire
    ```

## ASG

1. Look for any auto scaling groups using AMIs older than 120 days

    ```
    - name: asg-older-image
      resource: asg
      comment: Find all LCs using AMI older than 120d in use.
      filters:
        - type: image-age
          days: 120
    ```

## EBS

1. Look for any running EC2 instances with unencrypted EBS volumes

    ```
    - name: instance-without-encrypted-ebs
      description: Instances without encrypted EBS volumes
      # Query instances (not volumes): "State.Name" is an EC2 instance attribute.
      resource: ec2
      filters:
        - "State.Name": running
        - type: ebs
          key: Encrypted
          value: false
    ```

## ELB

1. React (CloudTrail subscription) to a public ELB creation

    ```
    - name: elb-notify-new-internet-facing
      resource: elb
      comment: Detect and alarm on internet facing ELBs
      mode:
        type: cloudtrail
        role: arn:aws:iam::123456789:role/cloud_custodian_lambda_role
        events:
          - CreateLoadBalancer
      description: |
        Any newly created Classic Load Balancers launched with
        an internet-facing scheme.
      filters:
        - Scheme: internet-facing
    ```

## ALB (elbv2)

1. React (CloudTrail subscription) to a public ALB creation

    ```
    - name: app-elb-notify-new-internet-facing
      resource: app-elb
      comment: Detect and alarm on internet facing ALBs
      mode:
        type: cloudtrail
        role: arn:aws:iam::123456789:role/cloud_custodian_lambda_role
        events:
          - source: elasticloadbalancing.amazonaws.com
            event: CreateLoadBalancer
            ids: responseElements.loadBalancers[].loadBalancerName
      description: |
        Any newly created App Load Balancers launched with
        an internet-facing scheme.
      filters:
        - Scheme: internet-facing
    ```

## IAM

1. React (CloudTrail subscription) to a user creation

    ```
    - name: iam-user-creation
      resource: iam-user
      comment: Detect and alarm on IAM user creation
      mode:
        type: cloudtrail
        role: arn:aws:iam::123456789:role/cloud_custodian_lambda_role
        events:
          - source: iam.amazonaws.com
            event: CreateUser
            ids: requestParameters.userName
      description: |
        Any newly created user.
    ```

1. React (CloudTrail subscription) to a user deletion

    ```
    - name: iam-user-deletion
      resource: iam-user
      comment: Detect and alarm on IAM user deletion
      mode:
        type: cloudtrail
        role: arn:aws:iam::123456789:role/cloud_custodian_role
        events:
          - source: iam.amazonaws.com
            event: DeleteUser
            ids: userIdentity.userName
      description: |
        Any deleted user.
    ```

1. React (CloudTrail subscription) to a failed login attempt from any user

    ```
    - name: iam-failed-login
      resource: iam-user
      comment: Detect and alarm on IAM failed login
      mode:
        type: cloudtrail
        role: arn:aws:iam::123456789:role/cloud_custodian_role
        events:
          - source: signin.amazonaws.com
            event: ConsoleLogin
            ids: userIdentity.userName
      description: |
        Any console login failure for an IAM user
      filters:
        - type: event
          key: "detail.responseElements.ConsoleLogin"
          value: Failure
    ```

## RDS

1. React (CloudTrail subscription) to a DB Instance being deleted while skipping the final snapshot

    ```
    - name: rds-instance-notify-delete-skip-final-snapshot
      resource: rds
      comment: Detect and alarm on RDS DB instances deleted while skipping final snapshot
      mode:
        type: cloudtrail
        role: arn:aws:iam::123456789:role/cloud_custodian_role
        events:
          - source: rds.amazonaws.com
            event: DeleteDBInstance
            ids: requestParameters.dBInstanceIdentifier
      description: Detect and alarm on RDS instances deleted while skipping final snapshot
      filters:
        - type: event
          key: "detail.requestParameters.skipFinalSnapshot"
          value: true
    ```

1. React (CloudTrail subscription) to a DB Cluster being deleted while skipping the final snapshot

    ```
    - name: rds-cluster-notify-delete-skip-final-snapshot
      # DB clusters are the rds-cluster resource; rds covers DB instances only.
      resource: rds-cluster
      comment: Detect and alarm on RDS clusters deleted while skipping final snapshot
      mode:
        type: cloudtrail
        role: arn:aws:iam::123456789:role/cloud_custodian_role
        events:
          - source: rds.amazonaws.com
            event: DeleteDBCluster
            ids: requestParameters.dBClusterIdentifier
      description: Detect and alarm on RDS clusters deleted while skipping final snapshot
      filters:
        - type: event
          key: "detail.requestParameters.skipFinalSnapshot"
          value: true
    ```

## S3

1. React (CloudTrail subscription) to an S3 bucket being created and check it for encryption

    ```
    - name: s3-bucket-create-without-encryption
      resource: s3
      comment: s3 bucket created without default encryption
      mode:
        type: cloudtrail
        role: arn:aws:iam::1234567890:role/cloud_custodian_role
        events:
          - event: CreateBucket
            source: s3.amazonaws.com
            ids: requestParameters.bucketName
      description: s3 bucket created without default encryption
      filters:
        - type: bucket-encryption
          state: False
    ```

## Redshift

1. React (CloudTrail subscription) to a Redshift cluster being deleted while skipping the final snapshot

    ```
    - name: redshift-notify-delete-skip-final-snapshot
      resource: redshift
      comment: Detect and alarm on Redshift clusters deleted while skipping final snapshot
      mode:
        type: cloudtrail
        role: arn:aws:iam::123456789:role/cloud_custodian_role
        events:
          - source: redshift.amazonaws.com
            event: DeleteCluster
            ids: requestParameters.clusterIdentifier
      description: Detect and alarm on Redshift clusters deleted while skipping final snapshot
      filters:
        - type: event
          key: "detail.requestParameters.skipFinalClusterSnapshot"
          value: true
    ```

# Actions

1. Send to SNS (helpful for SNS to Slack)

    ```
    actions:
      - type: notify
        to:
          - foo@wut.com # required even though not needed in SNS
        message: "Some sort of message"
        transport:
          type: sns
          topic: arn:aws:sns:us-east-1:111111111:topic-ops
    ```

# Cleanup

1. Clean up everything

    ```
    policies:
      - name: lambda-delete-cc-functions
        resource: lambda
        filters:
          - "tag:custodian-info": present
        actions:
          - delete
      - name: cloudwatch-delete-cc-log-group
        resource: log-group
        filters:
          - type: value
            key: logGroupName
            value: '/aws/lambda/custodian-*'
            op: glob
        actions:
          - delete
      - name: cloudwatch-delete-cc-cw-rules
        resource: aws.event-rule
        filters:
          - type: value
            key: Name
            value: 'custodian-*'
            op: glob
        actions:
          - type: delete
            force: true
      - name: cloudwatch-delete-cc-rule-targets
        resource: aws.event-rule-target
        filters:
          - type: value
            key: Id
            value: 'custodian-*'
            op: glob
        actions:
          - type: delete
            force: true
    ```

# Docker

This is what my Dockerfile looks like. Rules are in `cwd/rules` as individual `.yml` files.

```
FROM capitalone/c7n

RUN apk add -U ca-certificates curl
RUN curl -L https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64 >/usr/local/bin/dumb-init
RUN chmod +x /usr/local/bin/dumb-init

# -p: do not fail if /opt already exists in the base image
RUN mkdir -p /opt
COPY run.sh /opt
RUN chmod +x /opt/run.sh

COPY rules /tmp

# Merge all rule files into one policy file under a single "policies:" key
RUN echo 'policies:' >/tmp/custodian.yml
RUN for yml in $(find /tmp -name '*.yml'); do cat $yml; done | grep -v policies: >>/tmp/custodian.yml

CMD ["/usr/local/bin/custodian", "run", "--output-dir=/tmp/output", "/tmp/custodian.yml"]
```

# Lambda

I run cloud-custodian on a timer. A Lambda function with a CloudWatch event trigger (every 2 hours) runs the ECS task definition tied to my Docker image of cloud-custodian.

I kick off the Lambda with env vars pointing to the latest task definition ARN and the cluster name. The Lambda looks like:

```
#!/usr/bin/env python
import boto3
import os
client = boto3.client('ecs')

# Both values are injected through the Lambda's environment variables
CLUSTER = os.environ.get('CLUSTER')
TASK_DEFINITION_ARN = os.environ.get('TASK_DEFINITION_ARN')

def lambda_handler(event, context):
    print('Debug: Cluster=\'{}\''.format(CLUSTER))
    print('Debug: Task def arn=\'{}\''.format(TASK_DEFINITION_ARN))
    # Start one run of the cloud-custodian ECS task
    client.run_task(
        cluster=CLUSTER,
        taskDefinition=TASK_DEFINITION_ARN,
        count=1
    )
```

My Terraform for the Lambda provides those env vars via a remote state value and a variable:

```
resource "aws_lambda_function" "cloud-custodian" {
  filename         = "lambda_cloud_custodian.zip"
  function_name    = "cloud_custodian"
  role             = "${data.terraform_remote_state.cloud_custodian_lambda_iam_role.role_arn}"
  handler          = "lambda_function.lambda_handler"
  source_code_hash = "${base64sha256(file("lambda_cloud_custodian.zip"))}"
  runtime          = "python3.6"
  timeout          = "30"

  environment {
    variables = {
      CLUSTER             = "${var.cluster}"
      TASK_DEFINITION_ARN = "${aws_ecs_task_definition.cloud_custodian.arn}"
    }
  }
}

resource "aws_cloudwatch_event_rule" "cloud_custodian_schedule" {
  name                = "cloud_custodian_schedule"
  description         = "Run every two hours"
  schedule_expression = "rate(2 hours)"
}

resource "aws_cloudwatch_event_target" "cloud_custodian_lambda" {
  rule = "${aws_cloudwatch_event_rule.cloud_custodian_schedule.name}"
  arn  = "${aws_lambda_function.cloud-custodian.arn}"
}

resource "aws_lambda_permission" "allow_cloudwatch_to_call_cloud_custodian" {
  statement_id  = "AllowExecutionFromCloudWatchToCloudCustodianLambda"
  action        = "lambda:InvokeFunction"
  function_name = "${aws_lambda_function.cloud-custodian.function_name}"
  principal     = "events.amazonaws.com"
  source_arn    = "${aws_cloudwatch_event_rule.cloud_custodian_schedule.arn}"
}
```

To streamline this, my Makefile looks like the one below.
Note: my Terraform for this has an output for the ECR URL.

```
setup:
	terraform init
	aws ecr get-login --region us-east-1 --no-include-email | sh -

zip:
	zip -r lambda_cloud_custodian.zip lambda_function.py

build: setup
	terraform apply -target=aws_ecr_repository.cloud_custodian
	docker build -t local/cloud-custodian:latest .

deploy: build zip
	$(eval REPO_URL := $(shell terraform output cloud_custodian_repository_url | tr -d '\n'))
	docker tag local/cloud-custodian:latest $(REPO_URL):latest
	docker push $(REPO_URL):latest
	terraform apply
```
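
If you want to check the merged policy file locally before building the image, the rule-merging step from the Docker section can be sketched in Python. This is only an illustration and not part of the original setup: the function name `merge_rules` and the paths in the demo are hypothetical. It assumes each rule file is a YAML list of policies, optionally preceded by its own `policies:` line, and that all files use the same indentation.

```python
import tempfile
from pathlib import Path


def merge_rules(rules_dir: str, output_file: str) -> int:
    """Concatenate every *.yml file under rules_dir into one policy file.

    Roughly mirrors the Dockerfile's shell loop: write a single `policies:`
    header, then append each rule file without its own `policies:` line.
    Unlike `grep -v policies:`, only lines that consist solely of
    `policies:` are dropped. Returns the number of rule files merged.
    """
    out_path = Path(output_file).resolve()
    # Sort for a stable order and skip the output file if it sits in rules_dir.
    rule_files = sorted(
        p for p in Path(rules_dir).rglob("*.yml") if p.resolve() != out_path
    )
    lines = ["policies:"]
    for rule in rule_files:
        for line in rule.read_text().splitlines():
            if line.strip() == "policies:":
                continue
            lines.append(line)
    out_path.write_text("\n".join(lines) + "\n")
    return len(rule_files)


if __name__ == "__main__":
    # Small self-contained demo using throwaway files (hypothetical rules).
    with tempfile.TemporaryDirectory() as tmp:
        rules = Path(tmp) / "rules"
        rules.mkdir()
        (rules / "ec2.yml").write_text("policies:\n- name: ec2-demo\n  resource: ec2\n")
        (rules / "s3.yml").write_text("- name: s3-demo\n  resource: s3\n")
        merged = Path(tmp) / "custodian.yml"
        count = merge_rules(str(rules), str(merged))
        print("merged", count, "rule files")
        print(merged.read_text())
```

The merged file can then be passed to `custodian run` the same way the Dockerfile's `CMD` does.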