├── .flake8 ├── .gitignore ├── .yamllint ├── LICENSE ├── README.md ├── bootstrap-create-messages.py ├── bootstrap-load-messages.py ├── clair-scanner.json ├── clair.json ├── cleanup.py ├── config.yaml ├── ecr-cve-monitor.md ├── ecr-cve-monitor.png ├── handler.py ├── list_repos.py ├── main.tf ├── putimage.zip ├── quarantine.py ├── report.py ├── reque.py ├── requirements.txt ├── variables.tf └── versions.tf /.flake8: -------------------------------------------------------------------------------- 1 | [flake8] 2 | ignore = E501 3 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.tfstate 2 | .terraform 3 | *.tfvars 4 | venv 5 | .vscode 6 | -------------------------------------------------------------------------------- /.yamllint: -------------------------------------------------------------------------------- 1 | line-length: 2 | max: 200 -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | This project is based on coreos/Clair; copyright for that work is held by the coreos/clair project (https://github.com/coreos/clair/LICENSE). 2 | This project uses a custom fork of [Klar](https://github.com/optiopay/klar); copyright for that work is held by Optiopay GmbH, 2016 (https://github.com/optiopay/klar). 3 | All other copyright for the ecr-cve-monitor project is held by Shane Riddell, 2019. 4 | 5 | Copyright 2019 Shane Riddell 6 | 7 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 8 | 9 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 10 | 11 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ecr-cve-monitor 2 | 3 | This project is a working proof-of-concept that uses the [coreos/Clair project](https://github.com/coreos/clair) to scan all images pushed to an AWS ECR registry, and to automatically rescan them if Clair detects a new CVE that affects a known image. 4 | 5 | See ecr-cve-monitor.md for more details on the purpose and architecture of the project. 6 | 7 | ## Installation 8 | 9 | Make sure you have Terraform 0.12 or greater available (versions.tf requires ">= 0.12").
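Note that the state backend is declared as an empty s3 block in main.tf, so terraform init needs its backend settings supplied on the command line; for example (the bucket, key, and region values are placeholders for your own state location):

```
terraform init \
  -backend-config="bucket=my-terraform-state-bucket" \
  -backend-config="key=ecr-cve-monitor.tfstate" \
  -backend-config="region=us-east-1"
```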
10 | 11 | Create a terraform.tfvars file and define the following values: 12 | 13 | ``` 14 | environment="environment name, like cicd or dev" 15 | costcenter="costcenter identifier" 16 | poc="point of contact email" 17 | service="ecr-cve-monitor" 18 | 19 | ecs_ami_id="latest ECS cluster AMI id for your deployment region" 20 | key_name="ssh key for ec2 instances" 21 | instance_type="instance type for the ecs cluster, I use m5.xlarge for the default installation settings for memory and cpu usage" 22 | 23 | number_of_clair_instances=1 24 | number_of_scanners=1 25 | number_of_ecs_instances=2 26 | 27 | prefix="a prefix to use for all resources created" 28 | ``` 29 | 30 | Run terraform init, then terraform plan to review the plan that will be generated, then terraform apply to apply the changes. 31 | This creates a new VPC with an ECS cluster running the ecr-cve-monitor software, along with the message queue, dead-letter queue, DynamoDB tables, and a CloudWatch event that triggers a lambda to queue an image scan whenever a new image is pushed to the ECR registry in this account. 32 | 33 | Note that if you want to install to an existing VPC and/or an existing ECS cluster, you can modify the main.tf file to do so. 34 | 35 | Also, while the underlying image layer tracking is capable of supporting multiple registries (in different regions/accounts), this has not yet been tested, and you would need to modify the terraform to allow CloudWatch events from the other registry to be pushed onto the ecr-cve-monitor input queue. 36 | 37 | ### Bootstrapping 38 | 39 | Note that you should let the clair service deployed by terraform run for at least 60 minutes so that it can do the initial CVE database load. While Clair is loading the initial CVEs, it will generate empty reports, and notifications for incoming CVEs are disabled - so if you bootstrap too soon, you will have to repeat the bootstrap to get accurate first-time reports. 40 | 41 | Run bootstrap-create-messages.py and bootstrap-load-messages.py (comments in the files contain instructions on how to run them). These scan a registry for all existing images and queue a scan request for each one, so that every image becomes known to and monitored by Clair and gets an initial report in S3. 42 | 43 | Note that if you have a lot of images to go through initially, you may want to temporarily adjust the terraform.tfvars values for number_of_clair_instances, number_of_scanners, and number_of_ecs_instances to get through the backlog more quickly. During testing, we found that a clair_instance could typically handle about 8 clair scanners at once. 44 | 45 | ## Reporting 46 | 47 | ### Setting up reporting 48 | 49 | In AWS Glue, create a database for reporting on the scan results.
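This can be done in the console or with the AWS CLI; for example (the database name ecr_cve_reports is a placeholder - use whatever name you prefer):

```
aws glue create-database --database-input '{"Name": "ecr_cve_reports"}'
```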
Then in AWS Athena, create an external table like so: 50 | 51 | ``` 52 | CREATE external TABLE reports ( 53 | LayerCount int, 54 | AnalyzedImageName string, 55 | ImageDigest string, 56 | ECRMetadata struct< 57 | imageId:struct, 58 | manifest:struct>, 59 | repositoryName:string, 60 | registryId:string 61 | >, 62 | 63 | Vulnerabilities struct< High:array>, 68 | Medium:array>, 73 | Medium:array>, 78 | Medium:array>, 83 | Low:array>, 88 | Medium:array>, 93 | Negligible:array>, 98 | Medium:array> 103 | > 104 | ) 105 | PARTITIONED BY(year string, month string, day string) 106 | ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION 's3://my-report-bucket/' 107 | ``` 108 | 109 | Be sure to replace the LOCATION s3://my-report-bucket with the terraform output 'report_bucket', which specifies your own report bucket name. 110 | 111 | ### Reporting with Athena 112 | 113 | To report with Athena, you can load all partitions with 114 | 115 | ``` 116 | MSCK REPAIR TABLE reports 117 | ``` 118 | 119 | However, you typically do not need to create reports across the entire time series, and will only be interested in seeing new reports (either new CVEs that affect existing images, or CVEs present in newly pushed images) for a given time range. To do that more cheaply and efficiently, load just the partitions that correspond to the time range you want to query. 120 | 121 | For example, to load and report on January 15 of 2019: 122 | 123 | ``` 124 | ALTER TABLE reports ADD PARTITION (year='2019',month='01',day='15') location 's3://my-scan-results/year=2019/month=01/day=15/' 125 | ``` 126 | 127 | You can then query for any images that were detected to have at least 1 High CVE on the 15th: 128 | 129 | ``` 130 | select distinct ECRMetadata.registryId, ECRMetadata.repositoryName, ECRMetadata.imageId.imageDigest from reports where cardinality(vulnerabilities.High) > 0 and year='2019' and month='01' and day='15' order by ECRMetadata.registryId, ECRMetadata.repositoryName, ECRMetadata.imageId.imageDigest; 131 | ``` 132 | 133 | Note that this will give you back results in terms of the image's immutable digest. You can use the AWS SDKs to convert this to a (current) list of human-friendly tags. The Athena reporting itself, and the internal report structures, cannot use human-friendly image tags because tags are not immutable. 134 | 135 | ## High level diagram 136 | 137 | ![Architecture](ecr-cve-monitor.png) 138 | 139 | ## Disaster recovery 140 | 141 | Any loss of information can be recovered by repopulating the reports from scratch (except historical time-series data). 142 | 143 | ## Why Clair 144 | 145 | * From the CoreOS team 146 | * Open source 147 | * Used to power vulnerability scanning in Quay.io 148 | * Can generate reports without re-consuming layers 149 | * Can raise new vulnerabilities against existing layers without actually rescanning the image 150 | -------------------------------------------------------------------------------- /bootstrap-create-messages.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import json 3 | 4 | # Should work with any version of python 3 with boto3 available. 5 | # Make sure you have exported AWS credentials for the account that contains the ECR registry 6 | # into your shell before running this script. 7 | # Scans an ECR registry, outputting scan request messages for all images found in all repositories. 8 | # To use, set REGISTRY_ID to the registry ID you wish to bootstrap.
9 | # Redirect the output to output.json: 10 | # python bootstrap-create-messages.py > output.json 11 | 12 | # Note that this currently assumes the registry is in us-east-1 13 | 14 | client = boto3.client('ecr') 15 | 16 | REGISTRY_ID = '' 17 | resp = client.describe_repositories( 18 | registryId=REGISTRY_ID, 19 | maxResults=100 20 | ) 21 | repos = [] 22 | for r in resp['repositories']: 23 | repos.append(r) 24 | next_token = None 25 | if 'nextToken' in resp: 26 | next_token = resp['nextToken'] 27 | while next_token is not None: 28 | resp = client.describe_repositories( 29 | registryId=REGISTRY_ID, 30 | maxResults=100, 31 | nextToken=next_token 32 | ) 33 | for r in resp['repositories']: 34 | repos.append(r) 35 | next_token = None 36 | if 'nextToken' in resp: 37 | next_token = resp['nextToken'] 38 | 39 | messages = [] 40 | # Each message is a ScanImage request; bootstrap-load-messages.py sends them to the input queue. 41 | for r in repos: 42 | resp = client.list_images( 43 | registryId=REGISTRY_ID, 44 | repositoryName=r['repositoryName'], 45 | maxResults=100 46 | ) 47 | for i in resp['imageIds']: 48 | msg = { 49 | 'ScanImage': { 50 | 'awsRegion': 'us-east-1', 51 | 'repositoryName': r['repositoryName'], 52 | 'registryId': REGISTRY_ID, 53 | 'imageId': {'imageDigest': i['imageDigest']} 54 | } 55 | } 56 | messages.append(msg) 57 | next_token = None 58 | if 'nextToken' in resp: 59 | next_token = resp['nextToken'] 60 | while next_token is not None: 61 | resp = client.list_images( 62 | registryId=REGISTRY_ID, 63 | repositoryName=r['repositoryName'], 64 | maxResults=100, 65 | nextToken=next_token 66 | ) 67 | for i in resp['imageIds']: 68 | msg = { 69 | 'ScanImage': { 70 | 'awsRegion': 'us-east-1', 71 | 'repositoryName': r['repositoryName'], 72 | 'registryId': REGISTRY_ID, 73 | 'imageId': {'imageDigest': i['imageDigest']} 74 | } 75 | } 76 | messages.append(msg) 77 | next_token = None 78 | if 'nextToken' in resp: 79 | next_token = resp['nextToken'] 80 | 81 | print(json.dumps(messages)) 82 | -------------------------------------------------------------------------------- /bootstrap-load-messages.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import json 3 | 4 | # Should work with any version of python 3 with boto3 available. 5 | # Make sure you have exported AWS credentials for the account that owns the 6 | # message queue into your shell before running this script. 7 | # Set QUEUE_URL to the URL of the 8 | # SQS queue created for scan requests by the terraform script, which is the output variable 9 | # input_queue.
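# If you have a large backlog of images, batching can speed loading up
# considerably. A minimal sketch, assuming you want to use SQS
# send_message_batch (up to 10 messages per call); this helper is
# illustrative only and is not used by the loop below:
def send_batched(client, queue_url, msgs):
    for i in range(0, len(msgs), 10):
        # Ids only need to be unique within a single batch request
        entries = [{'Id': str(j), 'MessageBody': json.dumps(m)}
                   for j, m in enumerate(msgs[i:i + 10])]
        client.send_message_batch(QueueUrl=queue_url, Entries=entries)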
10 | 11 | 12 | QUEUE_URL = '' 13 | client = boto3.client('sqs') 14 | 15 | messages = [] 16 | with open('output.json') as f: 17 | messages = json.load(f) 18 | 19 | print('loaded messages') 20 | count = 0 21 | for msg in messages: 22 | client.send_message( 23 | QueueUrl=QUEUE_URL, 24 | MessageBody=json.dumps(msg) 25 | ) 26 | count = count + 1 27 | if (count % 100) == 0: 28 | print(count) 29 | -------------------------------------------------------------------------------- /clair-scanner.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "name": "clair-scanner", 4 | "image": "sriddell/clair-scanner:1.3.0", 5 | "cpu": 512, 6 | "memory": 1024, 7 | "essential": true, 8 | "logConfiguration": { 9 | "logDriver": "awslogs", 10 | "options": { 11 | "awslogs-group": "${log_group}", 12 | "awslogs-region": "${region}", 13 | "awslogs-stream-prefix": "clair-scanner" 14 | } 15 | }, 16 | "environment": [ 17 | { 18 | "name": "SQS_QUEUE_URL", 19 | "value": "${sqs_url}" 20 | }, 21 | { 22 | "name": "REGION", 23 | "value": "${region}" 24 | }, 25 | { 26 | "name": "CLAIR_ADDR", 27 | "value": "${clair_endpoint}" 28 | }, 29 | { 30 | "name": "BUCKET", 31 | "value": "${output_bucket}" 32 | }, 33 | { 34 | "name": "LOG_LEVEL", 35 | "value": "INFO" 36 | } 37 | ] 38 | } 39 | ] 40 | -------------------------------------------------------------------------------- /clair.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "name": "clair", 4 | "image": "sriddell/clair-with-ssm:1.2.0", 5 | "cpu": 3900, 6 | "memory": 14000, 7 | "ulimits": [ 8 | { 9 | "softLimit": 16384, 10 | "hardLimit": 16384, 11 | "name": "nofile" 12 | } 13 | ], 14 | "essential": true, 15 | "links": ["notification-endpoint"], 16 | "logConfiguration": { 17 | "logDriver": "awslogs", 18 | "options": { 19 | "awslogs-group": "${log_group}", 20 | "awslogs-region": "${region}", 21 | "awslogs-stream-prefix": "clair" 22 | } 23 | }, 24 | "environment": [ 25 | { 26 | "name": "CONFIG_PARAMETER_REGION", 27 | "value": "${region}" 28 | }, 29 | { 30 | "name": "CONFIG_PARAMETER_NAME", 31 | "value": "${config_parameter_name}" 32 | }, 33 | { 34 | "name": "LOG_LEVEL", 35 | "value": "WARN" 36 | } 37 | ], 38 | "portMappings": [ 39 | { 40 | "containerPort": 6060 41 | } 42 | ] 43 | }, 44 | { 45 | "name": "notification-endpoint", 46 | "image": "sriddell/clair-notification-endpoint:0.2.0", 47 | "cpu": 128, 48 | "memory": 128, 49 | "essential": true, 50 | "logConfiguration": { 51 | "logDriver": "awslogs", 52 | "options": { 53 | "awslogs-group": "${log_group}", 54 | "awslogs-region": "${region}", 55 | "awslogs-stream-prefix": "notification-endpoint" 56 | } 57 | }, 58 | "environment": [ 59 | { 60 | "name": "SQS_QUEUE_URL", 61 | "value": "${sqs_url}" 62 | }, 63 | { 64 | "name": "REGION", 65 | "value": "${region}" 66 | }, 67 | { 68 | "name": "CLAIR_ENDPOINT", 69 | "value": "http://${clair_endpoint}" 70 | } 71 | ] 72 | } 73 | ] 74 | -------------------------------------------------------------------------------- /cleanup.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import json 3 | import sys 4 | 5 | # Prototype to remove s3 records and dynamodb records for images that have been removed from ECR. 
6 | # Right now, list_repos.py has to be run under 10011 credentials to build the list of all repos, 7 | # then this script runs under 10021 credentials to remove s3 reports and dynamodb entries for any repos that 8 | # no longer exist, so we don't report on them or trigger clair layer notifications for them. 9 | # This should be wrapped into lambda functions to run periodically, or on notification of a delete 10 | # from ECR 11 | BUCKET = 'ecrscan-clair-scan-results' 12 | raw = None 13 | with open(sys.argv[1]) as f: 14 | raw = json.load(f) 15 | 16 | images = {} 17 | for image in raw: 18 | registryId = image['registryId'] 19 | repository = image['repository'] 20 | imageDigest = image['imageDigest'].split('sha256:')[1] 21 | if registryId not in images.keys(): 22 | images[registryId] = {} 23 | if repository not in images[registryId].keys(): 24 | images[registryId][repository] = set() 25 | if imageDigest not in images[registryId][repository]: 26 | images[registryId][repository].add(imageDigest) 27 | 28 | s3 = boto3.client('s3') 29 | reports = [] 30 | response = s3.list_objects_v2( 31 | Bucket=BUCKET 32 | ) 33 | for k in response['Contents']: 34 | reports.append(k['Key']) 35 | continuationToken = None 36 | if response['IsTruncated']: 37 | continuationToken = response['NextContinuationToken'] 38 | while continuationToken is not None: 39 | response = s3.list_objects_v2( 40 | Bucket=BUCKET, 41 | ContinuationToken=continuationToken 42 | ) 43 | for k in response['Contents']: 44 | reports.append(k['Key']) 45 | continuationToken = None 46 | if response['IsTruncated']: 47 | continuationToken = response['NextContinuationToken'] 48 | 49 | to_delete = [] 50 | for key in reports: 51 | # value='year=2019/month=08/day=09/registry_id=434313288222/prod/workflow-api/457531f2efe6475baef56af1248930f46bc8b7992bedfb072248fc8ec38250b6.json.gz' 52 | value = key 53 | value = value.split('/', 1)[1] 54 | value = value.split('/', 1)[1] 55 | value = value.split('/', 1)[1] 56 | values = value.split('/', 1) 57 | registry_id = values[0].split('registry_id=')[1] 58 | value = values[1] 59 | values = value.split('/') 60 | repository = '/'.join(values[:-1]) 61 | image_digest = values[-1].split('.json.gz')[0] 62 | # print(registry_id) 63 | # print(repository) 64 | # print(image_digest) 65 | # delete the report if its image no longer exists in ECR 66 | if not (registry_id in images and repository in images[registry_id] and image_digest in images[registry_id][repository]): 67 | to_delete.append(key) 68 | 69 | print("Deleting s3 reports:") 70 | for k in to_delete: 71 | print(k) 72 | s3.delete_object( 73 | Bucket=BUCKET, 74 | Key=k 75 | ) 76 | 77 | 78 | def should_delete_from_db(item, images): 79 | registryId = item['image_data']['M']['registryId']['S'] 80 | repository = item['image_data']['M']['repositoryName']['S'] 81 | imageDigest = item['image_data']['M']['imageId']['M']['imageDigest']['S'] 82 | imageDigest = imageDigest.split('sha256:')[1] 83 | exists = registryId in images and repository in images[registryId] and imageDigest in images[registryId][repository] 84 | return not exists 85 | 86 | 87 | to_delete = [] 88 | db = boto3.client('dynamodb') 89 | response = db.scan( 90 | TableName='clair-indexed-layers', 91 | ConsistentRead=True 92 | ) 93 | for item in response['Items']: 94 | if should_delete_from_db(item, images): 95 | to_delete.append({ 96 | 'layer_name': item['layer_name']['S'], 97 | 'image_name': item['image_name']['S'] 98 | }) 99 | last_evaluated_key = None 100 | if 'LastEvaluatedKey' in response: 101 | last_evaluated_key = response['LastEvaluatedKey'] 102 | while last_evaluated_key is not None: 103 | response = db.scan( 104 | TableName='clair-indexed-layers', 105 | ConsistentRead=True, 106 | ExclusiveStartKey=last_evaluated_key 107 | ) 108 | for item in response['Items']: 109 | if should_delete_from_db(item, images): 110 | to_delete.append({ 111 | 'layer_name': item['layer_name']['S'], 112 | 'image_name': item['image_name']['S'] 113 | }) 114 | last_evaluated_key = None 115 | if 'LastEvaluatedKey' in response: 116 | last_evaluated_key = response['LastEvaluatedKey'] 117 | 118 | print("Deleting dynamodb records:") 119 | for item in to_delete: 120 | print(item) 121 | db.delete_item( 122 | TableName='clair-indexed-layers', 123 | Key={ 124 | 'layer_name': { 125 | 'S': item['layer_name'] 126 | }, 127 | 'image_name': { 128 | 'S': item['image_name'] 129 | } 130 | } 131 | ) 132 | -------------------------------------------------------------------------------- /config.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | clair: 3 | database: 4 | # Database driver 5 | type: pgsql 6 | options: 7 | # PostgreSQL Connection string 8 | # https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-CONNSTRING 9 | source: host=${host} dbname=${dbname} user=${user} password=${password} 10 | 11 | # Number of elements kept in the cache 12 | # Values unlikely to change (e.g. namespaces) are cached in order to prevent needless roundtrips to the database. 13 | cachesize: 16384 14 | 15 | # 32-bit URL-safe base64 key used to encrypt pagination tokens 16 | # If one is not provided, it will be generated. 17 | # Multiple clair instances in the same cluster need the same value. 18 | paginationkey: "XxoPtCUzrUv4JV5dS+yQ+MdW7yLEJnRMwigVY/bpgtQ=" 19 | 20 | api: 21 | # v3 grpc/RESTful API server address 22 | addr: "0.0.0.0:6060" 23 | 24 | # Health server address 25 | # This is an unencrypted endpoint useful for load balancers to check the healthiness of the clair server. 26 | healthaddr: "0.0.0.0:6061" 27 | 28 | # Deadline before an API request will respond with a 503 29 | timeout: 900s 30 | 31 | # Optional PKI configuration 32 | # If you want to easily generate client certificates and CAs, try the following projects: 33 | # https://github.com/coreos/etcd-ca 34 | # https://github.com/cloudflare/cfssl 35 | servername: 36 | cafile: 37 | keyfile: 38 | certfile: 39 | 40 | worker: 41 | namespace_detectors: 42 | - os-release 43 | - lsb-release 44 | - apt-sources 45 | - alpine-release 46 | - redhat-release 47 | 48 | feature_listers: 49 | - apk 50 | - dpkg 51 | - rpm 52 | 53 | updater: 54 | # Frequency the database will be updated with vulnerabilities from the default data sources 55 | # The value 0 disables the updater entirely. 56 | interval: 5m 57 | enabledupdaters: 58 | - debian 59 | - ubuntu 60 | - rhel 61 | - oracle 62 | - alpine 63 | 64 | notifier: 65 | # Number of attempts before the notification is marked as failed to be sent 66 | attempts: 30 67 | 68 | # Duration before a failed notification is retried 69 | renotifyinterval: 2h 70 | 71 | http: 72 | # Optional endpoint that will receive notifications via POST requests 73 | endpoint: http://notification-endpoint:3000/notify 74 | 75 | # Optional PKI configuration 76 | # If you want to easily generate client certificates and CAs, try the following projects: 77 | # https://github.com/cloudflare/cfssl 78 | # https://github.com/coreos/etcd-ca 79 | servername: 80 | cafile: 81 | keyfile: 82 | certfile: 83 | 84 | # Optional HTTP Proxy: must be a valid URL (including the scheme).
85 | proxy: -------------------------------------------------------------------------------- /ecr-cve-monitor.md: -------------------------------------------------------------------------------- 1 | With cyber attacks on the rise against Higher Education institutions, it is critical to be able to detect vulnerabilities in the docker images that may run your applications. This involves not just vulnerability scanning of the applications, but of the OS packages installed in a docker image as well. 2 | 3 | The ecr-cve-monitor project (https://github.com/sriddell/ecr-cve-monitor) is an open-source proof-of-concept designed to fill the OS/package vulnerability scanning space for docker images stored in ECR. It is based on Clair (https://github.com/coreos/clair) and Klar (https://github.com/sriddell/klar), and designed specifically for use with ECR. Any images pushed to a repository in an ECR will be automatically scanned and have a report generated for them. Any new CVEs that come in that affect an already scanned image will trigger the creation of an updated report. 4 | 5 | Reports are stored as gzip-compressed JSON files in a time series in S3, making it easy to query for images with CVEs via AWS Athena. 6 | 7 | ecr-cve-monitor is message-based. All operations are passed as messages on an SQS queue to provide automatic retries, with a final dead-letter queue. 8 | 9 | Clair itself functions by 'indexing' all layers in a docker image for 'features', and then storing those features in postgres. If a new CVE comes into Clair that affects a layer Clair has already indexed, it issues a notification to the custom ecr-cve-monitor notification endpoint, which converts it to a rescan message on the input queue. 10 | 11 | If a new image is pushed to ECR, CloudTrail generates a CloudWatch event, which triggers a small lambda function that puts a scan-image message on the input queue for the new image. 12 | 13 | Thus, new images are automatically added to those monitored, and existing images that are affected by new CVEs can be identified. 14 | 15 | To bootstrap a new installation, some simple python scripts are provided to generate and load 'ScanImage' messages to the pending scan queue for all existing images in a given ECR. The installation can also be temporarily scaled up during the initial load to reduce the time it takes to index an existing ECR containing many images. 16 | 17 | Although it has only been tested with a single registry so far, it was designed to handle multiple registries and regions, assuming you set up the necessary cross-account permissions to allow the account ecr-cve-monitor is deployed in to pull all images from the other account's ECR. 18 | 19 | Any time an image is scanned, either because it was just pushed to a repo, or because Clair detected that a layer in the image is affected by a new CVE, ecr-cve-monitor generates a new JSON report of all vulnerabilities in that image and stores the result in S3 under a year/month/day time-series scheme. For a given day, at most one report will exist for an image. 20 | 21 | AWS Athena can then be used to generate reports such as 'show me any image with one or more High CVEs' or 'show me any images with new High CVEs in the last 2 days'. The time-series storage also means only a small amount of the data needs to be loaded into an Athena partition, so you can scan just a small subset of the data for any new vulnerabilities in the last 24 hours, for example.
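For example, a report for a single day can be produced by adding just that day's partition and querying it (the bucket name below is a placeholder for your deployment's report bucket):

```
ALTER TABLE reports ADD PARTITION (year='2019', month='01', day='15')
location 's3://my-report-bucket/year=2019/month=01/day=15/';

select distinct ECRMetadata.repositoryName, ECRMetadata.imageId.imageDigest
from reports
where cardinality(vulnerabilities.High) > 0
and year='2019' and month='01' and day='15';
```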
22 | 23 | Note that Clair does not recognize or track images directly - it only scans and knows about layers. Software that uses Clair (in ecr-cve-monitor, the clair scanner) is responsible for sending in each layer with a unique id. The clair scanner is then responsible for tracking which layers are present in which docker images in the ECR. ecr-cve-monitor accomplishes this by mapping the unique layer ID to the image, as identified by its unique registry ID, in a DynamoDB table. 24 | 25 | Note that uniquely identifying images is a bit confusing. Images have an internal sha256 identifier, but this is not the address of the image in ECR. ECR appears to assign images a unique sha256 ID, separate from the image sha256. This is the true unique ID within an ECR. Docker tags are mutable, and thus cannot be used for tracking because they can change over time. 26 | 27 | So the reports are generated in terms of the ECR registry ID, repository name, and registry sha256 ID (which permanently identifies an image in an ECR repository). 28 | 29 | A second layer of reporting would be necessary to translate Athena query results into images using the human-friendly tags assigned to that image (a minimal sketch of this lookup appears at the end of this article), although it would only be guaranteed accurate at the time of the report, as the tags (particularly the 'latest' tag) can change. 30 | 31 | Preventing an image with vulnerabilities from being pushed into ECR requires a different technique that can scan an image before it is pushed and fail earlier in the CI/CD pipeline. There is already an open source project to facilitate this, called clair-local-scan (https://github.com/arminc/clair-local-scan). This project generates daily docker images of the Clair database already fully loaded with current CVE vulnerabilities that can be used to pre-scan images before pushing them to ECR. 32 | 33 | ecr-cve-monitor just tells you about OS-level CVE vulnerabilities in your containers. To fully put it into effect, you need to decide for your organization how to deal with images identified as vulnerable. This could involve quarantining them so they can't be further deployed, blocking their deployment via your CI/CD pipeline, or reporting on wherever you run your containers (ECS, EKS, etc.) to identify vulnerable images that are actively running. The optimal mix of techniques of course depends on your SLAs, sensitivity to downtime, and the severity of a new CVE that is detected in an already running image. 34 | 35 | To run or experiment with the ecr-cve-monitor project, visit https://github.com/sriddell/ecr-cve-monitor and view the README.md for details on installation, operation, and the underlying architecture.
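As the minimal sketch promised above, the ECR API can resolve a digest from a report back into the image's current tags; the registry ID, repository name, and digest below are placeholders, and since tags are mutable the result is only accurate at the moment it is queried:

```
import boto3

ecr = boto3.client('ecr')
resp = ecr.describe_images(
    registryId='123456789012',
    repositoryName='my-repo',
    imageIds=[{'imageDigest': 'sha256:<digest-from-report>'}]
)
for detail in resp['imageDetails']:
    # imageTags may be absent if the image has been untagged
    print(detail.get('imageTags', []))
```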
36 | 37 | Please also visit the projects that power and make ECR monitoring possible: 38 | 39 | Clair: https://github.com/coreos/clair 40 | Klar: https://github.com/optiopay/klar or the fork modified specifically for ECR https://github.com/sriddell/klar 41 | 42 | 43 | 44 | -------------------------------------------------------------------------------- /ecr-cve-monitor.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sriddell/ecr-cve-monitor/b820efea1efcc9f8a266c3fac7fb88097f05425f/ecr-cve-monitor.png -------------------------------------------------------------------------------- /handler.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import boto3 4 | 5 | 6 | def put_image(event, context): 7 | endpoint = None 8 | if 'SQS_ENDPOINT' in os.environ: 9 | endpoint = os.environ['SQS_ENDPOINT'] 10 | sqs = boto3.resource('sqs', region_name=os.environ['REGION'], endpoint_url=endpoint) 11 | queue = sqs.Queue(os.environ['SQS_QUEUE_URL']) 12 | msg = {'CloudWatchEvent': event} 13 | queue.send_message(MessageBody=json.dumps(msg)) 14 | 15 | return { 16 | 'statusCode': 200, 17 | 'body': 'Queued message' 18 | } 19 | -------------------------------------------------------------------------------- /list_repos.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import json 3 | 4 | registries = ['434313288222'] 5 | existing_repos = [] 6 | ecr = boto3.client('ecr') 7 | for r in registries: 8 | response = ecr.describe_repositories( 9 | registryId=str(r) 10 | ) 11 | for repo in response['repositories']: 12 | t = (r, repo['repositoryName']) 13 | if t not in existing_repos: 14 | existing_repos.append(t) 15 | nextToken = None 16 | if 'nextToken' in response.keys(): 17 | nextToken = response['nextToken'] 18 | while nextToken is not None: 19 | response = ecr.describe_repositories( 20 | registryId=r, 21 | nextToken=nextToken 22 | ) 23 | for repo in response['repositories']: 24 | t = (r, repo['repositoryName']) 25 | if t not in existing_repos: 26 | existing_repos.append(t) 27 | nextToken = None 28 | if 'nextToken' in response.keys(): 29 | nextToken = response['nextToken'] 30 | 31 | existing_images = [] 32 | for repo in existing_repos: 33 | response = ecr.describe_images( 34 | registryId=str(repo[0]), 35 | repositoryName=str(repo[1]) 36 | ) 37 | for image in response['imageDetails']: 38 | t = (repo[0], repo[1], image['imageDigest']) 39 | if t not in existing_images: 40 | existing_images.append(t) 41 | nextToken = None 42 | if 'nextToken' in response.keys(): 43 | nextToken = response['nextToken'] 44 | while nextToken is not None: 45 | response = ecr.describe_images( 46 | registryId=repo[0], 47 | repositoryName=repo[1], 48 | nextToken=nextToken 49 | ) 50 | for image in response['imageDetails']: 51 | t = (repo[0], repo[1], image['imageDigest']) 52 | if t not in existing_images: 53 | existing_images.append(t) 54 | nextToken = None 55 | if 'nextToken' in response.keys(): 56 | nextToken = response['nextToken'] 57 | 58 | images = [] 59 | for t in existing_images: 60 | images.append({ 61 | 'registryId': t[0], 62 | 'repository': t[1], 63 | 'imageDigest': t[2] 64 | }) 65 | 66 | print(json.dumps(images)) 67 | -------------------------------------------------------------------------------- /main.tf: -------------------------------------------------------------------------------- 1 | terraform { 2 | backend "s3" { 3 | } 4 | } 5 | 6 | provider "aws" { 7
| version = "~> 2.19.0" 8 | } 9 | 10 | provider "null" { 11 | version = "~> 2.1.2" 12 | } 13 | 14 | provider "template" { 15 | version = "~> 2.1.2" 16 | } 17 | 18 | data "aws_region" "current" { 19 | } 20 | 21 | #temp; we expect the vpc to be created externally 22 | module "vpc" { 23 | source = "git::https://github.com/sriddell/terraform-module-standard-vpc.git?ref=1.0.0" 24 | 25 | #source = "/Users/sriddell/working/titan/terraform-module-standard-vpc" 26 | aws_region = data.aws_region.current.name 27 | service = var.service 28 | environment = var.environment 29 | costcenter = var.costcenter 30 | poc = var.poc 31 | key_name = var.key_name 32 | az = "us-east-1d,us-east-1e" 33 | enable_bastion = "0" 34 | } 35 | 36 | module "cluster" { 37 | source = "git::https://github.com/sriddell/terraform-module-ecs-cluster.git?ref=2.0.2" 38 | vpc_cidr_block = module.vpc.vpc_cidr_block 39 | environment = var.environment 40 | costcenter = var.costcenter 41 | poc = var.poc 42 | cluster_name = "${var.environment}-ecr-cve-monitor" 43 | key_name = var.key_name 44 | ami_id = var.ecs_ami_id 45 | vpc_id = module.vpc.vpc_id 46 | private_subnets = join(",", module.vpc.private_subnets) 47 | container_instance_sec_group_ids = [] 48 | instance_type = var.instance_type 49 | asg_desired_capacity = var.number_of_ecs_instances 50 | asg_max_size = var.number_of_ecs_instances 51 | asg_min_size = "0" 52 | } 53 | 54 | resource "aws_vpc_endpoint" "s3" { 55 | vpc_id = module.vpc.vpc_id 56 | service_name = "com.amazonaws.us-east-1.s3" 57 | } 58 | 59 | output "private_subnets" { 60 | value = module.vpc.private_subnets 61 | } 62 | 63 | output "public_subnets" { 64 | value = module.vpc.public_subnets 65 | } 66 | 67 | output "vpc_id" { 68 | value = module.vpc.vpc_id 69 | } 70 | 71 | output "vpc_cidr_block" { 72 | value = module.vpc.vpc_cidr_block 73 | } 74 | 75 | resource "aws_sqs_queue" "dead_letter" { 76 | name = "${var.prefix}-clair-dead-letter" 77 | delay_seconds = 0 78 | message_retention_seconds = 1209600 79 | tags = { 80 | Environment = var.environment 81 | Service = var.service 82 | CostCenter = var.costcenter 83 | POC = var.poc 84 | } 85 | } 86 | 87 | resource "aws_sqs_queue" "queue" { 88 | name = "${var.prefix}-clair-index-requests" 89 | delay_seconds = 0 90 | message_retention_seconds = 1209600 91 | redrive_policy = "{\"deadLetterTargetArn\":\"${aws_sqs_queue.dead_letter.arn}\",\"maxReceiveCount\":4}" 92 | 93 | tags = { 94 | Environment = var.environment 95 | Service = var.service 96 | CostCenter = var.costcenter 97 | POC = var.poc 98 | } 99 | } 100 | 101 | output "input_queue" { 102 | value = aws_sqs_queue.queue.id 103 | } 104 | 105 | resource "aws_s3_bucket" "bucket" { 106 | bucket = "${var.prefix}-clair-scan-results" 107 | acl = "private" 108 | 109 | tags = { 110 | Environment = var.environment 111 | Service = var.service 112 | CostCenter = var.costcenter 113 | POC = var.poc 114 | } 115 | } 116 | 117 | output "report_bucket" { 118 | value = aws_s3_bucket.bucket.id 119 | } 120 | 121 | # DB Subnet group to put in RDS database in vpc 122 | resource "aws_db_subnet_group" "clair" { 123 | name = "clair-db-subnet" 124 | # TF-UPGRADE-TODO: In Terraform v0.10 and earlier, it was sometimes necessary to 125 | # force an interpolation expression to be interpreted as a list by wrapping it 126 | # in an extra set of list brackets. That form was supported for compatibilty in 127 | # v0.11, but is no longer supported in Terraform v0.12. 
128 | # 129 | # If the expression in the following list itself returns a list, remove the 130 | # brackets to avoid interpretation as a list of lists. If the expression 131 | # returns a single list item then leave it as-is and remove this TODO comment. 132 | subnet_ids = "${module.vpc.private_subnets}" 133 | 134 | tags = { 135 | Name = "clair-db-subnet" 136 | Environment = var.environment 137 | Service = var.service 138 | CostCenter = var.costcenter 139 | POC = var.poc 140 | } 141 | } 142 | 143 | #Create a security group for RDS acccess 144 | resource "aws_security_group" "allow-db" { 145 | name = "allow_clair_db" 146 | description = "Allow all inbound traffic from db processes" 147 | vpc_id = module.vpc.vpc_id 148 | 149 | ingress { 150 | from_port = 5432 151 | to_port = 5432 152 | protocol = "tcp" 153 | # TF-UPGRADE-TODO: In Terraform v0.10 and earlier, it was sometimes necessary to 154 | # force an interpolation expression to be interpreted as a list by wrapping it 155 | # in an extra set of list brackets. That form was supported for compatibilty in 156 | # v0.11, but is no longer supported in Terraform v0.12. 157 | # 158 | # If the expression in the following list itself returns a list, remove the 159 | # brackets to avoid interpretation as a list of lists. If the expression 160 | # returns a single list item then leave it as-is and remove this TODO comment. 161 | cidr_blocks = [module.vpc.vpc_cidr_block] 162 | } 163 | } 164 | 165 | resource "random_string" "postgres_password" { 166 | length = 16 167 | special = false 168 | } 169 | 170 | # Postgres RDS database 171 | resource "aws_db_instance" "default" { 172 | identifier = "clair-db" 173 | allocated_storage = 10 174 | storage_type = "gp2" 175 | engine = "postgres" 176 | engine_version = "10.6" 177 | instance_class = "db.t2.small" 178 | name = "ClairDb" 179 | username = "postgres" 180 | password = random_string.postgres_password.result 181 | db_subnet_group_name = aws_db_subnet_group.clair.name 182 | skip_final_snapshot = true 183 | vpc_security_group_ids = [aws_security_group.allow-db.id] 184 | 185 | tags = { 186 | Name = "${var.service}-clair-db" 187 | Environment = var.environment 188 | Service = var.service 189 | CostCenter = var.costcenter 190 | POC = var.poc 191 | } 192 | } 193 | 194 | resource "aws_dynamodb_table" "indexed-layers" { 195 | name = "clair-indexed-layers" 196 | billing_mode = "PAY_PER_REQUEST" 197 | read_capacity = 2 198 | write_capacity = 100 199 | hash_key = "layer_name" 200 | range_key = "image_name" 201 | 202 | attribute { 203 | name = "layer_name" 204 | type = "S" 205 | } 206 | attribute { 207 | name = "image_name" 208 | type = "S" 209 | } 210 | 211 | tags = { 212 | Name = "${var.service}-clair-db" 213 | Environment = var.environment 214 | Service = var.service 215 | CostCenter = var.costcenter 216 | POC = var.poc 217 | } 218 | } 219 | 220 | data "template_file" "clair-config" { 221 | template = file("config.yaml") 222 | vars = { 223 | host = aws_db_instance.default.address 224 | dbname = aws_db_instance.default.name 225 | user = "postgres" 226 | password = random_string.postgres_password.result 227 | } 228 | } 229 | 230 | resource "aws_ssm_parameter" "clair-db-connect-string" { 231 | name = "/${var.service}/clair-config.yaml" 232 | description = "The database connection string for the Clair DB" 233 | type = "SecureString" 234 | value = base64encode(data.template_file.clair-config.rendered) 235 | 236 | tags = { 237 | Name = "clair-db-connect-string" 238 | Environment = var.environment 239 | Service = 
var.service 240 | CostCenter = var.costcenter 241 | POC = var.poc 242 | } 243 | } 244 | 245 | data "aws_iam_policy_document" "clair" { 246 | # Can fetch secrets 247 | statement { 248 | actions = ["ssm:GetParameter"] 249 | 250 | resources = [ 251 | aws_ssm_parameter.clair-db-connect-string.arn, 252 | ] 253 | 254 | effect = "Allow" 255 | } 256 | 257 | statement { 258 | actions = [ 259 | "ecr:GetAuthorizationToken", 260 | "ecr:BatchCheckLayerAvailability", 261 | "ecr:GetDownloadUrlForLayer", 262 | "ecr:GetRepositoryPolicy", 263 | "ecr:DescribeRepositories", 264 | "ecr:ListImages", 265 | "ecr:DescribeImages", 266 | "ecr:BatchGetImage", 267 | ] 268 | resources = ["*"] 269 | effect = "Allow" 270 | } 271 | 272 | statement { 273 | actions = ["sqs:*"] 274 | 275 | resources = [ 276 | aws_sqs_queue.queue.arn, 277 | "${aws_sqs_queue.queue.arn}/*", 278 | ] 279 | 280 | effect = "Allow" 281 | } 282 | } 283 | 284 | resource "aws_iam_policy" "clair" { 285 | name = "clair" 286 | policy = data.aws_iam_policy_document.clair.json 287 | } 288 | 289 | resource "aws_iam_role" "clair" { 290 | name = "clair" 291 | 292 | assume_role_policy = <, 84 | manifest:struct>, 85 | repositoryName:string, 86 | registryId:string 87 | >, 88 | 89 | Vulnerabilities struct< High:array>, 94 | Medium:array>, 99 | Medium:array>, 104 | Medium:array>, 109 | Low:array>, 114 | Medium:array>, 119 | Negligible:array>, 124 | Medium:array> 129 | > 130 | ) 131 | PARTITIONED BY(year string, month string, day string) 132 | ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION 's3://ecrscan-clair-scan-results/' 133 | ''' 134 | drop_table = 'DROP TABLE `' + table_name + '`;' 135 | 136 | 137 | execute_query(table_def) 138 | 139 | if partitions is None: 140 | execute_query("MSCK REPAIR TABLE " + table_name) # load all data 141 | else: 142 | for partition in partitions: 143 | add_partition = "ALTER TABLE " + table_name + " ADD PARTITION (year='" + partition['year'] + "',month='" + partition['month'] + "',day='" + partition['day'] + "') location 's3://ecrscan-clair-scan-results/year=" + partition['year'] + "/month=" + partition['month'] + "/day=" + partition['day'] + "/'" 144 | execute_query(add_partition) 145 | 146 | 147 | query_string = "select distinct ECRMetadata.registryId, ECRMetadata.repositoryName, ECRMetadata.imageId.imageDigest from " + table_name + " where cardinality(vulnerabilities.High) > 0 order by ECRMetadata.registryId, ECRMetadata.repositoryName, ECRMetadata.imageId.imageDigest;" 148 | q_execution_id = execute_query(query_string) 149 | execute_query(drop_table) 150 | 151 | s3_key = q_execution_id + '.csv' 152 | local_filename = q_execution_id + '.csv' 153 | s3 = boto3.resource('s3') 154 | try: 155 | s3.Bucket('ecr-clair-scan-results').download_file(s3_key, local_filename) 156 | except botocore.exceptions.ClientError as e: 157 | if e.response['Error']['Code'] == "404": 158 | print("The object does not exist.") 159 | else: 160 | raise 161 | 162 | # read file to array 163 | vulnerable_images = [] 164 | with open(local_filename) as csvfile: 165 | reader = csv.DictReader(csvfile) 166 | for row in reader: 167 | vulnerable_images.append((row['registryid'], row['repositoryname'], row['imagedigest'])) 168 | # delete result file 169 | if os.path.isfile(local_filename): 170 | os.remove(local_filename) 171 | repos = set() 172 | registries = set() 173 | for row in vulnerable_images: 174 | t = (row[0], row[1]) 175 | if t not in repos: 176 | repos.add(t) 177 | if row[0] not in registries: 178 | registries.add(row[0]) 179 | 180 | details = 
{} 181 | for k in repos: 182 | try: 183 | response = ecr.describe_images( 184 | registryId=k[0], 185 | repositoryName=k[1] 186 | ) 187 | 188 | update_details(vulnerable_images, k, details, response['imageDetails']) 189 | nextToken = None 190 | if 'nextToken' in response.keys(): 191 | nextToken = response['nextToken'] 192 | while nextToken is not None: 193 | response = ecr.describe_images( 194 | registryId=k[0], 195 | repositoryName=k[1], 196 | nextToken=nextToken 197 | ) 198 | update_details(vulnerable_images, k, details, response['imageDetails']) 199 | nextToken = None 200 | if 'nextToken' in response.keys(): 201 | nextToken = response['nextToken'] 202 | except botocore.exceptions.ClientError: 203 | # ideally, we would list all repos, then filter out reports for repos which have been deleted 204 | # unfortunately, listing all repos cross account doesn't seem to be working; have reached out to 205 | # aws on this 206 | continue 207 | 208 | # Note that the tags map may contain fewer images than the report generated; this is because 209 | # an ECR image may have been deleted after it was scanned. 210 | report = { 211 | 'partitions': partitions, 212 | 'high_vulnerabilities': [] 213 | } 214 | # high_vulnerabilities is filled below, then filtered to images pushed since the cutoff 215 | for k in details.keys(): 216 | out = { 217 | 'registryId': k[0], 218 | 'repositoryName': k[1], 219 | 'imageId': k[2], 220 | 'tags': details[k]['tags'], 221 | 'imagePushedAt': details[k]['imagePushedAt'] 222 | } 223 | report['high_vulnerabilities'].append(out) 224 | 225 | report['high_vulnerabilities'] = list(filter(lambda x: (x['imagePushedAt'] >= cutoff), report['high_vulnerabilities'])) 226 | report['high_vulnerabilities'].sort(key=lambda x: x['imagePushedAt'], reverse=True) 227 | print(json.dumps(report, default=date_handler)) 228 | -------------------------------------------------------------------------------- /reque.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | 3 | sqs = boto3.resource('sqs') 4 | queue = sqs.Queue('https://sqs.us-east-1.amazonaws.com/234324814398/ecrscan-clair-dead-letter') 5 | to_queue = sqs.Queue('https://sqs.us-east-1.amazonaws.com/234324814398/ecrscan-clair-index-requests') 6 | while True: 7 | msgs = queue.receive_messages( 8 | VisibilityTimeout=20 * 60, 9 | WaitTimeSeconds=20 10 | ) 11 | if len(msgs) > 0: 12 | for msg in msgs: 13 | to_queue.send_message(MessageBody=msg.body) 14 | msg.delete() 15 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | boto3 2 | -------------------------------------------------------------------------------- /variables.tf: -------------------------------------------------------------------------------- 1 | variable "service" { 2 | } 3 | 4 | variable "environment" { 5 | } 6 | 7 | variable "costcenter" { 8 | } 9 | 10 | variable "poc" { 11 | } 12 | 13 | variable "ecs_ami_id" { 14 | } 15 | 16 | variable "key_name" { 17 | } 18 | 19 | variable "number_of_scanners" { 20 | default = 1 21 | } 22 | 23 | variable "number_of_ecs_instances" { 24 | default = 1 25 | } 26 | 27 | variable "instance_type" { 28 | } 29 | 30 | variable "prefix" { 31 | } 32 | 33 | variable "number_of_clair_instances" { 34 | default = 1 35 | } 36 | 37 | # variable "private_subnet_ids" { 38 | # type = "list" 39 | # } 40 | #variable "vpc_id" {} 41 | -------------------------------------------------------------------------------- /versions.tf:
-------------------------------------------------------------------------------- 1 | 2 | terraform { 3 | required_version = ">= 0.12" 4 | } 5 | --------------------------------------------------------------------------------