├── .flake8 ├── .gitignore ├── .yamllint ├── LICENSE ├── README.md ├── bootstrap-create-messages.py ├── bootstrap-load-messages.py ├── clair-scanner.json ├── clair.json ├── cleanup.py ├── config.yaml ├── ecr-cve-monitor.md ├── ecr-cve-monitor.png ├── handler.py ├── list_repos.py ├── main.tf ├── putimage.zip ├── quarantine.py ├── report.py ├── reque.py ├── requirements.txt ├── variables.tf └── versions.tf /.flake8: -------------------------------------------------------------------------------- 1 | [flake8] 2 | ignore = E501 3 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.tfstate 2 | .terraform 3 | *.tfvars 4 | venv 5 | .vscode 6 | -------------------------------------------------------------------------------- /.yamllint: -------------------------------------------------------------------------------- 1 | line-length: 2 | max: 200 -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | This project is based on coreos/Clair; copyright for that work is held by the coreos/clair project (https://github.com/coreos/clair/LICENSE). 2 | This project uses a custom fork of [Klar](https://github.com/optiopay/klar); copyright for that work is held by Optiopay GmbH, 2016 (https://github.com/optiopay/klar). 3 | All other copyright for the ecr-cve-monitor project is held by Shane Riddell, 2019. 4 | 5 | Copyright 2019 Shane Riddell 6 | 7 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 8 | 9 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 10 | 11 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ecr-cve-monitor 2 | 3 | This project is a working proof-of-concept that uses the [coreos/Clair project](https://github.com/coreos/clair) to scan all images pushed to an AWS ECR registry, and to automatically rescan them if Clair detects a new CVE that affects a known image. 4 | 5 | See ecr-cve-monitor.md for more details on the purpose and architecture of the project. 6 | 7 | ## Installation 8 | 9 | Make sure you have Terraform 0.12 or greater available (versions.tf requires ">= 0.12").
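Note that the state backend is declared as an empty s3 block in main.tf, so terraform init needs its backend settings supplied on the command line; for example (the bucket, key, and region values are placeholders for your own state location):

```
terraform init \
  -backend-config="bucket=my-terraform-state-bucket" \
  -backend-config="key=ecr-cve-monitor.tfstate" \
  -backend-config="region=us-east-1"
```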
10 | 11 | Create a terraform.tfvars file and define the following values: 12 | 13 | ``` 14 | environment="environment name, like cicd or dev" 15 | costcenter="costcenter identifier" 16 | poc="point of contact email" 17 | service="ecr-cve-monitor" 18 | 19 | ecs_ami_id="latest ECS cluster AMI id for your deployment region" 20 | key_name="ssh key for ec2 instances" 21 | instance_type="instance type for the ecs cluster, I use m5.xlarge for the default installation settings for memory and cpu usage" 22 | 23 | number_of_clair_instances=1 24 | number_of_scanners=1 25 | number_of_ecs_instances=2 26 | 27 | prefix="a prefix to use for all resources created" 28 | ``` 29 | 30 | Run terraform init, then terraform plan to review the plan that will be generated, then terraform apply to apply the changes. 31 | This creates a new VPC with an ECS cluster running the ecr-cve-monitor software, along with the message queue, dead-letter queue, DynamoDB tables, and a CloudWatch event that triggers a lambda to queue an image scan whenever a new image is pushed to the ECR registry in this account. 32 | 33 | Note that if you want to install to an existing VPC and/or an existing ECS cluster, you can modify the main.tf file to do so. 34 | 35 | Also, while the underlying image layer tracking is capable of supporting multiple registries (in different regions/accounts), this has not yet been tested, and you would need to modify the terraform to allow CloudWatch events from the other registry to be pushed onto the ecr-cve-monitor input queue. 36 | 37 | ### Bootstrapping 38 | 39 | Note that you should let the clair service deployed by terraform run for at least 60 minutes so that it can do the initial CVE database load. While Clair is loading the initial CVEs, it will generate empty reports, and notifications for incoming CVEs are disabled - so if you bootstrap too soon, you will have to repeat the bootstrap to get accurate first-time reports. 40 | 41 | Run bootstrap-create-messages.py and bootstrap-load-messages.py (comments in the files contain instructions on how to run them). These scan a registry for all existing images and queue a scan request for each one, so that every image becomes known to and monitored by Clair and gets an initial report in S3. 42 | 43 | Note that if you have a lot of images to go through initially, you may want to temporarily adjust the terraform.tfvars values for number_of_clair_instances, number_of_scanners, and number_of_ecs_instances to get through the backlog more quickly. During testing, we found that a clair_instance could typically handle about 8 clair scanners at once. 44 | 45 | ## Reporting 46 | 47 | ### Setting up reporting 48 | 49 | In AWS Glue, create a database for reporting on the scan results.
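This can be done in the console or with the AWS CLI; for example (the database name ecr_cve_reports is a placeholder - use whatever name you prefer):

```
aws glue create-database --database-input '{"Name": "ecr_cve_reports"}'
```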
Then in AWS Athena, create an external table like so: 50 | 51 | ``` 52 | CREATE external TABLE reports ( 53 | LayerCount int, 54 | AnalyzedImageName string, 55 | ImageDigest string, 56 | ECRMetadata struct< 57 | imageId:struct, 58 | manifest:struct>, 59 | repositoryName:string, 60 | registryId:string 61 | >, 62 | 63 | Vulnerabilities struct< High:array>, 68 | Medium:array>, 73 | Medium:array>, 78 | Medium:array>, 83 | Low:array>, 88 | Medium:array>, 93 | Negligible:array>, 98 | Medium:array> 103 | > 104 | ) 105 | PARTITIONED BY(year string, month string, day string) 106 | ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION 's3://my-report-bucket/' 107 | ``` 108 | 109 | Be sure to replace the LOCATION s3://my-report-bucket with the terraform output 'report_bucket', which specifies your own report bucket name. 110 | 111 | ### Reporting with Athena 112 | 113 | To report with Athena, you can load all partitions with 114 | 115 | ``` 116 | MSCK REPAIR TABLE reports 117 | ``` 118 | 119 | However, you typically do not need to create reports across the entire time series, and will only be interested in seeing new reports (either new CVEs that affect existing images, or CVEs present in newly pushed images) for a given time range. To do that more cheaply and efficiently, load just the partitions that correspond to the time range you want to query. 120 | 121 | For example, to load and report on January 15 of 2019: 122 | 123 | ``` 124 | ALTER TABLE reports ADD PARTITION (year='2019',month='01',day='15') location 's3://my-scan-results/year=2019/month=01/day=15/' 125 | ``` 126 | 127 | You can then query for any images that were detected to have at least 1 High CVE on the 15th: 128 | 129 | ``` 130 | select distinct ECRMetadata.registryId, ECRMetadata.repositoryName, ECRMetadata.imageId.imageDigest from reports where cardinality(vulnerabilities.High) > 0 and year='2019' and month='01' and day='15' order by ECRMetadata.registryId, ECRMetadata.repositoryName, ECRMetadata.imageId.imageDigest; 131 | ``` 132 | 133 | Note that this will give you back results in terms of the image's immutable digest. You can use the AWS SDKs to convert this to a (current) list of human-friendly tags. The Athena reporting itself, and the internal report structures, cannot use human-friendly image tags because tags are not immutable. 134 | 135 | ## High level diagram 136 | 137 | ![Architecture](ecr-cve-monitor.png) 138 | 139 | ## Disaster recovery 140 | 141 | Any loss of information can be recovered by repopulating the reports from scratch (except historical time-series data). 142 | 143 | ## Why Clair 144 | 145 | * From the CoreOS team 146 | * Open source 147 | * Used to power vulnerability scanning in Quay.io 148 | * Can generate reports without re-consuming layers 149 | * Can raise new vulnerabilities against existing layers without actually rescanning the image 150 | -------------------------------------------------------------------------------- /bootstrap-create-messages.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import json 3 | 4 | # Should work with any version of python 3 with boto3 available. 5 | # Make sure you have exported AWS credentials for the account that contains the ECR registry 6 | # into your shell before running this script. 7 | # Scans an ECR registry, outputting scan request messages for all images found in all repositories. 8 | # To use, set REGISTRY_ID to the registry ID you wish to bootstrap.
9 | # Redirect the output to output.json: 10 | # python bootstrap-create-messages.py > output.json 11 | 12 | # Note that this currently assumes the registry is in us-east-1 13 | 14 | client = boto3.client('ecr') 15 | 16 | REGISTRY_ID = '' 17 | resp = client.describe_repositories( 18 | registryId=REGISTRY_ID, 19 | maxResults=100 20 | ) 21 | repos = [] 22 | for r in resp['repositories']: 23 | repos.append(r) 24 | next_token = None 25 | if 'nextToken' in resp: 26 | next_token = resp['nextToken'] 27 | while next_token is not None: 28 | resp = client.describe_repositories( 29 | registryId=REGISTRY_ID, 30 | maxResults=100, 31 | nextToken=next_token 32 | ) 33 | for r in resp['repositories']: 34 | repos.append(r) 35 | next_token = None 36 | if 'nextToken' in resp: 37 | next_token = resp['nextToken'] 38 | 39 | messages = [] 40 | # Each message is a ScanImage request; bootstrap-load-messages.py sends them to the input queue. 41 | for r in repos: 42 | resp = client.list_images( 43 | registryId=REGISTRY_ID, 44 | repositoryName=r['repositoryName'], 45 | maxResults=100 46 | ) 47 | for i in resp['imageIds']: 48 | msg = { 49 | 'ScanImage': { 50 | 'awsRegion': 'us-east-1', 51 | 'repositoryName': r['repositoryName'], 52 | 'registryId': REGISTRY_ID, 53 | 'imageId': {'imageDigest': i['imageDigest']} 54 | } 55 | } 56 | messages.append(msg) 57 | next_token = None 58 | if 'nextToken' in resp: 59 | next_token = resp['nextToken'] 60 | while next_token is not None: 61 | resp = client.list_images( 62 | registryId=REGISTRY_ID, 63 | repositoryName=r['repositoryName'], 64 | maxResults=100, 65 | nextToken=next_token 66 | ) 67 | for i in resp['imageIds']: 68 | msg = { 69 | 'ScanImage': { 70 | 'awsRegion': 'us-east-1', 71 | 'repositoryName': r['repositoryName'], 72 | 'registryId': REGISTRY_ID, 73 | 'imageId': {'imageDigest': i['imageDigest']} 74 | } 75 | } 76 | messages.append(msg) 77 | next_token = None 78 | if 'nextToken' in resp: 79 | next_token = resp['nextToken'] 80 | 81 | print(json.dumps(messages)) 82 | -------------------------------------------------------------------------------- /bootstrap-load-messages.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import json 3 | 4 | # Should work with any version of python 3 with boto3 available. 5 | # Make sure you have exported AWS credentials for the account that owns the 6 | # message queue into your shell before running this script. 7 | # Set QUEUE_URL to the URL of the 8 | # SQS queue created for scan requests by the terraform script, which is the output variable 9 | # input_queue.
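# If you have a large backlog of images, batching can speed loading up
# considerably. A minimal sketch, assuming you want to use SQS
# send_message_batch (up to 10 messages per call); this helper is
# illustrative only and is not used by the loop below:
def send_batched(client, queue_url, msgs):
    for i in range(0, len(msgs), 10):
        # Ids only need to be unique within a single batch request
        entries = [{'Id': str(j), 'MessageBody': json.dumps(m)}
                   for j, m in enumerate(msgs[i:i + 10])]
        client.send_message_batch(QueueUrl=queue_url, Entries=entries)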
10 | 11 | 12 | QUEUE_URL = '' 13 | client = boto3.client('sqs') 14 | 15 | messages = [] 16 | with open('output.json') as f: 17 | messages = json.load(f) 18 | 19 | print('loaded messages') 20 | count = 0 21 | for msg in messages: 22 | client.send_message( 23 | QueueUrl=QUEUE_URL, 24 | MessageBody=json.dumps(msg) 25 | ) 26 | count = count + 1 27 | if (count % 100) == 0: 28 | print(count) 29 | -------------------------------------------------------------------------------- /clair-scanner.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "name": "clair-scanner", 4 | "image": "sriddell/clair-scanner:1.3.0", 5 | "cpu": 512, 6 | "memory": 1024, 7 | "essential": true, 8 | "logConfiguration": { 9 | "logDriver": "awslogs", 10 | "options": { 11 | "awslogs-group": "${log_group}", 12 | "awslogs-region": "${region}", 13 | "awslogs-stream-prefix": "clair-scanner" 14 | } 15 | }, 16 | "environment": [ 17 | { 18 | "name": "SQS_QUEUE_URL", 19 | "value": "${sqs_url}" 20 | }, 21 | { 22 | "name": "REGION", 23 | "value": "${region}" 24 | }, 25 | { 26 | "name": "CLAIR_ADDR", 27 | "value": "${clair_endpoint}" 28 | }, 29 | { 30 | "name": "BUCKET", 31 | "value": "${output_bucket}" 32 | }, 33 | { 34 | "name": "LOG_LEVEL", 35 | "value": "INFO" 36 | } 37 | ] 38 | } 39 | ] 40 | -------------------------------------------------------------------------------- /clair.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "name": "clair", 4 | "image": "sriddell/clair-with-ssm:1.2.0", 5 | "cpu": 3900, 6 | "memory": 14000, 7 | "ulimits": [ 8 | { 9 | "softLimit": 16384, 10 | "hardLimit": 16384, 11 | "name": "nofile" 12 | } 13 | ], 14 | "essential": true, 15 | "links": ["notification-endpoint"], 16 | "logConfiguration": { 17 | "logDriver": "awslogs", 18 | "options": { 19 | "awslogs-group": "${log_group}", 20 | "awslogs-region": "${region}", 21 | "awslogs-stream-prefix": "clair" 22 | } 23 | }, 24 | "environment": [ 25 | { 26 | "name": "CONFIG_PARAMETER_REGION", 27 | "value": "${region}" 28 | }, 29 | { 30 | "name": "CONFIG_PARAMETER_NAME", 31 | "value": "${config_parameter_name}" 32 | }, 33 | { 34 | "name": "LOG_LEVEL", 35 | "value": "WARN" 36 | } 37 | ], 38 | "portMappings": [ 39 | { 40 | "containerPort": 6060 41 | } 42 | ] 43 | }, 44 | { 45 | "name": "notification-endpoint", 46 | "image": "sriddell/clair-notification-endpoint:0.2.0", 47 | "cpu": 128, 48 | "memory": 128, 49 | "essential": true, 50 | "logConfiguration": { 51 | "logDriver": "awslogs", 52 | "options": { 53 | "awslogs-group": "${log_group}", 54 | "awslogs-region": "${region}", 55 | "awslogs-stream-prefix": "notification-endpoint" 56 | } 57 | }, 58 | "environment": [ 59 | { 60 | "name": "SQS_QUEUE_URL", 61 | "value": "${sqs_url}" 62 | }, 63 | { 64 | "name": "REGION", 65 | "value": "${region}" 66 | }, 67 | { 68 | "name": "CLAIR_ENDPOINT", 69 | "value": "http://${clair_endpoint}" 70 | } 71 | ] 72 | } 73 | ] 74 | -------------------------------------------------------------------------------- /cleanup.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import json 3 | import sys 4 | 5 | # Prototype to remove s3 records and dynamodb records for images that have been removed from ECR. 
6 | # Right now, list_repos.py has to be run under 10011 credentials to build the list of all repos, 7 | # then this script runs under 10021 credentials to remove s3 reports and dynamodb entries for any repos that 8 | # no longer exist, so we don't report on them or trigger clair layer notifications for them. 9 | # This should be wrapped into lambda functions to run periodically, or on notification of a delete 10 | # from ECR 11 | BUCKET = 'ecrscan-clair-scan-results' 12 | raw = None 13 | with open(sys.argv[1]) as f: 14 | raw = json.load(f) 15 | 16 | images = {} 17 | for image in raw: 18 | registryId = image['registryId'] 19 | repository = image['repository'] 20 | imageDigest = image['imageDigest'].split('sha256:')[1] 21 | if registryId not in images.keys(): 22 | images[registryId] = {} 23 | if repository not in images[registryId].keys(): 24 | images[registryId][repository] = set() 25 | if imageDigest not in images[registryId][repository]: 26 | images[registryId][repository].add(imageDigest) 27 | 28 | s3 = boto3.client('s3') 29 | reports = [] 30 | response = s3.list_objects_v2( 31 | Bucket=BUCKET 32 | ) 33 | for k in response['Contents']: 34 | reports.append(k['Key']) 35 | continuationToken = None 36 | if response['IsTruncated']: 37 | continuationToken = response['NextContinuationToken'] 38 | while continuationToken is not None: 39 | response = s3.list_objects_v2( 40 | Bucket=BUCKET, 41 | ContinuationToken=continuationToken 42 | ) 43 | for k in response['Contents']: 44 | reports.append(k['Key']) 45 | continuationToken = None 46 | if response['IsTruncated']: 47 | continuationToken = response['NextContinuationToken'] 48 | 49 | to_delete = [] 50 | for key in reports: 51 | # value='year=2019/month=08/day=09/registry_id=434313288222/prod/workflow-api/457531f2efe6475baef56af1248930f46bc8b7992bedfb072248fc8ec38250b6.json.gz' 52 | value = key 53 | value = value.split('/', 1)[1] 54 | value = value.split('/', 1)[1] 55 | value = value.split('/', 1)[1] 56 | values = value.split('/', 1) 57 | registry_id = values[0].split('registry_id=')[1] 58 | value = values[1] 59 | values = value.split('/') 60 | repository = '/'.join(values[:-1]) 61 | image_digest = values[-1].split('.json.gz')[0] 62 | # print(registry_id) 63 | # print(repository) 64 | # print(image_digest) 65 | # delete the report if its image no longer exists in ECR 66 | if not (registry_id in images and repository in images[registry_id] and image_digest in images[registry_id][repository]): 67 | to_delete.append(key) 68 | 69 | print("Deleting s3 reports:") 70 | for k in to_delete: 71 | print(k) 72 | s3.delete_object( 73 | Bucket=BUCKET, 74 | Key=k 75 | ) 76 | 77 | 78 | def should_delete_from_db(item, images): 79 | registryId = item['image_data']['M']['registryId']['S'] 80 | repository = item['image_data']['M']['repositoryName']['S'] 81 | imageDigest = item['image_data']['M']['imageId']['M']['imageDigest']['S'] 82 | imageDigest = imageDigest.split('sha256:')[1] 83 | exists = registryId in images and repository in images[registryId] and imageDigest in images[registryId][repository] 84 | return not exists 85 | 86 | 87 | to_delete = [] 88 | db = boto3.client('dynamodb') 89 | response = db.scan( 90 | TableName='clair-indexed-layers', 91 | ConsistentRead=True 92 | ) 93 | for item in response['Items']: 94 | if should_delete_from_db(item, images): 95 | to_delete.append({ 96 | 'layer_name': item['layer_name']['S'], 97 | 'image_name': item['image_name']['S'] 98 | }) 99 | last_evaluated_key = None 100 | if 'LastEvaluatedKey' in response: 101 | last_evaluated_key = response['LastEvaluatedKey'] 102 | while last_evaluated_key is not None: 103 | response = db.scan( 104 | TableName='clair-indexed-layers', 105 | ConsistentRead=True, 106 | ExclusiveStartKey=last_evaluated_key 107 | ) 108 | for item in response['Items']: 109 | if should_delete_from_db(item, images): 110 | to_delete.append({ 111 | 'layer_name': item['layer_name']['S'], 112 | 'image_name': item['image_name']['S'] 113 | }) 114 | last_evaluated_key = None 115 | if 'LastEvaluatedKey' in response: 116 | last_evaluated_key = response['LastEvaluatedKey'] 117 | 118 | print("Deleting dynamodb records:") 119 | for item in to_delete: 120 | print(item) 121 | db.delete_item( 122 | TableName='clair-indexed-layers', 123 | Key={ 124 | 'layer_name': { 125 | 'S': item['layer_name'] 126 | }, 127 | 'image_name': { 128 | 'S': item['image_name'] 129 | } 130 | } 131 | ) 132 | -------------------------------------------------------------------------------- /config.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | clair: 3 | database: 4 | # Database driver 5 | type: pgsql 6 | options: 7 | # PostgreSQL Connection string 8 | # https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-CONNSTRING 9 | source: host=${host} dbname=${dbname} user=${user} password=${password} 10 | 11 | # Number of elements kept in the cache 12 | # Values unlikely to change (e.g. namespaces) are cached in order to prevent needless roundtrips to the database. 13 | cachesize: 16384 14 | 15 | # 32-bit URL-safe base64 key used to encrypt pagination tokens 16 | # If one is not provided, it will be generated. 17 | # Multiple clair instances in the same cluster need the same value. 18 | paginationkey: "XxoPtCUzrUv4JV5dS+yQ+MdW7yLEJnRMwigVY/bpgtQ=" 19 | 20 | api: 21 | # v3 grpc/RESTful API server address 22 | addr: "0.0.0.0:6060" 23 | 24 | # Health server address 25 | # This is an unencrypted endpoint useful for load balancers to check the healthiness of the clair server. 26 | healthaddr: "0.0.0.0:6061" 27 | 28 | # Deadline before an API request will respond with a 503 29 | timeout: 900s 30 | 31 | # Optional PKI configuration 32 | # If you want to easily generate client certificates and CAs, try the following projects: 33 | # https://github.com/coreos/etcd-ca 34 | # https://github.com/cloudflare/cfssl 35 | servername: 36 | cafile: 37 | keyfile: 38 | certfile: 39 | 40 | worker: 41 | namespace_detectors: 42 | - os-release 43 | - lsb-release 44 | - apt-sources 45 | - alpine-release 46 | - redhat-release 47 | 48 | feature_listers: 49 | - apk 50 | - dpkg 51 | - rpm 52 | 53 | updater: 54 | # Frequency the database will be updated with vulnerabilities from the default data sources 55 | # The value 0 disables the updater entirely. 56 | interval: 5m 57 | enabledupdaters: 58 | - debian 59 | - ubuntu 60 | - rhel 61 | - oracle 62 | - alpine 63 | 64 | notifier: 65 | # Number of attempts before the notification is marked as failed to be sent 66 | attempts: 30 67 | 68 | # Duration before a failed notification is retried 69 | renotifyinterval: 2h 70 | 71 | http: 72 | # Optional endpoint that will receive notifications via POST requests 73 | endpoint: http://notification-endpoint:3000/notify 74 | 75 | # Optional PKI configuration 76 | # If you want to easily generate client certificates and CAs, try the following projects: 77 | # https://github.com/cloudflare/cfssl 78 | # https://github.com/coreos/etcd-ca 79 | servername: 80 | cafile: 81 | keyfile: 82 | certfile: 83 | 84 | # Optional HTTP Proxy: must be a valid URL (including the scheme).
85 | proxy: -------------------------------------------------------------------------------- /ecr-cve-monitor.md: -------------------------------------------------------------------------------- 1 | With cyber attacks on the rise against Higher Education institutions, it is critical to be able to detect vulnerabilities in the docker images that may run your applications. This involves not just vulnerability scanning of the applications, but of the OS packages installed in a docker image as well. 2 | 3 | The ecr-cve-monitor project (https://github.com/sriddell/ecr-cve-monitor) is an open-source proof-of-concept designed to fill the OS/package vulnerability scanning space for docker images stored in ECR. It is based on Clair (https://github.com/coreos/clair) and Klar (https://github.com/sriddell/klar), and designed specifically for use with ECR. Any images pushed to a repository in an ECR will be automatically scanned and have a report generated for them. Any new CVEs that come in that affect an already scanned image will trigger the creation of an updated report. 4 | 5 | Reports are stored as gzip-compressed JSON files in a time series in S3, making it easy to query for images with CVEs via AWS Athena. 6 | 7 | ecr-cve-monitor is message-based. All operations are passed as messages on an SQS queue to provide automatic retries, with a final dead-letter queue. 8 | 9 | Clair itself functions by 'indexing' all layers in a docker image for 'features', and then storing those features in postgres. If a new CVE comes into Clair that affects a layer Clair has already indexed, it issues a notification to the custom ecr-cve-monitor notification endpoint, which converts it to a rescan message on the input queue. 10 | 11 | If a new image is pushed to ECR, CloudTrail generates a CloudWatch event, which triggers a small lambda function that puts a scan-image message on the input queue for the new image. 12 | 13 | Thus, new images are automatically added to those monitored, and existing images that are affected by new CVEs can be identified. 14 | 15 | To bootstrap a new installation, some simple python scripts are provided to generate and load 'ScanImage' messages to the pending scan queue for all existing images in a given ECR. The installation can also be temporarily scaled up during the initial load to reduce the time it takes to index an existing ECR containing many images. 16 | 17 | Although it has only been tested with a single registry so far, it was designed to handle multiple registries and regions, assuming you set up the necessary cross-account permissions to allow the account ecr-cve-monitor is deployed in to pull all images from the other account's ECR. 18 | 19 | Any time an image is scanned, either because it was just pushed to a repo, or because Clair detected that a layer in the image is affected by a new CVE, ecr-cve-monitor generates a new JSON report of all vulnerabilities in that image and stores the result in S3 under a year/month/day time-series scheme. For a given day, at most one report will exist for an image. 20 | 21 | AWS Athena can then be used to generate reports such as 'show me any image with one or more High CVEs' or 'show me any images with new High CVEs in the last 2 days'. The time-series storage also means only a small amount of the data needs to be loaded into an Athena partition, so you can scan just a small subset of the data for any new vulnerabilities in the last 24 hours, for example.
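For example, a report for a single day can be produced by adding just that day's partition and querying it (the bucket name below is a placeholder for your deployment's report bucket):

```
ALTER TABLE reports ADD PARTITION (year='2019', month='01', day='15')
location 's3://my-report-bucket/year=2019/month=01/day=15/';

select distinct ECRMetadata.repositoryName, ECRMetadata.imageId.imageDigest
from reports
where cardinality(vulnerabilities.High) > 0
and year='2019' and month='01' and day='15';
```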
22 | 23 | Note that Clair does not recognize or track images directly - it only scans and knows about layers. Software that uses Clair (in ecr-cve-monitor, the clair scanner) is responsible for sending in each layer with a unique id. The clair scanner is then responsible for tracking which layers are present in which docker images in the ECR. ecr-cve-monitor accomplishes this by mapping the unique layer ID to the image, as identified by its unique registry ID, in a DynamoDB table. 24 | 25 | Note that uniquely identifying images is a bit confusing. Images have an internal sha256 identifier, but this is not the address of the image in ECR. ECR appears to assign images a unique sha256 ID, separate from the image sha256. This is the true unique ID within an ECR. Docker tags are mutable, and thus cannot be used for tracking because they can change over time. 26 | 27 | So the reports are generated in terms of the ECR registry ID, repository name, and registry sha256 ID (which permanently identifies an image in an ECR repository). 28 | 29 | A second layer of reporting would be necessary to translate Athena query results into images using the human-friendly tags assigned to that image (a minimal sketch of this lookup appears at the end of this article), although it would only be guaranteed accurate at the time of the report, as the tags (particularly the 'latest' tag) can change. 30 | 31 | Preventing an image with vulnerabilities from being pushed into ECR requires a different technique that can scan an image before it is pushed and fail earlier in the CI/CD pipeline. There is already an open source project to facilitate this, called clair-local-scan (https://github.com/arminc/clair-local-scan). This project generates daily docker images of the Clair database already fully loaded with current CVE vulnerabilities that can be used to pre-scan images before pushing them to ECR. 32 | 33 | ecr-cve-monitor just tells you about OS-level CVE vulnerabilities in your containers. To fully put it into effect, you need to decide for your organization how to deal with images identified as vulnerable. This could involve quarantining them so they can't be further deployed, blocking their deployment via your CI/CD pipeline, or reporting on wherever you run your containers (ECS, EKS, etc.) to identify vulnerable images that are actively running. The optimal mix of techniques of course depends on your SLAs, sensitivity to downtime, and the severity of a new CVE that is detected in an already running image. 34 | 35 | To run or experiment with the ecr-cve-monitor project, visit https://github.com/sriddell/ecr-cve-monitor and view the README.md for details on installation, operation, and the underlying architecture.
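As the minimal sketch promised above, the ECR API can resolve a digest from a report back into the image's current tags; the registry ID, repository name, and digest below are placeholders, and since tags are mutable the result is only accurate at the moment it is queried:

```
import boto3

ecr = boto3.client('ecr')
resp = ecr.describe_images(
    registryId='123456789012',
    repositoryName='my-repo',
    imageIds=[{'imageDigest': 'sha256:<digest-from-report>'}]
)
for detail in resp['imageDetails']:
    # imageTags may be absent if the image has been untagged
    print(detail.get('imageTags', []))
```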
36 | 37 | Please also visit the projects that power and make ECR monitoring possible: 38 | 39 | Clair: https://github.com/coreos/clair 40 | Klar: https://github.com/optiopay/klar or the fork modified specifically for ECR https://github.com/sriddell/klar 41 | 42 | 43 | 44 | -------------------------------------------------------------------------------- /ecr-cve-monitor.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sriddell/ecr-cve-monitor/b820efea1efcc9f8a266c3fac7fb88097f05425f/ecr-cve-monitor.png -------------------------------------------------------------------------------- /handler.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import boto3 4 | 5 | 6 | def put_image(event, context): 7 | endpoint = None 8 | if 'SQS_ENDPOINT' in os.environ: 9 | endpoint = os.environ['SQS_ENDPOINT'] 10 | sqs = boto3.resource('sqs', region_name=os.environ['REGION'], endpoint_url=endpoint) 11 | queue = sqs.Queue(os.environ['SQS_QUEUE_URL']) 12 | msg = {'CloudWatchEvent': event} 13 | queue.send_message(MessageBody=json.dumps(msg)) 14 | 15 | return { 16 | 'statusCode': 200, 17 | 'body': 'Queued message' 18 | } 19 | -------------------------------------------------------------------------------- /list_repos.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import json 3 | 4 | registries = ['434313288222'] 5 | existing_repos = [] 6 | ecr = boto3.client('ecr') 7 | for r in registries: 8 | response = ecr.describe_repositories( 9 | registryId=str(r) 10 | ) 11 | for repo in response['repositories']: 12 | t = (r, repo['repositoryName']) 13 | if t not in existing_repos: 14 | existing_repos.append(t) 15 | nextToken = None 16 | if 'nextToken' in response.keys(): 17 | nextToken = response['nextToken'] 18 | while nextToken is not None: 19 | response = ecr.describe_repositories( 20 | registryId=r, 21 | nextToken=nextToken 22 | ) 23 | for repo in response['repositories']: 24 | t = (r, repo['repositoryName']) 25 | if t not in existing_repos: 26 | existing_repos.append(t) 27 | nextToken = None 28 | if 'nextToken' in response.keys(): 29 | nextToken = response['nextToken'] 30 | 31 | existing_images = [] 32 | for repo in existing_repos: 33 | response = ecr.describe_images( 34 | registryId=str(repo[0]), 35 | repositoryName=str(repo[1]) 36 | ) 37 | for image in response['imageDetails']: 38 | t = (repo[0], repo[1], image['imageDigest']) 39 | if t not in existing_images: 40 | existing_images.append(t) 41 | nextToken = None 42 | if 'nextToken' in response.keys(): 43 | nextToken = response['nextToken'] 44 | while nextToken is not None: 45 | response = ecr.describe_images( 46 | registryId=repo[0], 47 | repositoryName=repo[1], 48 | nextToken=nextToken 49 | ) 50 | for image in response['imageDetails']: 51 | t = (repo[0], repo[1], image['imageDigest']) 52 | if t not in existing_images: 53 | existing_images.append(t) 54 | nextToken = None 55 | if 'nextToken' in response.keys(): 56 | nextToken = response['nextToken'] 57 | 58 | images = [] 59 | for t in existing_images: 60 | images.append({ 61 | 'registryId': t[0], 62 | 'repository': t[1], 63 | 'imageDigest': t[2] 64 | }) 65 | 66 | print(json.dumps(images)) 67 | -------------------------------------------------------------------------------- /main.tf: -------------------------------------------------------------------------------- 1 | terraform { 2 | backend "s3" { 3 | } 4 | } 5 | 6 | provider "aws" { 7
| version = "~> 2.19.0" 8 | } 9 | 10 | provider "null" { 11 | version = "~> 2.1.2" 12 | } 13 | 14 | provider "template" { 15 | version = "~> 2.1.2" 16 | } 17 | 18 | data "aws_region" "current" { 19 | } 20 | 21 | #temp; we expect the vpc to be created externally 22 | module "vpc" { 23 | source = "git::https://github.com/sriddell/terraform-module-standard-vpc.git?ref=1.0.0" 24 | 25 | #source = "/Users/sriddell/working/titan/terraform-module-standard-vpc" 26 | aws_region = data.aws_region.current.name 27 | service = var.service 28 | environment = var.environment 29 | costcenter = var.costcenter 30 | poc = var.poc 31 | key_name = var.key_name 32 | az = "us-east-1d,us-east-1e" 33 | enable_bastion = "0" 34 | } 35 | 36 | module "cluster" { 37 | source = "git::https://github.com/sriddell/terraform-module-ecs-cluster.git?ref=2.0.2" 38 | vpc_cidr_block = module.vpc.vpc_cidr_block 39 | environment = var.environment 40 | costcenter = var.costcenter 41 | poc = var.poc 42 | cluster_name = "${var.environment}-ecr-cve-monitor" 43 | key_name = var.key_name 44 | ami_id = var.ecs_ami_id 45 | vpc_id = module.vpc.vpc_id 46 | private_subnets = join(",", module.vpc.private_subnets) 47 | container_instance_sec_group_ids = [] 48 | instance_type = var.instance_type 49 | asg_desired_capacity = var.number_of_ecs_instances 50 | asg_max_size = var.number_of_ecs_instances 51 | asg_min_size = "0" 52 | } 53 | 54 | resource "aws_vpc_endpoint" "s3" { 55 | vpc_id = module.vpc.vpc_id 56 | service_name = "com.amazonaws.us-east-1.s3" 57 | } 58 | 59 | output "private_subnets" { 60 | value = module.vpc.private_subnets 61 | } 62 | 63 | output "public_subnets" { 64 | value = module.vpc.public_subnets 65 | } 66 | 67 | output "vpc_id" { 68 | value = module.vpc.vpc_id 69 | } 70 | 71 | output "vpc_cidr_block" { 72 | value = module.vpc.vpc_cidr_block 73 | } 74 | 75 | resource "aws_sqs_queue" "dead_letter" { 76 | name = "${var.prefix}-clair-dead-letter" 77 | delay_seconds = 0 78 | message_retention_seconds = 1209600 79 | tags = { 80 | Environment = var.environment 81 | Service = var.service 82 | CostCenter = var.costcenter 83 | POC = var.poc 84 | } 85 | } 86 | 87 | resource "aws_sqs_queue" "queue" { 88 | name = "${var.prefix}-clair-index-requests" 89 | delay_seconds = 0 90 | message_retention_seconds = 1209600 91 | redrive_policy = "{\"deadLetterTargetArn\":\"${aws_sqs_queue.dead_letter.arn}\",\"maxReceiveCount\":4}" 92 | 93 | tags = { 94 | Environment = var.environment 95 | Service = var.service 96 | CostCenter = var.costcenter 97 | POC = var.poc 98 | } 99 | } 100 | 101 | output "input_queue" { 102 | value = aws_sqs_queue.queue.id 103 | } 104 | 105 | resource "aws_s3_bucket" "bucket" { 106 | bucket = "${var.prefix}-clair-scan-results" 107 | acl = "private" 108 | 109 | tags = { 110 | Environment = var.environment 111 | Service = var.service 112 | CostCenter = var.costcenter 113 | POC = var.poc 114 | } 115 | } 116 | 117 | output "report_bucket" { 118 | value = aws_s3_bucket.bucket.id 119 | } 120 | 121 | # DB Subnet group to put in RDS database in vpc 122 | resource "aws_db_subnet_group" "clair" { 123 | name = "clair-db-subnet" 124 | # TF-UPGRADE-TODO: In Terraform v0.10 and earlier, it was sometimes necessary to 125 | # force an interpolation expression to be interpreted as a list by wrapping it 126 | # in an extra set of list brackets. That form was supported for compatibilty in 127 | # v0.11, but is no longer supported in Terraform v0.12. 
128 | # 129 | # If the expression in the following list itself returns a list, remove the 130 | # brackets to avoid interpretation as a list of lists. If the expression 131 | # returns a single list item then leave it as-is and remove this TODO comment. 132 | subnet_ids = "${module.vpc.private_subnets}" 133 | 134 | tags = { 135 | Name = "clair-db-subnet" 136 | Environment = var.environment 137 | Service = var.service 138 | CostCenter = var.costcenter 139 | POC = var.poc 140 | } 141 | } 142 | 143 | #Create a security group for RDS acccess 144 | resource "aws_security_group" "allow-db" { 145 | name = "allow_clair_db" 146 | description = "Allow all inbound traffic from db processes" 147 | vpc_id = module.vpc.vpc_id 148 | 149 | ingress { 150 | from_port = 5432 151 | to_port = 5432 152 | protocol = "tcp" 153 | # TF-UPGRADE-TODO: In Terraform v0.10 and earlier, it was sometimes necessary to 154 | # force an interpolation expression to be interpreted as a list by wrapping it 155 | # in an extra set of list brackets. That form was supported for compatibilty in 156 | # v0.11, but is no longer supported in Terraform v0.12. 157 | # 158 | # If the expression in the following list itself returns a list, remove the 159 | # brackets to avoid interpretation as a list of lists. If the expression 160 | # returns a single list item then leave it as-is and remove this TODO comment. 161 | cidr_blocks = [module.vpc.vpc_cidr_block] 162 | } 163 | } 164 | 165 | resource "random_string" "postgres_password" { 166 | length = 16 167 | special = false 168 | } 169 | 170 | # Postgres RDS database 171 | resource "aws_db_instance" "default" { 172 | identifier = "clair-db" 173 | allocated_storage = 10 174 | storage_type = "gp2" 175 | engine = "postgres" 176 | engine_version = "10.6" 177 | instance_class = "db.t2.small" 178 | name = "ClairDb" 179 | username = "postgres" 180 | password = random_string.postgres_password.result 181 | db_subnet_group_name = aws_db_subnet_group.clair.name 182 | skip_final_snapshot = true 183 | vpc_security_group_ids = [aws_security_group.allow-db.id] 184 | 185 | tags = { 186 | Name = "${var.service}-clair-db" 187 | Environment = var.environment 188 | Service = var.service 189 | CostCenter = var.costcenter 190 | POC = var.poc 191 | } 192 | } 193 | 194 | resource "aws_dynamodb_table" "indexed-layers" { 195 | name = "clair-indexed-layers" 196 | billing_mode = "PAY_PER_REQUEST" 197 | read_capacity = 2 198 | write_capacity = 100 199 | hash_key = "layer_name" 200 | range_key = "image_name" 201 | 202 | attribute { 203 | name = "layer_name" 204 | type = "S" 205 | } 206 | attribute { 207 | name = "image_name" 208 | type = "S" 209 | } 210 | 211 | tags = { 212 | Name = "${var.service}-clair-db" 213 | Environment = var.environment 214 | Service = var.service 215 | CostCenter = var.costcenter 216 | POC = var.poc 217 | } 218 | } 219 | 220 | data "template_file" "clair-config" { 221 | template = file("config.yaml") 222 | vars = { 223 | host = aws_db_instance.default.address 224 | dbname = aws_db_instance.default.name 225 | user = "postgres" 226 | password = random_string.postgres_password.result 227 | } 228 | } 229 | 230 | resource "aws_ssm_parameter" "clair-db-connect-string" { 231 | name = "/${var.service}/clair-config.yaml" 232 | description = "The database connection string for the Clair DB" 233 | type = "SecureString" 234 | value = base64encode(data.template_file.clair-config.rendered) 235 | 236 | tags = { 237 | Name = "clair-db-connect-string" 238 | Environment = var.environment 239 | Service = 
var.service 240 | CostCenter = var.costcenter 241 | POC = var.poc 242 | } 243 | } 244 | 245 | data "aws_iam_policy_document" "clair" { 246 | # Can fetch secrets 247 | statement { 248 | actions = ["ssm:GetParameter"] 249 | 250 | resources = [ 251 | aws_ssm_parameter.clair-db-connect-string.arn, 252 | ] 253 | 254 | effect = "Allow" 255 | } 256 | 257 | statement { 258 | actions = [ 259 | "ecr:GetAuthorizationToken", 260 | "ecr:BatchCheckLayerAvailability", 261 | "ecr:GetDownloadUrlForLayer", 262 | "ecr:GetRepositoryPolicy", 263 | "ecr:DescribeRepositories", 264 | "ecr:ListImages", 265 | "ecr:DescribeImages", 266 | "ecr:BatchGetImage", 267 | ] 268 | resources = ["*"] 269 | effect = "Allow" 270 | } 271 | 272 | statement { 273 | actions = ["sqs:*"] 274 | 275 | resources = [ 276 | aws_sqs_queue.queue.arn, 277 | "${aws_sqs_queue.queue.arn}/*", 278 | ] 279 | 280 | effect = "Allow" 281 | } 282 | } 283 | 284 | resource "aws_iam_policy" "clair" { 285 | name = "clair" 286 | policy = data.aws_iam_policy_document.clair.json 287 | } 288 | 289 | resource "aws_iam_role" "clair" { 290 | name = "clair" 291 | 292 | assume_role_policy = <, 84 | manifest:struct>, 85 | repositoryName:string, 86 | registryId:string 87 | >, 88 | 89 | Vulnerabilities struct< High:array>, 94 | Medium:array>, 99 | Medium:array>, 104 | Medium:array>, 109 | Low:array>, 114 | Medium:array>, 119 | Negligible:array>, 124 | Medium:array> 129 | > 130 | ) 131 | PARTITIONED BY(year string, month string, day string) 132 | ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION 's3://ecrscan-clair-scan-results/' 133 | ''' 134 | drop_table = 'DROP TABLE `' + table_name + '`;' 135 | 136 | 137 | execute_query(table_def) 138 | 139 | if partitions is None: 140 | execute_query("MSCK REPAIR TABLE " + table_name) # load all data 141 | else: 142 | for partition in partitions: 143 | add_partition = "ALTER TABLE " + table_name + " ADD PARTITION (year='" + partition['year'] + "',month='" + partition['month'] + "',day='" + partition['day'] + "') location 's3://ecrscan-clair-scan-results/year=" + partition['year'] + "/month=" + partition['month'] + "/day=" + partition['day'] + "/'" 144 | execute_query(add_partition) 145 | 146 | 147 | query_string = "select distinct ECRMetadata.registryId, ECRMetadata.repositoryName, ECRMetadata.imageId.imageDigest from " + table_name + " where cardinality(vulnerabilities.High) > 0 order by ECRMetadata.registryId, ECRMetadata.repositoryName, ECRMetadata.imageId.imageDigest;" 148 | q_execution_id = execute_query(query_string) 149 | execute_query(drop_table) 150 | 151 | s3_key = q_execution_id + '.csv' 152 | local_filename = q_execution_id + '.csv' 153 | s3 = boto3.resource('s3') 154 | try: 155 | s3.Bucket('ecr-clair-scan-results').download_file(s3_key, local_filename) 156 | except botocore.exceptions.ClientError as e: 157 | if e.response['Error']['Code'] == "404": 158 | print("The object does not exist.") 159 | else: 160 | raise 161 | 162 | # read file to array 163 | vulnerable_images = [] 164 | with open(local_filename) as csvfile: 165 | reader = csv.DictReader(csvfile) 166 | for row in reader: 167 | vulnerable_images.append((row['registryid'], row['repositoryname'], row['imagedigest'])) 168 | # delete result file 169 | if os.path.isfile(local_filename): 170 | os.remove(local_filename) 171 | repos = set() 172 | registries = set() 173 | for row in vulnerable_images: 174 | t = (row[0], row[1]) 175 | if t not in repos: 176 | repos.add(t) 177 | if row[0] not in registries: 178 | registries.add(row[0]) 179 | 180 | details = 
{} 181 | for k in repos: 182 | try: 183 | response = ecr.describe_images( 184 | registryId=k[0], 185 | repositoryName=k[1] 186 | ) 187 | 188 | update_details(vulnerable_images, k, details, response['imageDetails']) 189 | nextToken = None 190 | if 'nextToken' in response.keys(): 191 | nextToken = response['nextToken'] 192 | while nextToken is not None: 193 | response = ecr.describe_images( 194 | registryId=k[0], 195 | repositoryName=k[1], 196 | nextToken=nextToken 197 | ) 198 | update_details(vulnerable_images, k, details, response['imageDetails']) 199 | nextToken = None 200 | if 'nextToken' in response.keys(): 201 | nextToken = response['nextToken'] 202 | except botocore.exceptions.ClientError: 203 | # ideally, we would list all repos, then filter out reports for repos which have been deleted 204 | # unfortunately, listing all repos cross account doesn't seem to be working; have reached out to 205 | # aws on this 206 | continue 207 | 208 | # Note that the tags map may contain fewer images than the report generated; this is because 209 | # an ECR image may have been deleted after it was scanned. 210 | report = { 211 | 'partitions': partitions, 212 | 'high_vulnerabilities': [] 213 | } 214 | # high_vulnerabilities is filled below, then filtered to images pushed since the cutoff 215 | for k in details.keys(): 216 | out = { 217 | 'registryId': k[0], 218 | 'repositoryName': k[1], 219 | 'imageId': k[2], 220 | 'tags': details[k]['tags'], 221 | 'imagePushedAt': details[k]['imagePushedAt'] 222 | } 223 | report['high_vulnerabilities'].append(out) 224 | 225 | report['high_vulnerabilities'] = list(filter(lambda x: (x['imagePushedAt'] >= cutoff), report['high_vulnerabilities'])) 226 | report['high_vulnerabilities'].sort(key=lambda x: x['imagePushedAt'], reverse=True) 227 | print(json.dumps(report, default=date_handler)) 228 | -------------------------------------------------------------------------------- /reque.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | 3 | sqs = boto3.resource('sqs') 4 | queue = sqs.Queue('https://sqs.us-east-1.amazonaws.com/234324814398/ecrscan-clair-dead-letter') 5 | to_queue = sqs.Queue('https://sqs.us-east-1.amazonaws.com/234324814398/ecrscan-clair-index-requests') 6 | while True: 7 | msgs = queue.receive_messages( 8 | VisibilityTimeout=20 * 60, 9 | WaitTimeSeconds=20 10 | ) 11 | if len(msgs) > 0: 12 | for msg in msgs: 13 | to_queue.send_message(MessageBody=msg.body) 14 | msg.delete() 15 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | boto3 2 | -------------------------------------------------------------------------------- /variables.tf: -------------------------------------------------------------------------------- 1 | variable "service" { 2 | } 3 | 4 | variable "environment" { 5 | } 6 | 7 | variable "costcenter" { 8 | } 9 | 10 | variable "poc" { 11 | } 12 | 13 | variable "ecs_ami_id" { 14 | } 15 | 16 | variable "key_name" { 17 | } 18 | 19 | variable "number_of_scanners" { 20 | default = 1 21 | } 22 | 23 | variable "number_of_ecs_instances" { 24 | default = 1 25 | } 26 | 27 | variable "instance_type" { 28 | } 29 | 30 | variable "prefix" { 31 | } 32 | 33 | variable "number_of_clair_instances" { 34 | default = 1 35 | } 36 | 37 | # variable "private_subnet_ids" { 38 | # type = "list" 39 | # } 40 | #variable "vpc_id" {} 41 | -------------------------------------------------------------------------------- /versions.tf:
-------------------------------------------------------------------------------- 1 | 2 | terraform { 3 | required_version = ">= 0.12" 4 | } 5 | --------------------------------------------------------------------------------