├── .gitignore
├── LICENSE.txt
├── README.md
├── example
    ├── csp_server.py
    └── index.html
├── index.js
└── template.yaml


/.gitignore:
--------------------------------------------------------------------------------
packaged-template.yaml

--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
Copyright 2017 Michael Banfield

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Serverless CSP Report To

Serverless CSP violation reporting server that streams reports to an S3 data lake, and enables easy querying using Athena.

This application has the following components:

* A simple API Gateway endpoint that accepts CSP violations
* Validates and cleans submitted reports
* Publishes the reports to Kinesis Firehose
* Batch writes the reports into S3
* Creates an AWS Glue table on top of the S3 data for simple querying through Athena

# Usage

This application uses AWS SAM, a simple framework for deploying serverless applications.

## Prerequisites
* AWS CLI
* Local IAM user with permissions for CloudFormation etc.

First clone this repository

```
git clone git@github.com:michaelbanfield/serverless-csp-report-to.git
```

Then create an S3 bucket to store the code

Then run

```
aws cloudformation package \
    --template-file template.yaml \
    --s3-bucket \
    --output-template-file packaged-template.yaml

aws cloudformation deploy --template-file /Users/michaelbanfield/dev/js/serverless-csp-report-to/packaged-template.yaml --stack-name CSPReporter --capabilities CAPABILITY_IAM
```

Once CloudFormation finishes you can get the CSP URL with this command

```
aws cloudformation describe-stacks --query "Stacks[0].Outputs[0].OutputValue" --output text --stack-name CSPReporter
```

Then simply add this URL to the report-to/report-uri section of your CSP header.

## Trying it out

Optionally, to test this out quickly with some real data:

```
cd example
python csp_server.py $(aws cloudformation describe-stacks --query "Stacks[0].Outputs[0].OutputValue" --output text --stack-name CSPReporter)
```

Visit http://localhost:31338/ from your browser; this should generate some reports.

Wait around 60 seconds, then go to the AWS Glue console, click Crawlers, tick csp_reports_crawler and select Run Crawler.

Once this is finished you can go to the
Athena Console and run

```
SELECT * FROM "csp_reports"."v1" limit 10;

```

From here you can explore the data using standard SQL.

# Next Steps

## Cost
For cost-saving purposes the Glue crawler has no schedule defined; the monthly cost of an hourly crawler (~$50) is not really warranted for most use cases.

This means you can't take advantage of partitions, which can make your queries much faster and cheaper for larger datasets (i.e. if you only need reports from a particular hour, you only pay for scanning that hour). If you would rather take advantage of partitions, just set up a schedule that works for you from the Glue console. An hourly schedule will ensure you can always query the latest data.

If you would rather save money, and your dataset is fairly small, you will need to delete the partition_0, partition_1 etc. columns manually through the Glue console.

The rest of the application should be low/no cost, especially on the free tier. You should still keep an eye on your AWS bill, for example by setting a billing alarm, as the report URL is unauthenticated and you could receive malicious traffic that drives up the various costs.

A further cost saving would be dialing up the buffer variables (size and time) in Kinesis Firehose to the maximum. This can be done through the UI or template.yaml.

Finally, setting up a lifecycle rule to delete reports after X days is a simple way to reduce cost.

## Table improvements

Glue can't detect that the timestamp field is a timestamp. To enable date functions on this field, manually change the data type to TIMESTAMP in the Glue console.
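
For reference, the timestamp field is written by index.js as a plain `YYYY-MM-DD HH:MM:SS` string. The following is only a rough sketch of working with that layout outside Athena, for example on reports downloaded from the bucket. The helper names `parse_report_timestamp` and `reports_in_hour` and the sample records are made up for illustration and are not part of this repository.

```python
from datetime import datetime

# Layout produced by js_yyyy_mm_dd_hh_mm_ss() in index.js, e.g. "2017-11-05 09:03:07".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_report_timestamp(value):
    """Parse the string timestamp stored on a report into a datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def reports_in_hour(reports, hour_start):
    """Keep the reports whose timestamp falls in the same hour as hour_start.

    Hypothetical helper: `reports` is assumed to be a list of dicts that each
    carry the "timestamp" key written by the Lambda function.
    """
    wanted = hour_start.replace(minute=0, second=0, microsecond=0)
    return [
        report
        for report in reports
        if parse_report_timestamp(report["timestamp"]).replace(minute=0, second=0) == wanted
    ]


if __name__ == "__main__":
    # Illustrative records only, not real report data.
    sample = [
        {"documentUri": "http://localhost:31338/", "timestamp": "2017-11-05 09:03:07"},
        {"documentUri": "http://localhost:31338/", "timestamp": "2017-11-05 10:15:00"},
    ]
    for report in reports_in_hour(sample, datetime(2017, 11, 5, 9, 30)):
        print(report["timestamp"])
```

Inside Athena itself the same filtering is better done in SQL once the column type has been changed to TIMESTAMP as described above.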


# TODO/Improvements

* Switch from GZIP to Snappy compression; this is better for a data lake, however Glue can't seem to scan it correctly
* Move from a crawler to a table defined in CloudFormation - this would solve the Snappy problem as well as some other limitations


--------------------------------------------------------------------------------
/example/csp_server.py:
--------------------------------------------------------------------------------
import os
import SimpleHTTPServer
import sys

# Taken from https://gist.github.com/enjalot/2904124
class CORSHTTPRequestHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def send_head(self):
        """Common code for GET and HEAD commands.

        This sends the response code and MIME headers.

        Return value is either a file object (which has to be copied
        to the outputfile by the caller unless the command was HEAD,
        and must be closed by the caller under all circumstances), or
        None, in which case the caller has nothing further to do.

        """
        path = self.translate_path(self.path)
        f = None
        if os.path.isdir(path):
            if not self.path.endswith('/'):
                # redirect browser - doing basically what apache does
                self.send_response(301)
                self.send_header("Location", self.path + "/")
                self.end_headers()
                return None
            for index in "index.html", "index.htm":
                index = os.path.join(path, index)
                if os.path.exists(index):
                    path = index
                    break
            else:
                return self.list_directory(path)
        ctype = self.guess_type(path)
        try:
            # Always read in binary mode. Opening files in text mode may cause
            # newline translations, making the actual size of the content
            # transmitted *less* than the content-length!
            f = open(path, 'rb')
        except IOError:
            self.send_error(404, "File not found")
            return None
        self.send_response(200)
        self.send_header("Content-type", ctype)
        fs = os.fstat(f.fileno())
        self.send_header("Content-Length", str(fs[6]))
        self.send_header("Last-Modified", self.date_time_string(fs.st_mtime))
        # The CSP keyword 'none' must be single-quoted; the report URL is the first CLI argument
        self.send_header("Content-Security-Policy", "default-src 'none'; report-uri " + sys.argv[1])
        self.send_header("Access-Control-Allow-Origin", "*")
        self.end_headers()
        return f


if __name__ == "__main__":
    import os
    import SocketServer

    PORT = 31338

    Handler = CORSHTTPRequestHandler
    #Handler = SimpleHTTPServer.SimpleHTTPRequestHandler

    httpd = SocketServer.TCPServer(("", PORT), Handler)

    print "serving at port", PORT
    httpd.serve_forever()

--------------------------------------------------------------------------------
/example/index.html:
--------------------------------------------------------------------------------
CSP Tester

Hello, world!

--------------------------------------------------------------------------------
/index.js:
--------------------------------------------------------------------------------
'use strict';


const AWS = require('aws-sdk');

const firehose = new AWS.Firehose();

function js_yyyy_mm_dd_hh_mm_ss () {
    var now = new Date();
    var year = "" + now.getFullYear();
    var month = "" + (now.getMonth() + 1); if (month.length == 1) { month = "0" + month; }
    var day = "" + now.getDate(); if (day.length == 1) { day = "0" + day; }
    var hour = "" + now.getHours(); if (hour.length == 1) { hour = "0" + hour; }
    var minute = "" + now.getMinutes(); if (minute.length == 1) { minute = "0" + minute; }
    var second = "" + now.getSeconds(); if (second.length == 1) { second = "0" + second; }
    return year + "-" + month + "-" + day + " " + hour + ":" + minute + ":" + second;
}

exports.report = (event, context, callback) => {

    const body = JSON.parse(event.body)["csp-report"];


    //API Gateway doesn't seem to support blocking requests with extra fields, so pick out the important fields here
    //and camelCase them so they are easier to query
    const cspReport = {}

    cspReport.documentUri = body["document-uri"];
    cspReport.violatedDirective = body["violated-directive"];
    cspReport.effectiveDirective = body["effective-directive"];
    cspReport.originalPolicy = body["original-policy"];
    cspReport.blockedUri = body["blocked-uri"];
    cspReport.statusCode = body["status-code"];
    cspReport.timestamp = js_yyyy_mm_dd_hh_mm_ss();


    const params = {
        DeliveryStreamName: "CSPReports",
        Record: {
            Data: JSON.stringify(cspReport) + "\n"
        }
    }

    firehose.putRecord(params, function(err, data) {
        if (err) {
            console.log(err, err.stack);
        }
        else {
            console.log(data);
        }

        //Fail silently
        callback(null, {
            statusCode: 200
        });
    });

};
--------------------------------------------------------------------------------
/template.yaml:
--------------------------------------------------------------------------------
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Serverless API that streams CSP Violation Reports to S3
Outputs:
  ReportUrl:
    Description: URL to send CSP violations to
    Value: !Join
      - ''
      - - "https://"
        - !Ref Api
        - ".execute-api."
        - !Ref AWS::Region
        - ".amazonaws.com"
        - "/prod/report"

Resources:
  GlueRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Effect: "Allow"
            Principal:
              Service:
                - "glue.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        -
          PolicyName: "root"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              -
                Effect: "Allow"
                Action: "*"
                Resource: "*"
  Database:
    Type: AWS::Glue::Database
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseInput:
        Name: "csp_reports"
  Crawler:
    Type: AWS::Glue::Crawler
    Properties:
      Name: "csp_reports_crawler"
      Role: !GetAtt GlueRole.Arn
      DatabaseName: !Ref Database
      Targets:
        S3Targets:
          - Path: !Join
              - '/'
              - - 's3:/'
                - !Ref FirehoseBucket
                - 'csp_reports'
                - 'v1'
      SchemaChangePolicy:
        UpdateBehavior: "UPDATE_IN_DATABASE"
        DeleteBehavior: "LOG"
  FirehoseBucket:
    Type: 'AWS::S3::Bucket'
  Firehose:
    DependsOn:
      - DeliveryPolicy
    Type: "AWS::KinesisFirehose::DeliveryStream"
    Properties:
      DeliveryStreamName: CSPReports
      DeliveryStreamType: DirectPut
      S3DestinationConfiguration:
        BucketARN: !Join
          - ''
          - - 'arn:aws:s3:::'
            - !Ref FirehoseBucket
        BufferingHints:
          IntervalInSeconds: 60
          SizeInMBs: 1
        #Snappy is a better option for Athena but Glue doesn't seem to be able to detect it
        CompressionFormat: GZIP
        Prefix: csp_reports/v1/
        RoleARN: !GetAtt
          - DeliveryRole
          - Arn
  DeliveryRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: firehose.amazonaws.com
            Action: 'sts:AssumeRole'
            Condition:
              StringEquals:
                'sts:ExternalId': !Ref AWS::AccountId
  DeliveryPolicy:
    Type: 'AWS::IAM::Policy'
    Properties:
      PolicyName: firehose_delivery_policy
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Action:
              - 's3:AbortMultipartUpload'
              - 's3:GetBucketLocation'
              - 's3:GetObject'
              - 's3:ListBucket'
              - 's3:ListBucketMultipartUploads'
              - 's3:PutObject'
            Resource:
              - !Join
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref FirehoseBucket
              - !Join
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref FirehoseBucket
                  - '*'
      Roles:
        - !Ref DeliveryRole
  Api:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod
      DefinitionBody:
        swagger: '2.0'
        info:
          title: CSPReportTo
          version: 1.0.0
        schemes:
          - https
        basePath: /v1
        produces:
          - application/json
        x-amazon-apigateway-request-validators:
          all:
            validateRequestBody: true
            validateRequestParameters: true
          params-only:
            validateRequestBody: false
            validateRequestParameters: true
        paths:
          /report:
            post:
              parameters:
                - in: body
                  name: RequestBodyModel
                  required: true
                  schema:
                    $ref: '#/definitions/RequestBodyModel'
              responses:
                '200':
                  description: The report was collected
              x-amazon-apigateway-request-validator: all
              x-amazon-apigateway-integration:
                httpMethod: POST
                type: aws_proxy
                uri:
                  Fn::Sub: arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${CollectReport.Arn}/invocations
                responses: {}
        definitions:
          RequestBodyModel:
            type: object
            additionalProperties: false
            required:
              - csp-report
            properties:
              csp-report:
                type: object
                additionalProperties: false
                properties:
                  document-uri:
                    type: string
                  referrer:
                    type: string
                  violated-directive:
                    type: string
                  effective-directive:
                    type: string
                  original-policy:
                    type: string
                  blocked-uri:
                    type: string
                  status-code:
                    type: integer

                required:
                  - document-uri
                  - original-policy
                  - violated-directive
                  - blocked-uri
  CollectReport:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.report
      Runtime: nodejs6.10
      CodeUri: ./index.js
      Policies: AmazonKinesisFirehoseFullAccess
      Events:
        PostApi:
          Type: Api
          Properties:
            Path: /report
            Method: POST
            RestApiId:
              Ref: Api
--------------------------------------------------------------------------------