├── .github └── workflows │ └── github-repo-stats.yml ├── AWSSecurityAnalyticsBootstrap ├── amazon_security_lake_queries │ ├── README.md │ └── ocsf │ │ ├── amazon_security_lake_queries_all.md │ │ ├── amazon_security_lake_queries_cloudtrail_lambda.md │ │ ├── amazon_security_lake_queries_cloudtrail_management.md │ │ ├── amazon_security_lake_queries_route53.md │ │ ├── amazon_security_lake_queries_securityhub.md │ │ └── amazon_security_lake_queries_vpcflow.md ├── cfn │ ├── Athena_IAM_setup.yml │ ├── Athena_infra_setup.yml │ └── VPC_enable_flowlogs.yml ├── docs │ ├── aws_security_analytics_bootstrap_deployment_guide.md │ └── img │ │ └── architecture │ │ ├── MultiAccount.png │ │ ├── SingleAccount.png │ │ └── SingleAccount_BootstrapOnly.png └── sql │ ├── ddl │ └── create_tables │ │ ├── create_cloudtrail_table.sql │ │ ├── create_dnsquerylog_table.sql │ │ └── create_vpcflowlog_table.sql │ └── dml │ └── analytics │ ├── cloudtrail │ └── cloudtrail_demo_queries.sql │ ├── dns │ └── dns_demo_queries.sql │ └── vpcflow │ └── vpcflow_demo_queries.sql ├── CHANGELOG.md ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── NOTICE └── README.md /.github/workflows/github-repo-stats.yml: -------------------------------------------------------------------------------- 1 | name: github-repo-stats 2 | 3 | on: 4 | schedule: 5 | # Run this once per day, towards the end of the day for keeping the most 6 | # recent data point most meaningful (hours are interpreted in UTC). 7 | - cron: "0 23 * * *" 8 | workflow_dispatch: # Allow for running this manually. 9 | 10 | jobs: 11 | j1: 12 | name: github-repo-stats 13 | runs-on: ubuntu-latest 14 | steps: 15 | - name: run-ghrs 16 | # Use latest release. 
17 | uses: jgehrcke/github-repo-stats@RELEASE 18 | with: 19 | ghtoken: ${{ secrets.ghrs_github_api_token }} 20 | -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/amazon_security_lake_queries/README.md: -------------------------------------------------------------------------------- 1 | # Amazon Security Lake OCSF Queries 2 | 3 | This directory contains sample security analytics queries for the [Amazon Security Lake](https://aws.amazon.com/security-lake/) service. Amazon Security Lake is an automated security data lake service that allows customers to aggregate, manage, and derive value from their security-related log and event data. Amazon Security Lake automates the central management of security data, normalizing it into the open-source security schema [OCSF](https://github.com/ocsf). OCSF was [co-initiated by AWS and developed in collaboration with other industry leaders](https://aws.amazon.com/blogs/security/aws-co-announces-release-of-the-open-cybersecurity-schema-framework-ocsf-project/) to enable security use cases such as incident response and security data analytics. 4 | 5 | These queries were originally developed by the AWS Customer Incident Response Team for the [AWS Security Analytics Bootstrap](https://github.com/awslabs/aws-security-analytics-bootstrap/blob/main/LICENSE) project and were converted into the normalized OCSF log format used by Amazon Security Lake.
6 | 7 | ## Amazon Security Lake Demo Queries 8 | 9 | AWS Service Log | Demo Query Link 10 | ------|------| 11 | All Queries Combined | [all demo queries](./ocsf/amazon_security_lake_queries_all.md) 12 | [AWS CloudTrail Management Events](https://docs.aws.amazon.com/security-lake/latest/userguide/internal-sources.html#cloudtrail-event-logs) | [cloudtrail management events demo queries](./ocsf/amazon_security_lake_queries_cloudtrail_management.md) 13 | [AWS CloudTrail Lambda Data Events](https://docs.aws.amazon.com/security-lake/latest/userguide/internal-sources.html#cloudtrail-event-logs) | [cloudtrail lambda data events demo queries](./ocsf/amazon_security_lake_queries_cloudtrail_lambda.md) 14 | [Amazon Virtual Private Cloud (VPC) Flow Logs](https://docs.aws.amazon.com/security-lake/latest/userguide/internal-sources.html#vpc-flow-logs) | [vpc flow demo queries](./ocsf/amazon_security_lake_queries_vpcflow.md) 15 | [Amazon Route 53 DNS resolver query logs](https://docs.aws.amazon.com/security-lake/latest/userguide/internal-sources.html#route-53-logs) | [route 53 dns demo queries](./ocsf/amazon_security_lake_queries_route53.md) 16 | [Security Hub Findings](https://docs.aws.amazon.com/security-lake/latest/userguide/internal-sources.html#security-hub-findings) | [security hub event demo queries](./ocsf/amazon_security_lake_queries_securityhub.md) 17 | 18 | ## Acknowledgment 19 | 20 | Many thanks for the support from: 21 | - AWS Customer Incident Response Team 22 | - Amazon Security Lake Product Team 23 | - Anna McAbee 24 | - Charles Roberts 25 | - Marc Luescher 26 | - Ross Warren 27 | - Josh Pavel 28 | 29 | 30 | ## License 31 | 32 | This project is licensed under the Apache-2.0 License.
33 | -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/amazon_security_lake_queries/ocsf/amazon_security_lake_queries_all.md: -------------------------------------------------------------------------------- 1 | 5 | 6 | # Amazon Security Lake Example Queries 7 | 8 | ## CloudTrail Management Events 9 | 10 | ### PREVIEW CLOUDTRAIL MANAGEMENT EVENTS TABLE 11 | 12 | **Query:** Preview first 10 rows with all fields, a quick way to verify everything is set up correctly 13 | 14 | ``` 15 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 16 | LIMIT 10; 17 | ``` 18 | 19 | ### CLOUDTRAIL MANAGEMENT PARTITION TESTS 20 | > **NOTE:** if there are no partition constraints (accountid, region, or eventday), then by default ALL data will be scanned, which could lead to a costly query; always include at least one partition constraint. 21 | > 22 | > Note that this is the case even if you have other constraints in a query (e.g. sourceipaddress = '192.0.2.1'); only constraints using partition fields (eventday, region, accountid) will limit the amount of data scanned.
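**Query:** Combine a non-partition field constraint with partition constraints (an illustrative sketch, not part of the original query set) — the `src_endpoint.ip` filter narrows which rows are returned, while only the `accountid` and `eventday` partition constraints reduce the amount of data scanned

```
SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0"
WHERE src_endpoint.ip = '192.0.2.1'  -- field constraint: filters results, does NOT limit data scanned
AND accountid = '111122223333'       -- partition constraint: limits data scanned
AND eventday >= date_format(date_add('day',-30,current_timestamp), '%Y%m%d')  -- partition constraint: limits data scanned
LIMIT 10;
```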
23 | 24 | **Query:** Preview first 10 rows with all fields, limited to a single account 25 | ``` 26 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 27 | WHERE accountid = '111122223333' 28 | LIMIT 10; 29 | ``` 30 | 31 | **Query:** Preview first 10 rows with all fields, limited to multiple accounts 32 | ``` 33 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 34 | WHERE accountid in ('111122223333','444455556666','123456789012') 35 | LIMIT 10; 36 | ``` 37 | 38 | **Query:** Preview first 10 rows with all fields, limited to a single region 39 | ``` 40 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 41 | WHERE region = 'us-east-1' 42 | LIMIT 10; 43 | ``` 44 | 45 | **Query:** Preview first 10 rows with all fields, limited to multiple regions 46 | ``` 47 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 48 | WHERE region in ('us-east-1','us-east-2','us-west-2') 49 | LIMIT 10; 50 | ``` 51 | 52 | **Query:** preview first 10 rows with all fields, limited to a certain date range 53 | > NOTE: eventday format is 'YYYYMMDD' as a string 54 | ``` 55 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 56 | WHERE eventday >= '20230530' 57 | AND eventday <= '20230631' 58 | LIMIT 10; 59 | ``` 60 | 61 | **Query:** Preview first 10 rows with all fields, limited to the past 30 days (relative) 62 | ``` 63 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 64 | WHERE eventday >= date_format(date_add('day',-30,current_timestamp), '%Y%m%d') 65 | LIMIT 10; 66 | ``` 67 | 68 | **Query:** Preview first 10 rows with all fields, limited by a combination of partition constraints 69 | > NOTE: narrowing 
the scope of the query as much as possible will improve performance and minimize cost 70 | 71 | ``` 72 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 73 | WHERE eventday >= '20230530' 74 | AND eventday <= '20230631' 75 | AND accountid = '111122223333' 76 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 77 | LIMIT 10; 78 | ``` 79 | 80 | ### CLOUDTRAIL MANAGEMENT ANALYSIS EXAMPLES 81 | > NOTE: default partition constraints have been provided for each query; be sure to add the appropriate partition constraints to the WHERE clause as shown in the section above 82 | 83 | > DEFAULT partition constraints: 84 | ``` 85 | WHERE eventday >= '20230530' 86 | AND eventday <= '20230631' 87 | AND accountid = '111122223333' 88 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 89 | ``` 90 | > Be sure to modify or remove these to fit the scope of your intended analysis 91 | 92 | 93 | **Query:** Summary of event counts by Region (e.g. which Region has the most activity) 94 | ``` 95 | SELECT region, count(*) as eventcount FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 96 | WHERE eventday >= '20230530' 97 | AND eventday <= '20230631' 98 | AND accountid = '111122223333' 99 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 100 | GROUP BY region 101 | ORDER BY eventcount DESC 102 | ``` 103 | 104 | **Query:** Summary of event count by Region and EventName, ordered by event count (descending) for each region.
This is a quick way to identify the top CloudTrail event names seen in each region 105 | 106 | ``` 107 | SELECT region, api.operation, count(*) as operation_count FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 108 | WHERE eventday >= '20230530' 109 | AND eventday <= '20230631' 110 | AND accountid = '111122223333' 111 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 112 | GROUP BY region, api.operation 113 | ORDER BY region, operation_count DESC 114 | ``` 115 | 116 | **Query:** User login summary via AssumeRole or ConsoleLogin; includes a list of all source IPs for each user 117 | ``` 118 | SELECT actor.user.uuid, api.operation, array_agg(DISTINCT(src_endpoint.ip) ORDER BY src_endpoint.ip) AS sourceips FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 119 | WHERE actor.user.uuid IS NOT NULL 120 | AND (api.operation = 'AssumeRole' OR api.operation = 'ConsoleLogin') 121 | AND eventday >= '20230530' 122 | AND eventday <= '20230631' 123 | AND accountid = '111122223333' 124 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 125 | GROUP BY actor.user.uuid, api.operation 126 | ORDER BY api.operation 127 | ``` 128 | 129 | **Query:** User login summary via AssumeRole or ConsoleLogin; includes a list of all source IPs for each user 130 | 131 | > NOTE: This query is similar to the query above, except it uses the normalized OCSF activity_id for login activity (1) rather than explicitly searching for login operation names.
132 | 133 | ``` 134 | SELECT actor.user.uuid, api.operation, array_agg(DISTINCT(src_endpoint.ip) ORDER BY src_endpoint.ip) AS sourceips FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 135 | WHERE actor.user.uuid IS NOT NULL 136 | AND activity_id = 1 137 | AND eventday >= '20230530' 138 | AND eventday <= '20230631' 139 | AND accountid = '111122223333' 140 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 141 | GROUP BY actor.user.uuid, api.operation 142 | ORDER BY api.operation 143 | ``` 144 | 145 | **Query:** User Activity Summary: filter high-volume read-only GET/LIST/DESCRIBE calls 146 | 147 | ``` 148 | SELECT actor.user.uuid, array_agg(DISTINCT(api.operation)) AS operations, 149 | array_agg(DISTINCT(src_endpoint.ip) ORDER BY src_endpoint.ip) AS sourceips, 150 | array_agg(DISTINCT(http_request.user_agent) ORDER BY http_request.user_agent) AS user_agents FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 151 | WHERE api.operation <> 'AssumeRole' 152 | AND api.operation NOT LIKE 'Get%' 153 | AND api.operation NOT LIKE 'List%' 154 | AND api.operation NOT LIKE 'Describe%' 155 | AND eventday >= '20230530' 156 | AND eventday <= '20230631' 157 | AND accountid = '111122223333' 158 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 159 | GROUP BY actor.user.uuid 160 | ``` 161 | 162 | **Query:** User Activity Summary, including username: filter high-volume read-only GET/LIST/DESCRIBE calls 163 | > NOTE: this query is similar to the one above, but will include the ARN or the username (for IAM Users) of the principal 164 | 165 | ``` 166 | SELECT actor.user.uuid, actor.user.name, 167 | array_agg(DISTINCT(api.operation) ORDER BY api.operation) AS operations, 168 | array_agg(DISTINCT(src_endpoint.ip) ORDER BY src_endpoint.ip) AS sourceips, 169 | array_agg(DISTINCT(http_request.user_agent) ORDER BY http_request.user_agent) AS user_agents
FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 170 | WHERE api.operation <> 'AssumeRole' 171 | AND api.operation NOT LIKE 'Get%' 172 | AND api.operation NOT LIKE 'List%' 173 | AND api.operation NOT LIKE 'Describe%' 174 | AND eventday >= '20230530' 175 | AND eventday <= '20230631' 176 | AND accountid = '111122223333' 177 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 178 | GROUP BY actor.user.uuid, actor.user.name 179 | ``` 180 | 181 | **Query:** Search for activity by a specific IAM User 182 | > NOTE: this query is similar to the one above, but will search for activity by a specific IAM User name 183 | ``` 184 | SELECT time, eventday, actor.user.uuid, actor.user.name, actor.user.credential_uid, api.operation, unmapped['requestParameters.userName'] as requestParametersUsername, unmapped['requestParameters.policyArn'] as requestParametersPolicyArn, api.response 185 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 186 | WHERE actor.user.type = 'IAMUser' 187 | AND actor.user.name = '{username}' 188 | AND eventday >= '20230530' 189 | AND eventday <= '20230631' 190 | AND accountid = '111122223333' 191 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2'); 192 | ``` 193 | 194 | **Query:** Search for activity associated with a specific IAM User's Access Key 195 | > NOTE: this query is similar to the one above, but will search for just a certain access key that's associated with an IAM User 196 | ``` 197 | SELECT time, eventday, actor.user.uuid, actor.user.name, actor.user.credential_uid, api.operation, unmapped['requestParameters.userName'] as requestParametersUsername, unmapped['requestParameters.policyArn'] as requestParametersPolicyArn, api.response 198 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 199 | WHERE
actor.user.type = 'IAMUser' 200 | AND actor.user.credential_uid = '{accesskeyid}' 201 | AND eventday >= '20230530' 202 | AND eventday <= '20230631' 203 | AND accountid = '111122223333' 204 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2'); 205 | ``` 206 | 207 | **Query:** IAM change summary: Filter read-only GET/LIST/DESCRIBE and Filter unsuccessful calls 208 | 209 | ``` 210 | SELECT time, actor.user.uuid, actor.user.name, api.operation, unmapped['requestParameters'] AS request_parameters 211 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 212 | WHERE api.service.name = 'iam.amazonaws.com' 213 | AND api.operation NOT LIKE 'Get%' 214 | AND api.operation NOT LIKE 'List%' 215 | AND api.operation NOT LIKE 'Describe%' 216 | AND api.response.error IS NULL 217 | AND eventday >= '20230530' 218 | AND eventday <= '20230631' 219 | AND accountid = '111122223333' 220 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 221 | ORDER BY accountid, time 222 | ``` 223 | 224 | **Query:** Access Key creations with extract of username and keyid. Filter unsuccessful calls 225 | 226 | ``` 227 | SELECT time, actor.user.uuid, actor.user.name, api.operation, 228 | unmapped['responseElements.accessKey.userName'] AS user_name, 229 | unmapped['responseElements.accessKey.accessKeyId'] AS access_key 230 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 231 | WHERE api.operation = 'CreateAccessKey' 232 | AND api.response.error IS NULL 233 | AND eventday >= '20230530' 234 | AND eventday <= '20230631' 235 | AND accountid = '111122223333' 236 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 237 | ORDER BY accountid, time 238 | ``` 239 | 240 | **Query:** Password changes with extract of username. 
Filter unsuccessful calls 241 | 242 | ``` 243 | SELECT time, actor.user.uuid, actor.user.name, api.operation, 244 | user.name AS "username with password modified" 245 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 246 | WHERE api.operation IN ('UpdateLoginProfile', 'CreateLoginProfile') 247 | AND api.response.error IS NULL 248 | AND eventday >= '20230530' 249 | AND eventday <= '20230631' 250 | AND accountid = '111122223333' 251 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 252 | ORDER BY accountid, time 253 | ``` 254 | 255 | **Query:** Identify API events made from a public IP (i.e. a non-RFC1918 source IP address from a publicly routed address). Useful to filter out internal API calls. 256 | 257 | > NOTE: this is an example of the new IPADDRESS data type added in Athena engine v2 and IP Address contains function added in the Athena engine v3. Be sure that you've [enabled Athena engine v3](https://aws.amazon.com/blogs/big-data/upgrade-to-athena-engine-version-3-to-increase-query-performance-and-access-more-analytics-features/) 258 | 259 | ``` 260 | SELECT * 261 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 262 | WHERE src_endpoint.ip <> '' 263 | AND NOT contains('10.0.0.0/8', CAST(src_endpoint.ip AS IPADDRESS)) 264 | AND NOT contains('172.16.0.0/12', CAST(src_endpoint.ip AS IPADDRESS)) 265 | AND NOT contains('192.168.0.0/16', CAST(src_endpoint.ip AS IPADDRESS)) 266 | AND eventday >= '20230530' 267 | AND eventday <= '20230631' 268 | AND accountid = '111122223333' 269 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 270 | ``` 271 | 272 | ## CloudTrail Lambda Data Events 273 | 274 | ### PREVIEW CLOUDTRAIL LAMBDA DATA EVENTS TABLE 275 | 276 | **Query:** Preview first 10 rows with all fields, a quick way to verify everything is set up correctly 277 | 278 | ``` 279 | SELECT * FROM
"amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 280 | LIMIT 10; 281 | ``` 282 | 283 | ### CLOUDTRAIL LAMBDA PARTITION TESTS 284 | 285 | > **NOTE:** if there are no partition constraints (accountid, region, or eventday), then by default ALL data will be scanned, which could lead to a costly query; always include at least one partition constraint. 286 | > 287 | > Note that this is the case even if you have other constraints in a query (e.g. sourceipaddress = '192.0.2.1'); only constraints using partition fields (eventday, region, accountid) will limit the amount of data scanned. 288 | 289 | **Query:** Preview first 10 rows with all fields, limited to a single account 290 | 291 | ``` 292 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 293 | WHERE accountid = '111122223333' 294 | LIMIT 10; 295 | ``` 296 | **Query:** Preview first 10 rows with all fields, limited to multiple accounts 297 | 298 | ``` 299 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 300 | WHERE accountid in ('111122223333','444455556666','123456789012') 301 | LIMIT 10; 302 | ``` 303 | 304 | **Query:** Preview first 10 rows with all fields, limited to a single region 305 | 306 | ``` 307 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 308 | WHERE region = 'us-east-1' 309 | LIMIT 10; 310 | ``` 311 | 312 | **Query:** Preview first 10 rows with all fields, limited to multiple regions 313 | 314 | ``` 315 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 316 | WHERE region in ('us-east-1','us-east-2','us-west-2') 317 | LIMIT 10; 318 | ``` 319 | 320 | **Query:** Preview first 10 rows with all fields, limited to a certain date range 321 | > **NOTE:** eventday format is 'YYYYMMDD' as a
string 322 | 323 | ``` 324 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 325 | WHERE eventday >= '20230530' 326 | AND eventday <= '20230631' 327 | LIMIT 10; 328 | ``` 329 | 330 | **Query:** Preview first 10 rows with all fields, limited to the past 30 days (relative) 331 | 332 | ``` 333 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 334 | WHERE eventday >= date_format(date_add('day',-30,current_timestamp), '%Y%m%d') 335 | LIMIT 10; 336 | ``` 337 | 338 | **Query:** Preview first 10 rows with all fields, limited by a combination of partition constraints 339 | > **NOTE:** narrowing the scope of the query as much as possible will improve performance and minimize cost 340 | 341 | ``` 342 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 343 | WHERE eventday >= '20230530' 344 | AND eventday <= '20230631' 345 | AND accountid = '111122223333' 346 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 347 | LIMIT 10; 348 | ``` 349 | 350 | ### CLOUDTRAIL LAMBDA DATA EVENT ANALYSIS EXAMPLES 351 | 352 | **Query:** Query all CloudTrail Lambda data events for a specific Lambda function named 'MyLambdaFunction' 353 | ``` 354 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 355 | WHERE any_match(transform(resources, x -> x.uid), y -> y like '%MyLambdaFunction%') 356 | AND eventday >= '20230530' 357 | AND eventday <= '20230631' 358 | AND accountid = '111122223333' 359 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2'); 360 | ``` 361 | 362 | ## VPC Flow 363 | 364 | ### PREVIEW TABLE 365 | 366 | **Query:** Preview first 10 rows with all fields, a quick way to verify everything is set up correctly 367 | 368 | ``` 369 | SELECT * FROM
"amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 370 | LIMIT 10; 371 | ``` 372 | 373 | ### VPCFLOW PARTITION TESTS 374 | > **NOTE:** if there are no partition constraints (accountid, region, or eventday), then by default ALL data will be scanned, which could lead to a costly query; always include at least one partition constraint. 375 | > 376 | > Note that this is the case even if you have other constraints in a query (e.g. sourceipaddress = '192.0.2.1'); only constraints using partition fields (eventday, region, accountid) will limit the amount of data scanned. 377 | 378 | **Query:** Preview first 10 rows with all fields, limited to a single account 379 | 380 | ``` 381 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 382 | WHERE accountid = '111122223333' 383 | LIMIT 10; 384 | ``` 385 | 386 | **Query:** Preview first 10 rows with all fields, limited to multiple accounts 387 | ``` 388 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 389 | WHERE accountid in ('111122223333','444455556666','123456789012') 390 | LIMIT 10; 391 | ``` 392 | 393 | **Query:** Preview first 10 rows with all fields, limited to a single region 394 | ``` 395 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 396 | WHERE region = 'us-east-1' 397 | LIMIT 10; 398 | ``` 399 | 400 | **Query:** Preview first 10 rows with all fields, limited to multiple regions 401 | ``` 402 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 403 | WHERE region in ('us-east-1','us-east-2','us-west-2') 404 | LIMIT 10; 405 | ``` 406 | 407 | **Query:** Preview first 10 rows with all fields, limited to a certain date range 408 | > NOTE: eventday format is 'YYYYMMDD' as a string 409 | ``` 410 | SELECT * FROM
"amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 411 | WHERE eventday >= '20230530' 412 | AND eventday <= '20230631' 413 | LIMIT 10; 414 | ``` 415 | **Query:** Preview first 10 rows with all fields, limited to the past 30 days (relative) 416 | ``` 417 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 418 | WHERE eventday >= date_format(date_add('day',-30,current_timestamp), '%Y%m%d') 419 | LIMIT 10; 420 | ``` 421 | 422 | **Query:** Preview first 10 rows with all fields, limited by a combination of partition constraints 423 | 424 | > NOTE: narrowing the scope of the query as much as possible will improve performance and minimize cost 425 | ``` 426 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 427 | WHERE eventday >= '20230530' 428 | AND eventday <= '20230631' 429 | AND accountid = '111122223333' 430 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 431 | LIMIT 10; 432 | ``` 433 | ### VPC FLOW ANALYSIS EXAMPLES 434 | 435 | > NOTE: default partition constraints have been provided for each query; be sure to add the appropriate partition constraints to the WHERE clause as shown in the section above 436 | 437 | > DEFAULT partition constraints: 438 | ``` 439 | WHERE eventday >= '20230530' 440 | AND eventday <= '20230631' 441 | AND accountid = '111122223333' 442 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 443 | ``` 444 | 445 | **Query:** Get a list of source/destination IP pairs ordered by the number of records 446 | ``` 447 | SELECT region, src_endpoint.ip as src_ip, dst_endpoint.ip as dst_ip, count(*) as record_count FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 448 | WHERE eventday >= '20230530' 449 | AND eventday <= '20230631' 450 | AND accountid = '111122223333' 451 | AND region in ('us-east-1','us-east-2','us-west-2',
'us-west-2') 452 | GROUP BY region, src_endpoint.ip, dst_endpoint.ip 453 | ORDER BY record_count DESC 454 | ``` 455 | 456 | **Query:** Get a summary of records between a given source/destination IP pair, ordered by the total number of bytes 457 | 458 | ``` 459 | SELECT region, src_endpoint.ip as src_ip, dst_endpoint.ip as dst_ip, sum(traffic.bytes) as byte_count FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 460 | WHERE (src_endpoint.ip = '192.0.2.1' OR dst_endpoint.ip = '192.0.2.1') 461 | AND (src_endpoint.ip = '203.0.113.2' OR dst_endpoint.ip = '203.0.113.2') 462 | AND eventday >= '20230530' 463 | AND eventday <= '20230631' 464 | AND accountid = '111122223333' 465 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 466 | GROUP BY region, src_endpoint.ip, dst_endpoint.ip 467 | ORDER BY byte_count DESC 468 | ``` 469 | 470 | **Query:** Get a summary of the number of bytes sent from port 443 limited to a single instance 471 | > NOTE: for remote IPs this represents the amount of data downloaded from port 443 by the instance; for instance IPs this represents the amount of data downloaded by remote hosts from the instance on port 443 472 | 473 | ``` 474 | SELECT region, dst_endpoint.instance_uid as dst_instance_id, src_endpoint.ip as src_ip, src_endpoint.port as src_port, dst_endpoint.ip as dst_ip, sum(traffic.bytes) as byte_count FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 475 | WHERE dst_endpoint.instance_uid = 'i-000000000000000' 476 | AND src_endpoint.port = 443 477 | AND eventday >= '20230530' 478 | AND eventday <= '20230631' 479 | AND accountid = '111122223333' 480 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 481 | GROUP BY region, dst_endpoint.instance_uid, src_endpoint.ip, src_endpoint.port, dst_endpoint.ip 482 | ORDER BY byte_count DESC 483 | ``` 484 | 485 | **Query:** Get a summary with the number of
bytes for each src_ip,src_port,dst_ip,dst_port quad across all records to or from a specific IP 486 | 487 | ``` 488 | SELECT src_endpoint.ip as src_ip, dst_endpoint.ip as dst_ip, src_endpoint.port as src_port, dst_endpoint.port as dst_port, sum(traffic.bytes) as byte_count FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 489 | WHERE (src_endpoint.ip = '192.0.2.1' OR dst_endpoint.ip = '192.0.2.1') 490 | AND eventday >= '20230530' 491 | AND eventday <= '20230631' 492 | AND accountid = '111122223333' 493 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 494 | GROUP BY src_endpoint.ip, dst_endpoint.ip, src_endpoint.port, dst_endpoint.port 495 | ORDER BY byte_count DESC 496 | ``` 497 | 498 | **Query:** Get all flow records between two IPs showing connection_info.direction 499 | ``` 500 | SELECT from_unixtime(start_time/1000) AS start_time, 501 | from_unixtime(end_time/1000) AS end_time, 502 | src_endpoint.ip, 503 | dst_endpoint.ip, 504 | src_endpoint.port, 505 | dst_endpoint.port, 506 | traffic.packets, 507 | traffic.bytes, 508 | connection_info.direction, 509 | activity_name 510 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 511 | WHERE (src_endpoint.ip = '192.0.2.1' 512 | AND dst_endpoint.ip = '192.0.2.254') 513 | OR (src_endpoint.ip = '192.0.2.254' 514 | AND dst_endpoint.ip = '192.0.2.1') 515 | ORDER BY start_time ASC 516 | ``` 517 | 518 | **Query:** List when source ips were first seen / last seen with a summary of destination ip/instances/ports 519 | ``` 520 | SELECT src_endpoint.ip, 521 | from_unixtime(min(start_time)/1000) AS first_seen, 522 | from_unixtime(max(end_time)/1000) AS last_seen, 523 | array_agg(DISTINCT(dst_endpoint.ip)), 524 | array_agg(DISTINCT(dst_endpoint.instance_uid)), 525 | array_agg(DISTINCT(dst_endpoint.port)) 526 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 527 | WHERE 
dst_endpoint.port < 32768 -- skip ephemeral ports, since we're looking for inbound connections to service ports 528 | AND eventday >= '20230530' 529 | AND eventday <= '20230631' 530 | AND accountid = '111122223333' 531 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 532 | GROUP BY src_endpoint.ip 533 | ORDER BY first_seen ASC 534 | ``` 535 | 536 | 537 | **Query:** Transfer Report on Top 10 Internal IPs with large transfers, limited to source addresses in network 192.0.2.0/24 538 | 539 | ``` 540 | SELECT vpcflow.eventday, vpcflow.src_endpoint.ip as src_endpoint_ip, vpcflow.dst_endpoint.ip as dst_endpoint_ip, sum(vpcflow.traffic.bytes) as byte_count 541 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" as vpcflow 542 | INNER JOIN (SELECT src_endpoint.ip as src_endpoint_ip, sum(traffic.bytes) as byte_count FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 543 | WHERE src_endpoint.ip <> '-' 544 | AND contains('192.0.2.0/24', cast(src_endpoint.ip as IPADDRESS)) 545 | AND eventday >= '20230530' 546 | AND eventday <= '20230631' 547 | AND accountid = '111122223333' 548 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 549 | GROUP BY src_endpoint.ip 550 | ORDER BY byte_count DESC 551 | LIMIT 10 ) as top_n 552 | ON top_n.src_endpoint_ip = vpcflow.src_endpoint.ip 553 | WHERE eventday >= '20230530' 554 | AND eventday <= '20230631' 555 | AND accountid = '111122223333' 556 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 557 | GROUP BY vpcflow.eventday, vpcflow.src_endpoint.ip, vpcflow.dst_endpoint.ip 558 | ORDER BY vpcflow.eventday ASC, vpcflow.src_endpoint.ip ASC, vpcflow.dst_endpoint.ip ASC, byte_count DESC 559 | ``` 560 | 561 | **Query:** Search for traffic between a private (RFC1918) IP address and a public (non-RFC1918) IP address 562 | > NOTE: this is
an example of the new IPADDRESS data type added in Athena engine v2 and IP Address contains function added in the Athena engine v3. Be sure that you've [enabled Athena engine v3](https://aws.amazon.com/blogs/big-data/upgrade-to-athena-engine-version-3-to-increase-query-performance-and-access-more-analytics-features/) 563 | 564 | ``` 565 | SELECT * 566 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 567 | WHERE src_endpoint.ip <> '-' 568 | AND dst_endpoint.ip <> '-' 569 | AND ( 570 | ( 571 | NOT ( 572 | contains( 573 | '10.0.0.0/8', 574 | (CAST(src_endpoint.ip AS IPADDRESS)) 575 | ) 576 | OR contains( 577 | '172.16.0.0/12', 578 | (CAST(src_endpoint.ip AS IPADDRESS)) 579 | ) 580 | OR contains( 581 | '192.168.0.0/16', 582 | (CAST(src_endpoint.ip AS IPADDRESS)) 583 | ) 584 | ) 585 | AND ( 586 | contains( 587 | '10.0.0.0/8', 588 | (CAST(dst_endpoint.ip AS IPADDRESS)) 589 | ) 590 | OR contains( 591 | '172.16.0.0/12', 592 | (CAST(dst_endpoint.ip AS IPADDRESS)) 593 | ) 594 | OR contains( 595 | '192.168.0.0/16', 596 | (CAST(dst_endpoint.ip AS IPADDRESS)) 597 | ) 598 | ) 599 | ) 600 | OR ( 601 | NOT ( 602 | contains( 603 | '10.0.0.0/8', 604 | (CAST(dst_endpoint.ip AS IPADDRESS)) 605 | ) 606 | OR contains( 607 | '172.16.0.0/12', 608 | (CAST(dst_endpoint.ip AS IPADDRESS)) 609 | ) 610 | OR contains( 611 | '192.168.0.0/16', 612 | (CAST(dst_endpoint.ip AS IPADDRESS)) 613 | ) 614 | ) 615 | AND ( 616 | contains( 617 | '10.0.0.0/8', 618 | (CAST(src_endpoint.ip AS IPADDRESS)) 619 | ) 620 | OR contains( 621 | '172.16.0.0/12', 622 | (CAST(src_endpoint.ip AS IPADDRESS)) 623 | ) 624 | OR contains( 625 | '192.168.0.0/16', 626 | (CAST(src_endpoint.ip AS IPADDRESS)) 627 | ) 628 | ) 629 | ) 630 | ) 631 | AND eventday >= '20230530' 632 | AND eventday <= '20230631' 633 | AND accountid = '111122223333' 634 | AND region in ('us-east-1','us-east-2','us-west-2','us-west-2') 635 | ``` 636 | 637 | 638 | **Query:** Search for all 
internal-to-internal VPC Flow records for the internal VPC Subnets in the private 172.16.0.0/12 address space 639 | > NOTE: this is an example of the new IPADDRESS data type added in Athena engine v2 and IP Address contains function added in the Athena engine v3. Be sure that you've [enabled Athena engine v3](https://aws.amazon.com/blogs/big-data/upgrade-to-athena-engine-version-3-to-increase-query-performance-and-access-more-analytics-features/) 640 | 641 | ``` 642 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 643 | WHERE src_endpoint.ip <> '-' 644 | AND dst_endpoint.ip <> '-' 645 | AND contains('172.16.0.0/12', cast(src_endpoint.ip as IPADDRESS)) 646 | AND contains('172.16.0.0/12', cast(dst_endpoint.ip as IPADDRESS)) 647 | AND eventday >= '20230530' 648 | AND eventday <= '20230631' 649 | AND accountid = '111122223333' 650 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 651 | ``` 652 | 653 | **Query:** Search for all VPC Flow records _except_ the internal-to-internal records for VPC Subnets in the private 172.16.0.0/12 address space. Useful to filter out internal VPC traffic and only show traffic to or from external IP Addresses. 
654 | 655 | ``` 656 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 657 | WHERE src_endpoint.ip <> '-' 658 | AND dst_endpoint.ip <> '-' 659 | AND NOT ( 660 | contains('172.16.0.0/12', cast(src_endpoint.ip as IPADDRESS)) 661 | AND contains('172.16.0.0/12', cast(dst_endpoint.ip as IPADDRESS)) 662 | ) 663 | AND eventday >= '20230530' 664 | AND eventday <= '20230631' 665 | AND accountid = '111122223333' 666 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 667 | ``` 668 | 669 | ## Route53 670 | 671 | ### PREVIEW TABLE 672 | 673 | **Query:** Preview first 10 rows with all fields, a quick way to verify everything is set up correctly 674 | ``` 675 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 676 | LIMIT 10; 677 | ``` 678 | 679 | ### ROUTE 53 PARTITION TESTS 680 | > **NOTE:** if there are no partition constraints (accountid, region, or eventday) then by default ALL data will be scanned, which could lead to a costly query; always consider using at least one partition constraint. 681 | > 682 | > Note that this is the case even if you have other constraints in a query (e.g. sourceipaddress = '192.0.2.1'); only constraints using partition fields (eventday, region, accountid) will limit the amount of data scanned.
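To illustrate, a value filter alone does not reduce the data scanned, while pairing it with partition constraints does. A minimal sketch (the table and database names follow the examples in this file; the hostname value is a placeholder):

```
SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0"
WHERE query.hostname = 'example.com.' -- a field constraint alone does NOT limit the data scanned
AND eventday >= '20230530'            -- partition constraints DO limit the data scanned
AND eventday <= '20230631'
AND accountid = '111122223333'
LIMIT 10;
```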
683 | 684 | 685 | **Query:** Preview first 10 rows with all fields, limited to a single account 686 | ``` 687 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 688 | WHERE accountid = '111122223333' 689 | LIMIT 10; 690 | ``` 691 | 692 | **Query:** Preview first 10 rows with all fields, limited to multiple accounts 693 | ``` 694 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 695 | WHERE accountid in ('111122223333','444455556666','123456789012') 696 | LIMIT 10; 697 | ``` 698 | 699 | 700 | **Query:** Preview first 10 rows with all fields, limited to a single region 701 | ``` 702 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 703 | WHERE region = 'us-east-1' 704 | LIMIT 10; 705 | ``` 706 | 707 | **Query:** Preview first 10 rows with all fields, limited to multiple regions 708 | ``` 709 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 710 | WHERE region in ('us-east-1','us-east-2','us-west-2') 711 | LIMIT 10; 712 | ``` 713 | 714 | **Query:** preview first 10 rows with all fields, limited to a certain date range 715 | > NOTE: eventday format is 'YYYYMMDD' as a string 716 | ``` 717 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 718 | WHERE eventday >= '20230530' 719 | AND eventday <= '20230631' 720 | LIMIT 10; 721 | ``` 722 | 723 | **Query:** Preview first 10 rows with all fields, limited to the past 30 days (relative) 724 | ``` 725 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 726 | WHERE eventday >= date_format(date_add('day',-30,current_timestamp), '%Y%m%d') 727 | LIMIT 10; 728 | ``` 729 | 730 | **Query:** Preview first 10 rows with all fields, limited by a combination of partition constraints 731 | > NOTE: 
narrowing the scope of the query as much as possible will improve performance and minimize cost 732 | 733 | ``` 734 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 735 | WHERE eventday >= '20230530' 736 | AND eventday <= '20230631' 737 | AND accountid = '111122223333' 738 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 739 | LIMIT 10; 740 | ``` 741 | 742 | ### ROUTE53 ANALYSIS EXAMPLES 743 | 744 | **Query:** Sort queries by the number of instances that requested each hostname 745 | 746 | ``` 747 | SELECT query.hostname, query.type, cardinality(array_distinct(filter(array_agg(src_endpoint), q -> q.instance_uid IS NOT NULL))) as instance_count, array_distinct(filter(array_agg(src_endpoint), q -> q.instance_uid IS NOT NULL)) as instances 748 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 749 | WHERE eventday >= '20230530' 750 | AND eventday <= '20230631' 751 | AND accountid = '111122223333' 752 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 753 | GROUP BY query.hostname, query.type 754 | ORDER by instance_count DESC; 755 | ``` 756 | 757 | **Query:** Sort queries by the number of queries for each hostname 758 | 759 | ``` 760 | SELECT query.hostname, query.type, count(*) as query_count FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 761 | WHERE eventday >= '20230530' 762 | AND eventday <= '20230631' 763 | AND accountid = '111122223333' 764 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 765 | GROUP BY query.hostname, query.type 766 | ORDER BY query_count DESC; 767 | ``` 768 | 769 | **Query:** Summary with count of each time an A record type of a hostname was queried 770 | ``` 771 | SELECT query.hostname, query.type, count(*) as query_count FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 772 | WHERE
query.type = 'A' 773 | AND eventday >= '20230530' 774 | AND eventday <= '20230631' 775 | AND accountid = '111122223333' 776 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 777 | GROUP BY query.hostname, query.type 778 | ORDER BY query_count DESC; 779 | ``` 780 | 781 | **Query:** Summary with count of each time an A record type of a hostname was queried, with the TLD and SLD split out (note: this doesn't properly handle TLDs containing a '.', e.g. .com.br) 782 | 783 | ``` 784 | SELECT element_at(split(query.hostname,'.'),-2) AS tld, 785 | element_at(split(query.hostname,'.'),-3) AS sld, 786 | query.hostname, query.type, 787 | count(*) AS query_count 788 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 789 | WHERE query.type = 'A' 790 | AND eventday >= '20230530' 791 | AND eventday <= '20230631' 792 | AND accountid = '111122223333' 793 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 794 | GROUP BY query.hostname, query.type 795 | ORDER BY query_count DESC; 796 | ``` 797 | 798 | **Query:** Get records that resolve to a specific IP (e.g., 203.0.113.2) 799 | 800 | ``` 801 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 802 | WHERE contains(transform(answers, x-> x.rdata), '203.0.113.2') 803 | AND eventday >= '20230530' 804 | AND eventday <= '20230631' 805 | AND accountid = '111122223333' 806 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2'); 807 | ``` 808 | 809 | ## Security Hub 810 | 811 | ### PREVIEW SECURITY HUB TABLE 812 | 813 | **Query:** Preview first 10 rows with all fields, a quick way to verify everything is set up correctly 814 | 815 | ```SQL 816 | SELECT * 817 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 818 | LIMIT 10; 819 | ``` 820 | 821 | ### SECURITY HUB PARTITION TESTS 822 | 823 | > **NOTE:** if there are no partition constraints (accountid,
region, or eventday) then by default ALL data will be scanned, which could lead to a costly query; always consider using at least one partition constraint. 824 | > 825 | > Note that this is the case even if you have other constraints in a query (e.g. productname = 'Macie'); only constraints using partition fields (eventday, region, accountid) will limit the amount of data scanned. 826 | 827 | **Query:** Preview first 10 rows with all fields, limited to a single account 828 | ```SQL 829 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 830 | WHERE accountid = '111122223333' 831 | LIMIT 10; 832 | ``` 833 | **Query:** Preview first 10 rows with all fields, limited to multiple accounts 834 | ```SQL 835 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 836 | WHERE accountid in ('111122223333','444455556666','123456789012') 837 | LIMIT 10; 838 | ``` 839 | 840 | **Query:** Preview first 10 rows with all fields, limited to a single region 841 | ```SQL 842 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 843 | WHERE region = 'us-east-1' 844 | LIMIT 10; 845 | ``` 846 | 847 | **Query:** Preview first 10 rows with all fields, limited to multiple regions 848 | ```SQL 849 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 850 | WHERE region in ('us-east-1','us-east-2','us-west-2') 851 | LIMIT 10; 852 | ``` 853 | 854 | **Query:** Preview first 10 rows with all fields, limited to a certain date range 855 | > NOTE: eventday format is 'YYYYMMDD' as a string 856 | ```SQL 857 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 858 | WHERE eventday >= '20230530' 859 | AND eventday <= '20230631' 860 | LIMIT 10; 861 | ``` 862 | 863 | **Query:** Preview first 10 rows with all fields,
limited to the past 30 days (relative) 864 | ```SQL 865 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 866 | WHERE eventday >= date_format(date_add('day',-30,current_timestamp), '%Y%m%d') 867 | LIMIT 10; 868 | ``` 869 | 870 | **Query:** Preview first 10 rows with all fields, limited by a combination of partition constraints 871 | > NOTE: narrowing the scope of the query as much as possible will improve performance and minimize cost 872 | 873 | ```SQL 874 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 875 | WHERE eventday >= '20230530' 876 | AND eventday <= '20230631' 877 | AND accountid = '111122223333' 878 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 879 | LIMIT 10; 880 | ``` 881 | 882 | ## Security Hub Analysis Examples 883 | 884 | ### CONVERT UNIX TIMES TO DATE TIME GROUP (DTG) 885 | The OCSF schema uses Unix timestamps (in milliseconds). You can convert these to a date time group (DTG) that matches the Security Hub Finding Format. 886 | 887 | **Query** Convert the `time` column from a Unix timestamp to a DTG and name the resulting column `Time` 888 | ```SQL 889 | SELECT FROM_UNIXTIME(CAST(time AS DOUBLE)/1000.0) AS "Time" 890 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 891 | WHERE cloud.account_uid = '111122223333' 892 | LIMIT 10; 893 | ``` 894 | 895 | ### GET THE CONTROL ID 896 | The data returned from Security Lake for Security Hub Security Standards findings does not include the Security Standard Control Id, for example `SSM.1`. It can be obtained from `finding.title`, which returns the full title and always begins with the Control Id, for example `ECR.2 ECR private repositories should have tag immutability configured`.
897 | 898 | **Query** Use `split_part()` to get the Control Id 899 | ```SQL 900 | SELECT split_part(finding.title,' ',1) AS "ProductFields.ControlId" 901 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 902 | WHERE cloud.account_uid = '111122223333' 903 | LIMIT 10; 904 | ``` 905 | 906 | In this example, `split_part()` takes the title `finding.title`, splits it on the space character `' '`, and selects the first element (`1`), which is the Control Id; the result is returned under the column name `ProductFields.ControlId`. 907 | 908 | ### CREATE OR REPLACE A VIEW 909 | > NOTE: This view will be used for the following queries, indicated by `FROM "sh_findings_view"` 910 | 911 | **Query** Create or replace a view called `sh_findings_view` which flattens the OCSF structure and maps column names to the AWS Security Hub Finding Format. This is just an example and not an exhaustive mapping of all the columns in the schema. 912 | ```SQL 913 | CREATE OR REPLACE VIEW "sh_findings_view" AS 914 | SELECT 915 | metadata.product.version "SchemaVersion", 916 | metadata.product.feature.uid "GeneratorId", 917 | metadata.product.feature.name "ProductName", 918 | metadata.product.uid "ProductArn", 919 | metadata.product.vendor_name "CompanyName", 920 | metadata.product.name "Security Hub", 921 | metadata.version "Version", 922 | FROM_UNIXTIME(CAST(time AS DOUBLE)/1000.0) AS "Time", 923 | confidence "Confidence", 924 | severity "Severity", 925 | state "WorkflowStatus", 926 | cloud.account_uid "AWSAccountId", 927 | element_at(resources,1).type "ResourceType", 928 | element_at(resources,1).uid "ResourceId", 929 | element_at(resources,1).cloud_partition "ResourcePartition", 930 | element_at(resources,1).region "ResourceRegion", 931 | element_at(resources,1).details "resources.details", 932 | FROM_UNIXTIME(CAST(finding.created_time AS DOUBLE)/1000.0) AS "CreatedAt", 933 | finding.uid "Id", 934 | finding.desc
"Description", 935 | finding.title "Title", 936 | split_part(finding.title,' ',1) AS "ProductFields.ControlId", 937 | FROM_UNIXTIME(CAST(finding.modified_time AS DOUBLE)/1000.0) AS "UpdatedAt", 938 | FROM_UNIXTIME(CAST(finding.first_seen_time AS DOUBLE)/1000.0) AS "FirstObservedAt", 939 | FROM_UNIXTIME(CAST(finding.last_seen_time AS DOUBLE)/1000.0) AS "LastObservedAt", 940 | element_at(finding.types,1) "Types", 941 | finding.remediation.desc "RecommendationText", 942 | finding.src_url "RecommendationUrl", 943 | compliance.status "ComplianceStatus", 944 | compliance.status_detail "Compliance.StatusReasons.Description", 945 | class_name "SecurityFinding", 946 | unmapped['Severity_Normalized'] "SeverityNormalized", 947 | unmapped['Severity_Original'] "SeverityOriginal", 948 | unmapped['FindingProviderFields'] "FindingProviderFields", 949 | region "Region", 950 | accountid "AccountId", 951 | eventday "EventDay" 952 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 953 | ``` 954 | 955 | ### SELECT SECURITY FINDINGS FROM OTHER INTEGRATION PROVIDERS 956 | 957 | Security Hub integrates with many other AWS security services, such as GuardDuty and AWS Config, as well as third-party security tools that can send findings to AWS Security Hub. You may want to select the findings from one or more of these providers.
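Before filtering on a specific provider, it can help to enumerate which products are actually present in your findings. A sketch using the `sh_findings_view` view defined above (the account ID is a placeholder):

```SQL
SELECT ProductName, count(*) AS finding_count
FROM "sh_findings_view"
WHERE AWSAccountId = '111122223333'
GROUP BY ProductName
ORDER BY finding_count DESC;
```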
958 | 959 | **Query** Select findings from Amazon Macie and only return the first 10 results using the `sh_findings_view` 960 | ```SQL 961 | SELECT * 962 | FROM "sh_findings_view" 963 | WHERE lower(ProductName) = 'macie' 964 | AND AWSAccountId = '111122223333' 965 | LIMIT 10; 966 | ``` 967 | **Query** Select findings from Amazon Macie and AWS Config and only return 10 results 968 | 969 | ```SQL 970 | SELECT * 971 | FROM "sh_findings_view" 972 | WHERE lower(ProductName) in ('macie', 'config') 973 | AND AWSAccountId = '111122223333' 974 | LIMIT 10; 975 | ``` 976 | 977 | **Query** Select new high/medium/low severity `Config` findings from the `sh_findings_view` view, ordered by severity 978 | ```SQL 979 | SELECT *, 980 | CASE 981 | WHEN lower(Severity) = 'high' THEN 1 982 | WHEN lower(Severity) = 'medium' THEN 2 983 | WHEN lower(Severity) = 'low' THEN 3 984 | ELSE 4 985 | END as display_order 986 | FROM "sh_findings_view" 987 | WHERE lower(ProductName) = 'config' 988 | AND AWSAccountId = '111122223333' 989 | AND lower(WorkflowStatus) = 'new' 990 | AND lower(Severity) in ('high', 'medium', 'low') 991 | ORDER BY display_order 992 | LIMIT 10; 993 | ``` 994 | -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/amazon_security_lake_queries/ocsf/amazon_security_lake_queries_cloudtrail_lambda.md: -------------------------------------------------------------------------------- 1 | 5 | 6 | # Amazon Security Lake Example Queries 7 | 8 | ## Cloudtrail Lambda Data Events 9 | > **NOTE:** The example queries in this file are intended to query *CloudTrail Lambda data events*. CloudTrail management events, S3 data events, and Lambda data events are three separate sources in Security Lake. For more information about enabling CloudTrail sources in Amazon Security Lake, please review the official [documentation](https://docs.aws.amazon.com/security-lake/latest/userguide/internal-sources.html).
10 | 11 | ### PREVIEW CLOUDTRAIL LAMBDA DATA EVENTS TABLE 12 | 13 | **Query:** Preview first 10 rows with all fields, a quick way to verify everything is set up correctly 14 | 15 | ``` 16 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 17 | LIMIT 10; 18 | ``` 19 | 20 | ### CLOUDTRAIL LAMBDA PARTITION TESTS 21 | 22 | > **NOTE:** if there are no partition constraints (accountid, region, or eventday) then by default ALL data will be scanned, which could lead to a costly query; always consider using at least one partition constraint. 23 | > 24 | > Note that this is the case even if you have other constraints in a query (e.g. sourceipaddress = '192.0.2.1'); only constraints using partition fields (eventday, region, accountid) will limit the amount of data scanned. 25 | 26 | **Query:** Preview first 10 rows with all fields, limited to a single account 27 | 28 | ``` 29 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 30 | WHERE accountid = '111122223333' 31 | LIMIT 10; 32 | ``` 33 | **Query:** Preview first 10 rows with all fields, limited to multiple accounts 34 | 35 | ``` 36 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 37 | WHERE accountid in ('111122223333','444455556666','123456789012') 38 | LIMIT 10; 39 | ``` 40 | 41 | **Query:** Preview first 10 rows with all fields, limited to a single region 42 | 43 | ``` 44 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 45 | WHERE region = 'us-east-1' 46 | LIMIT 10; 47 | ``` 48 | 49 | **Query:** Preview first 10 rows with all fields, limited to multiple regions 50 | 51 | ``` 52 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 53 | WHERE region in ('us-east-1','us-east-2','us-west-2') 54 | LIMIT
10; 55 | ``` 56 | 57 | **Query:** preview first 10 rows with all fields, limited to a certain date range 58 | > **NOTE:** eventday format is 'YYYYMMDD' as a string 59 | 60 | ``` 61 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 62 | WHERE eventday >= '20230530' 63 | AND eventday <= '20230631' 64 | LIMIT 10; 65 | ``` 66 | 67 | **Query:** Preview first 10 rows with all fields, limited to the past 30 days (relative) 68 | 69 | ``` 70 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 71 | WHERE eventday >= date_format(date_add('day',-30,current_timestamp), '%Y%m%d') 72 | LIMIT 10; 73 | ``` 74 | 75 | **Query:** Preview first 10 rows with all fields, limited by a combination of partition constraints 76 | > **NOTE:** narrowing the scope of the query as much as possible will improve performance and minimize cost 77 | 78 | ``` 79 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 80 | WHERE eventday >= '20230530' 81 | AND eventday <= '20230631' 82 | AND accountid = '111122223333' 83 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 84 | LIMIT 10; 85 | ``` 86 | 87 | ### CLOUDTRAIL LAMBDA DATA EVENT ANALYSIS EXAMPLES 88 | 89 | **Query:** Query all Cloudtrail Lambda data events for a specific Lambda function named 'MyLambdaFunction' 90 | ``` 91 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_lambda_execution_1_0" 92 | WHERE any_match(transform(resources, x -> x.uid), y -> y like '%MyLambdaFunction%') 93 | AND eventday >= '20230530' 94 | AND eventday <= '20230631' 95 | AND accountid = '111122223333' 96 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2'); 97 | ``` -------------------------------------------------------------------------------- 
/AWSSecurityAnalyticsBootstrap/amazon_security_lake_queries/ocsf/amazon_security_lake_queries_cloudtrail_management.md: -------------------------------------------------------------------------------- 1 | 5 | 6 | # Amazon Security Lake Example Queries 7 | 8 | ## Cloudtrail Management Events 9 | 10 | > **NOTE:** The example queries in this file are intended to query *CloudTrail management events*. CloudTrail management events, S3 data events, and Lambda data events are three separate sources in Security Lake. For more information about enabling CloudTrail sources in Amazon Security Lake, please review the official [documentation](https://docs.aws.amazon.com/security-lake/latest/userguide/internal-sources.html). 11 | 12 | ### PREVIEW CLOUDTRAIL MANAGEMENT EVENTS TABLE 13 | 14 | **Query:** Preview first 10 rows with all fields, a quick way to verify everything is set up correctly 15 | 16 | ``` 17 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 18 | LIMIT 10; 19 | ``` 20 | 21 | ### CLOUDTRAIL MANAGEMENT PARTITION TESTS 22 | > **NOTE:** if there are no partition constraints (accountid, region, or eventday) then by default ALL data will be scanned, which could lead to a costly query; always consider using at least one partition constraint. 23 | > 24 | > Note that this is the case even if you have other constraints in a query (e.g. sourceipaddress = '192.0.2.1'); only constraints using partition fields (eventday, region, accountid) will limit the amount of data scanned.
25 | 26 | **Query:** Preview first 10 rows with all fields, limited to a single account 27 | ``` 28 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 29 | WHERE accountid = '111122223333' 30 | LIMIT 10; 31 | ``` 32 | 33 | **Query:** Preview first 10 rows with all fields, limited to multiple accounts 34 | ``` 35 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 36 | WHERE accountid in ('111122223333','444455556666','123456789012') 37 | LIMIT 10; 38 | ``` 39 | 40 | **Query:** Preview first 10 rows with all fields, limited to a single region 41 | ``` 42 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 43 | WHERE region = 'us-east-1' 44 | LIMIT 10; 45 | ``` 46 | 47 | **Query:** Preview first 10 rows with all fields, limited to multiple regions 48 | ``` 49 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 50 | WHERE region in ('us-east-1','us-east-2','us-west-2') 51 | LIMIT 10; 52 | ``` 53 | 54 | **Query:** preview first 10 rows with all fields, limited to a certain date range 55 | > NOTE: eventday format is 'YYYYMMDD' as a string 56 | ``` 57 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 58 | WHERE eventday >= '20230530' 59 | AND eventday <= '20230631' 60 | LIMIT 10; 61 | ``` 62 | 63 | **Query:** Preview first 10 rows with all fields, limited to the past 30 days (relative) 64 | ``` 65 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 66 | WHERE eventday >= date_format(date_add('day',-30,current_timestamp), '%Y%m%d') 67 | LIMIT 10; 68 | ``` 69 | 70 | **Query:** Preview first 10 rows with all fields, limited by a combination of partition constraints 71 | > NOTE: narrowing 
the scope of the query as much as possible will improve performance and minimize cost 72 | 73 | ``` 74 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 75 | WHERE eventday >= '20230530' 76 | AND eventday <= '20230631' 77 | AND accountid = '111122223333' 78 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 79 | LIMIT 10; 80 | ``` 81 | 82 | ### CLOUDTRAIL MANAGEMENT ANALYSIS EXAMPLES 83 | > NOTE: default partition constraints have been provided for each query, be sure to add the appropriate partition constraints to the WHERE clause as shown in the section above 84 | 85 | > DEFAULT partition constraints: 86 | ``` 87 | WHERE eventday >= '20230530' 88 | AND eventday <= '20230631' 89 | AND accountid = '111122223333' 90 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 91 | ``` 92 | > Be sure to modify or remove these to fit the scope of your intended analysis 93 | 94 | 95 | **Query:** Summary of event counts by Region (e.g. where is the most activity) 96 | ``` 97 | SELECT region, count(*) as eventcount FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 98 | WHERE eventday >= '20230530' 99 | AND eventday <= '20230631' 100 | AND accountid = '111122223333' 101 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 102 | GROUP BY region 103 | ORDER BY eventcount DESC 104 | ``` 105 | 106 | **Query:** Summary of event count by Region and EventName, ordered by event count (descending) for each region. 
This is a quick way to identify the top CloudTrail event names seen in each region 107 | 108 | ``` 109 | SELECT region, api.operation, count(*) as operation_count FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 110 | WHERE eventday >= '20230530' 111 | AND eventday <= '20230631' 112 | AND accountid = '111122223333' 113 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 114 | GROUP BY region, api.operation 115 | ORDER BY region, operation_count DESC 116 | ``` 117 | 118 | **Query:** User login summary via AssumeRole or ConsoleLogin, including a list of all source IPs for each user 119 | ``` 120 | SELECT actor.user.uuid, api.operation, array_agg(DISTINCT(src_endpoint.ip) ORDER BY src_endpoint.ip) AS sourceips FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 121 | WHERE actor.user.uuid IS NOT NULL 122 | AND (api.operation = 'AssumeRole' OR api.operation = 'ConsoleLogin') 123 | AND eventday >= '20230530' 124 | AND eventday <= '20230631' 125 | AND accountid = '111122223333' 126 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 127 | GROUP BY actor.user.uuid, api.operation 128 | ORDER BY api.operation 129 | ``` 130 | 131 | **Query:** User login summary via AssumeRole or ConsoleLogin, including a list of all source IPs for each user 132 | 133 | > NOTE: This query is similar to the query above, except it uses the normalized OCSF `activity_id` for login activity (`1`) rather than explicitly searching for login operation names.
134 | 135 | ``` 136 | SELECT actor.user.uuid, api.operation, array_agg(DISTINCT(src_endpoint.ip) ORDER BY src_endpoint.ip) AS sourceips FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 137 | WHERE actor.user.uuid IS NOT NULL 138 | AND activity_id = 1 139 | AND eventday >= '20230530' 140 | AND eventday <= '20230631' 141 | AND accountid = '111122223333' 142 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 143 | GROUP BY actor.user.uuid, api.operation 144 | ORDER BY api.operation 145 | ``` 146 | 147 | **Query:** User Activity Summary: filter out high-volume read-only GET/LIST/DESCRIBE calls 148 | 149 | ``` 150 | SELECT actor.user.uuid, array_agg(DISTINCT(api.operation)) AS operations, 151 | array_agg(DISTINCT(src_endpoint.ip) ORDER BY src_endpoint.ip) AS sourceips, 152 | array_agg(DISTINCT(http_request.user_agent) ORDER BY http_request.user_agent) AS user_agents FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 153 | WHERE api.operation <> 'AssumeRole' 154 | AND api.operation NOT LIKE 'Get%' 155 | AND api.operation NOT LIKE 'List%' 156 | AND api.operation NOT LIKE 'Describe%' 157 | AND eventday >= '20230530' 158 | AND eventday <= '20230631' 159 | AND accountid = '111122223333' 160 | AND region in ('us-east-1','us-east-2','us-west-2', 'us-west-2') 161 | GROUP BY actor.user.uuid 162 | ``` 163 | 164 | **Query:** User Activity Summary, including username: filter out high-volume read-only GET/LIST/DESCRIBE calls 165 | > NOTE: this query is similar to the one above, but will include the ARN or, for IAM Users, the username of the principal 166 | 167 | ``` 168 | SELECT actor.user.uuid, actor.user.name, 169 | array_agg(DISTINCT(api.operation) ORDER BY api.operation) AS operations, 170 | array_agg(DISTINCT(src_endpoint.ip) ORDER BY src_endpoint.ip) AS sourceips, 171 | array_agg(DISTINCT(http_request.user_agent) ORDER BY http_request.user_agent) AS user_agents
FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 172 | WHERE api.operation <> 'AssumeRole' 173 | AND api.operation NOT LIKE 'Get%' 174 | AND api.operation NOT LIKE 'List%' 175 | AND api.operation NOT LIKE 'Describe%' 176 | AND eventday >= '20230530' 177 | AND eventday <= '20230630' 178 | AND accountid = '111122223333' 179 | AND region in ('us-east-1','us-east-2','us-west-2') 180 | GROUP BY actor.user.uuid, actor.user.name 181 | ``` 182 | 183 | **Query:** Search for activity by a specific IAM User 184 | > NOTE: this query is similar to the one above, but will search for activity by a specific IAM User based on their username 185 | ``` 186 | SELECT time, eventday, actor.user.uuid, actor.user.name, actor.user.credential_uid, api.operation, unmapped['requestParameters.userName'] as requestParametersUsername, unmapped['requestParameters.policyArn'] as requestParametersPolicyArn, api.response 187 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 188 | WHERE actor.user.type = 'IAMUser' 189 | AND actor.user.name = '{username}' 190 | AND eventday >= '20230530' 191 | AND eventday <= '20230630' 192 | AND accountid = '111122223333' 193 | AND region in ('us-east-1','us-east-2','us-west-2'); 194 | ``` 195 | 196 | **Query:** Search for activity associated with a specific IAM User's Access Key 197 | > NOTE: this query is similar to the one above, but will search for just a certain access key that's associated with an IAM User 198 | ``` 199 | SELECT time, eventday, actor.user.uuid, actor.user.name, actor.user.credential_uid, api.operation, unmapped['requestParameters.userName'] as requestParametersUsername, unmapped['requestParameters.policyArn'] as requestParametersPolicyArn, api.response 200 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 201 | WHERE
actor.user.type = 'IAMUser' 202 | AND actor.user.credential_uid = '{accesskeyid}' 203 | AND eventday >= '20230530' 204 | AND eventday <= '20230630' 205 | AND accountid = '111122223333' 206 | AND region in ('us-east-1','us-east-2','us-west-2'); 207 | ``` 208 | 209 | **Query:** IAM change summary: Filter read-only GET/LIST/DESCRIBE and Filter unsuccessful calls 210 | 211 | ``` 212 | SELECT time, actor.user.uuid, actor.user.name, api.operation, unmapped['requestParameters'] AS request_parameters 213 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 214 | WHERE api.service.name = 'iam.amazonaws.com' 215 | AND api.operation NOT LIKE 'Get%' 216 | AND api.operation NOT LIKE 'List%' 217 | AND api.operation NOT LIKE 'Describe%' 218 | AND api.response.error IS NULL 219 | AND eventday >= '20230530' 220 | AND eventday <= '20230630' 221 | AND accountid = '111122223333' 222 | AND region in ('us-east-1','us-east-2','us-west-2') 223 | ORDER BY accountid, time 224 | ``` 225 | 226 | **Query:** Access Key creations with extract of username and key ID. Filter unsuccessful calls 227 | 228 | ``` 229 | SELECT time, actor.user.uuid, actor.user.name, api.operation, 230 | unmapped['responseElements.accessKey.userName'] AS user_name, 231 | unmapped['responseElements.accessKey.accessKeyId'] AS access_key 232 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 233 | WHERE api.operation = 'CreateAccessKey' 234 | AND api.response.error IS NULL 235 | AND eventday >= '20230530' 236 | AND eventday <= '20230630' 237 | AND accountid = '111122223333' 238 | AND region in ('us-east-1','us-east-2','us-west-2') 239 | ORDER BY accountid, time 240 | ``` 241 | 242 | **Query:** Password changes with extract of username.
Filter unsuccessful calls 243 | 244 | ``` 245 | SELECT time, actor.user.uuid, actor.user.name, api.operation, 246 | user.name AS "username with password modified" 247 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 248 | WHERE api.operation IN ('UpdateLoginProfile', 'CreateLoginProfile') 249 | AND api.response.error IS NULL 250 | AND eventday >= '20230530' 251 | AND eventday <= '20230630' 252 | AND accountid = '111122223333' 253 | AND region in ('us-east-1','us-east-2','us-west-2') 254 | ORDER BY accountid, time 255 | ``` 256 | 257 | **Query:** Identify API events made from a public IP (i.e. a non-RFC1918, publicly routable source IP address). Useful to filter out internal API calls. 258 | 259 | > NOTE: this is an example of the IPADDRESS data type added in Athena engine v2 and the IP address contains function added in Athena engine v3. Be sure that you've [enabled Athena engine v3](https://aws.amazon.com/blogs/big-data/upgrade-to-athena-engine-version-3-to-increase-query-performance-and-access-more-analytics-features/) 260 | 261 | ``` 262 | SELECT * 263 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0" 264 | WHERE src_endpoint.ip <> '' 265 | AND NOT contains('10.0.0.0/8', CAST(src_endpoint.ip AS IPADDRESS)) 266 | AND NOT contains('172.16.0.0/12', CAST(src_endpoint.ip AS IPADDRESS)) 267 | AND NOT contains('192.168.0.0/16', CAST(src_endpoint.ip AS IPADDRESS)) 268 | AND eventday >= '20230530' 269 | AND eventday <= '20230630' 270 | AND accountid = '111122223333' 271 | AND region in ('us-east-1','us-east-2','us-west-2') 272 | ``` 273 | -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/amazon_security_lake_queries/ocsf/amazon_security_lake_queries_route53.md: -------------------------------------------------------------------------------- 1 | 5 | 6 | #
Amazon Security Lake Example Queries 7 | 8 | ## Route53 9 | 10 | ### PREVIEW TABLE 11 | 12 | **Query:** Preview first 10 rows with all fields, a quick way to verify everything is set up correctly 13 | ``` 14 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 15 | LIMIT 10; 16 | ``` 17 | 18 | ### ROUTE 53 PARTITION TESTS 19 | > **NOTE:** if there are no partition constraints (accountid, region, or eventday), then by default ALL data will be scanned. This could lead to a costly query, so always consider using at least one partition constraint. 20 | > 21 | > Note that this is the case even if you have other constraints in a query (e.g. sourceipaddress = '192.0.2.1'); only constraints using partition fields (eventday, region, accountid) will limit the amount of data scanned. 22 | 23 | 24 | **Query:** Preview first 10 rows with all fields, limited to a single account 25 | ``` 26 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 27 | WHERE accountid = '111122223333' 28 | LIMIT 10; 29 | ``` 30 | 31 | **Query:** Preview first 10 rows with all fields, limited to multiple accounts 32 | ``` 33 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 34 | WHERE accountid in ('111122223333','444455556666','123456789012') 35 | LIMIT 10; 36 | ``` 37 | 38 | 39 | **Query:** Preview first 10 rows with all fields, limited to a single region 40 | ``` 41 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 42 | WHERE region = 'us-east-1' 43 | LIMIT 10; 44 | ``` 45 | 46 | **Query:** Preview first 10 rows with all fields, limited to multiple regions 47 | ``` 48 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 49 | WHERE region in ('us-east-1','us-east-2','us-west-2') 50 | LIMIT 10; 51 | ``` 52 | 53 | **Query:** Preview
first 10 rows with all fields, limited to a certain date range 54 | > NOTE: eventday format is 'YYYYMMDD' as a string 55 | ``` 56 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 57 | WHERE eventday >= '20230530' 58 | AND eventday <= '20230630' 59 | LIMIT 10; 60 | ``` 61 | 62 | **Query:** Preview first 10 rows with all fields, limited to the past 30 days (relative) 63 | ``` 64 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 65 | WHERE eventday >= date_format(date_add('day',-30,current_timestamp), '%Y%m%d') 66 | LIMIT 10; 67 | ``` 68 | 69 | **Query:** Preview first 10 rows with all fields, limited by a combination of partition constraints 70 | > NOTE: narrowing the scope of the query as much as possible will improve performance and minimize cost 71 | 72 | ``` 73 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 74 | WHERE eventday >= '20230530' 75 | AND eventday <= '20230630' 76 | AND accountid = '111122223333' 77 | AND region in ('us-east-1','us-east-2','us-west-2') 78 | LIMIT 10; 79 | ``` 80 | 81 | ### ROUTE53 ANALYSIS EXAMPLES 82 | 83 | **Query:** Sort queries by the number of instances that requested each hostname 84 | 85 | ``` 86 | SELECT query.hostname, query.type, cardinality(array_distinct(filter(array_agg(src_endpoint), q -> q.instance_uid IS NOT NULL))) as instance_count, array_distinct(filter(array_agg(src_endpoint), q -> q.instance_uid IS NOT NULL)) as instances 87 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 88 | WHERE eventday >= '20230530' 89 | AND eventday <= '20230630' 90 | AND accountid = '111122223333' 91 | AND region in ('us-east-1','us-east-2','us-west-2') 92 | GROUP BY query.hostname, query.type 93 | ORDER BY instance_count DESC; 94 | ``` 95 | 96 | **Query:** Sort queries by the number of
queries for each hostname 97 | 98 | ``` 99 | SELECT query.hostname, query.type, count(*) as query_count FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 100 | WHERE eventday >= '20230530' 101 | AND eventday <= '20230630' 102 | AND accountid = '111122223333' 103 | AND region in ('us-east-1','us-east-2','us-west-2') 104 | GROUP BY query.hostname, query.type 105 | ORDER BY query_count DESC; 106 | ``` 107 | 108 | **Query:** Summary with count of each time an A record type of a hostname was queried 109 | ``` 110 | SELECT query.hostname, query.type, count(*) as query_count FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 111 | WHERE query.type = 'A' 112 | AND eventday >= '20230530' 113 | AND eventday <= '20230630' 114 | AND accountid = '111122223333' 115 | AND region in ('us-east-1','us-east-2','us-west-2') 116 | GROUP BY query.hostname, query.type 117 | ORDER BY query_count DESC; 118 | ``` 119 | 120 | **Query:** Summary with count of each time an A record type of a hostname was queried. Split out TLD and SLD (note: doesn't properly handle TLDs containing a '.' (e.g.
.com.br)) 121 | 122 | ``` 123 | SELECT element_at(split(query.hostname,'.'),-2) AS tld, 124 | element_at(split(query.hostname,'.'),-3) AS sld, 125 | query.hostname, query.type, 126 | count(*) AS query_count 127 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 128 | WHERE query.type = 'A' 129 | AND eventday >= '20230530' 130 | AND eventday <= '20230630' 131 | AND accountid = '111122223333' 132 | AND region in ('us-east-1','us-east-2','us-west-2') 133 | GROUP BY query.hostname, query.type 134 | ORDER BY query_count DESC; 135 | ``` 136 | 137 | **Query:** Get records that resolve to a specific IP (e.g., 203.0.113.2) 138 | 139 | ``` 140 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_route53_1_0" 141 | WHERE contains(transform(answers, x-> x.rdata), '203.0.113.2') 142 | AND eventday >= '20230530' 143 | AND eventday <= '20230630' 144 | AND accountid = '111122223333' 145 | AND region in ('us-east-1','us-east-2','us-west-2'); 146 | ``` 147 | -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/amazon_security_lake_queries/ocsf/amazon_security_lake_queries_securityhub.md: -------------------------------------------------------------------------------- 1 | 5 | 6 | # Amazon Security Lake Example Queries 7 | 8 | ## Security Hub 9 | 10 | ### PREVIEW SECURITY HUB TABLE 11 | 12 | **Query:** Preview first 10 rows with all fields, a quick way to verify everything is set up correctly 13 | 14 | ```SQL 15 | SELECT * 16 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 17 | LIMIT 10; 18 | ``` 19 | 20 | ### SECURITY HUB PARTITION TESTS 21 | 22 | > **NOTE:** if there are no partition constraints (accountid, region, or eventday), then by default ALL data will be scanned. This could lead to a costly query, so always consider using at least one partition constraint.
23 | > 24 | > Note that this is the case even if you have other constraints in a query (e.g. productname = 'Macie'); only constraints using partition fields (eventday, region, accountid) will limit the amount of data scanned. 25 | 26 | **Query:** Preview first 10 rows with all fields, limited to a single account 27 | ```SQL 28 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 29 | WHERE accountid = '111122223333' 30 | LIMIT 10; 31 | ``` 32 | **Query:** Preview first 10 rows with all fields, limited to multiple accounts 33 | ```SQL 34 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 35 | WHERE accountid in ('111122223333','444455556666','123456789012') 36 | LIMIT 10; 37 | ``` 38 | 39 | **Query:** Preview first 10 rows with all fields, limited to a single region 40 | ```SQL 41 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 42 | WHERE region = 'us-east-1' 43 | LIMIT 10; 44 | ``` 45 | 46 | **Query:** Preview first 10 rows with all fields, limited to multiple regions 47 | ```SQL 48 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 49 | WHERE region in ('us-east-1','us-east-2','us-west-2') 50 | LIMIT 10; 51 | ``` 52 | 53 | **Query:** Preview first 10 rows with all fields, limited to a certain date range 54 | > NOTE: eventday format is 'YYYYMMDD' as a string 55 | ```SQL 56 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 57 | WHERE eventday >= '20230530' 58 | AND eventday <= '20230630' 59 | LIMIT 10; 60 | ``` 61 | 62 | **Query:** Preview first 10 rows with all fields, limited to the past 30 days (relative) 63 | ```SQL 64 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 65 | WHERE eventday >=
date_format(date_add('day',-30,current_timestamp), '%Y%m%d') 66 | LIMIT 10; 67 | ``` 68 | 69 | **Query:** Preview first 10 rows with all fields, limited by a combination of partition constraints 70 | > NOTE: narrowing the scope of the query as much as possible will improve performance and minimize cost 71 | 72 | ```SQL 73 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 74 | WHERE eventday >= '20230530' 75 | AND eventday <= '20230630' 76 | AND accountid = '111122223333' 77 | AND region in ('us-east-1','us-east-2','us-west-2') 78 | LIMIT 10; 79 | ``` 80 | 81 | ## Security Hub Analysis Examples 82 | 83 | ### CONVERT UNIX TIMES TO DATE TIME GROUP (DTG) 84 | The OCSF uses Unix times. You can convert these to a DTG which matches the Security Hub Finding Format. 85 | 86 | **Query** Convert the `time` column from Unix time to a DTG and name the resulting column `Time` 87 | ```SQL 88 | SELECT FROM_UNIXTIME(CAST(time AS DOUBLE)/1000.0) AS "Time" 89 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 90 | WHERE cloud.account_uid = '111122223333' 91 | LIMIT 10; 92 | ``` 93 | 94 | ### GET THE CONTROL ID 95 | The data returned from Security Lake for Security Hub Security Standard findings does not include the Security Standard Control Id, for example `SSM.1`. This can be obtained using `finding.title`, which returns the full title and always begins with the Control Id, for example `ECR.2 ECR private repositories should have tag immutability configured`.
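Since the Control Id is the first space-delimited token of the title, the same extraction can also serve as a grouping key. The following is a sketch, not part of the original query set, reusing the sample database, table, and account id used throughout this section, and assuming every title begins with its Control Id:

```SQL
-- Sketch: count findings per extracted Control Id
-- (assumes the sample table and account id used in this section)
SELECT split_part(finding.title,' ',1) AS "ProductFields.ControlId",
       count(*) AS finding_count
FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0"
WHERE cloud.account_uid = '111122223333'
GROUP BY split_part(finding.title,' ',1)
ORDER BY finding_count DESC
LIMIT 10;
```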
96 | 97 | **Query** Use `split_part()` to get the Control Id 98 | ```SQL 99 | SELECT split_part(finding.title,' ',1) AS "ProductFields.ControlId" 100 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 101 | WHERE cloud.account_uid = '111122223333' 102 | LIMIT 10; 103 | ``` 104 | 105 | In this example, `split_part()` takes the title (`finding.title`), splits it on the space character (`' '`), and selects the first token (`1`), which is the Control Id. The result is given the column name `ProductFields.ControlId`. 106 | 107 | ### CREATE OR REPLACE A VIEW 108 | > NOTE: This view will be used for the following queries indicated by `FROM "sh_findings_view"` 109 | 110 | **Query** Create or replace a view called `sh_findings_view` which flattens the OCSF structure and maps column names to the AWS Security Hub Finding Format. This is just an example and not an exhaustive mapping of all the columns in the schema. 111 | ```SQL 112 | CREATE OR REPLACE VIEW "sh_findings_view" AS 113 | SELECT 114 | metadata.product.version "SchemaVersion", 115 | metadata.product.feature.uid "GeneratorId", 116 | metadata.product.feature.name "ProductName", 117 | metadata.product.uid "ProductArn", 118 | metadata.product.vendor_name "CompanyName", 119 | metadata.product.name "Security Hub", 120 | metadata.version "Version", 121 | FROM_UNIXTIME(CAST(time AS DOUBLE)/1000.0) AS "Time", 122 | confidence "Confidence", 123 | severity "Severity", 124 | state "WorkflowStatus", 125 | cloud.account_uid "AWSAccountId", 126 | element_at(resources,1).type "ResourceType", 127 | element_at(resources,1).uid "ResourceId", 128 | element_at(resources,1).cloud_partition "ResourcePartition", 129 | element_at(resources,1).region "ResourceRegion", 130 | element_at(resources,1).details "resources.details", 131 | FROM_UNIXTIME(CAST(finding.created_time AS DOUBLE)/1000.0) AS "CreatedAt", 132 | finding.uid "Id", 133 | finding.desc "Description",
134 | finding.title "Title", 135 | split_part(finding.title,' ',1) AS "ProductFields.ControlId", 136 | FROM_UNIXTIME(CAST(finding.modified_time AS DOUBLE)/1000.0) AS "UpdatedAt", 137 | FROM_UNIXTIME(CAST(finding.first_seen_time AS DOUBLE)/1000.0) AS "FirstObservedAt", 138 | FROM_UNIXTIME(CAST(finding.last_seen_time AS DOUBLE)/1000.0) AS "LastObservedAt", 139 | element_at(finding.types,1) "Types", 140 | finding.remediation.desc "RecommendationText", 141 | finding.src_url "RecommendationUrl", 142 | compliance.status "ComplianceStatus", 143 | compliance.status_detail "Compliance.StatusReasons.Description", 144 | class_name "SecurityFinding", 145 | unmapped['Severity_Normalized'] "SeverityNormalized", 146 | unmapped['Severity_Original'] "SeverityOriginal", 147 | unmapped['FindingProviderFields'] "FindingProviderFields", 148 | region "Region", 149 | accountid "AccountId", 150 | eventday "EventDay" 151 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0" 152 | ``` 153 | 154 | ### SELECT SECURITY FINDINGS FROM OTHER INTEGRATION PROVIDERS 155 | 156 | Security Hub has integrations with many other AWS security services, such as GuardDuty and Config. It also has integrations with third-party security tools which can send findings to AWS Security Hub. You might want to select the findings from one or more security services.
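Before filtering on a specific provider, it can help to see which integration providers are actually present in the data. The following is a sketch, not part of the original query set, assuming the `sh_findings_view` view defined above has been created and reusing the sample account id:

```SQL
-- Sketch: count findings per integration provider via the flattened view
-- (assumes sh_findings_view exists and the sample account id)
SELECT ProductName, count(*) AS finding_count
FROM "sh_findings_view"
WHERE AWSAccountId = '111122223333'
GROUP BY ProductName
ORDER BY finding_count DESC;
```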
157 | 158 | **Query** Select findings from Amazon Macie and only return the first 10 results using the `sh_findings_view` 159 | ```SQL 160 | SELECT * 161 | FROM "sh_findings_view" 162 | WHERE lower(ProductName) = 'macie' 163 | AND AWSAccountId = '111122223333' 164 | LIMIT 10; 165 | ``` 166 | **Query** Select findings from Amazon Macie and AWS Config and only return 10 results 167 | 168 | ```SQL 169 | SELECT * 170 | FROM "sh_findings_view" 171 | WHERE lower(ProductName) in ('macie', 'config') 172 | AND AWSAccountId = '111122223333' 173 | LIMIT 10; 174 | ``` 175 | 176 | **Query** Select `Config` findings from the `sh_findings_view` view 177 | ```SQL 178 | SELECT *, 179 | CASE 180 | WHEN lower(Severity) = 'high' THEN 1 181 | WHEN lower(Severity) = 'medium' THEN 2 182 | WHEN lower(Severity) = 'low' THEN 3 183 | ELSE 4 184 | END as display_order 185 | FROM "sh_findings_view" 186 | WHERE lower(ProductName) = 'config' 187 | AND AWSAccountId = '111122223333' 188 | AND lower(WorkflowStatus) = 'new' 189 | AND lower(Severity) in ('high', 'medium', 'low') 190 | ORDER BY display_order 191 | LIMIT 10; 192 | ``` 193 | -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/amazon_security_lake_queries/ocsf/amazon_security_lake_queries_vpcflow.md: -------------------------------------------------------------------------------- 1 | 5 | 6 | # Amazon Security Lake Example Queries 7 | 8 | ## VPC Flow 9 | 10 | ### PREVIEW TABLE 11 | 12 | **Query:** Preview first 10 rows with all fields, a quick way to verify everything is set up correctly 13 | 14 | ``` 15 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 16 | LIMIT 10; 17 | ``` 18 | 19 | ### VPCFLOW PARTITION TESTS 20 | > **NOTE:** if there are no partition constraints (accountid, region, or eventday), then by default ALL data will be scanned. This could lead to a costly query, so always consider using at least one partition
constraint. 21 | > 22 | > Note that this is the case even if you have other constraints in a query (e.g. sourceipaddress = '192.0.2.1'); only constraints using partition fields (eventday, region, accountid) will limit the amount of data scanned. 23 | 24 | **Query:** Preview first 10 rows with all fields, limited to a single account 25 | 26 | ``` 27 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 28 | WHERE accountid = '111122223333' 29 | LIMIT 10; 30 | ``` 31 | 32 | **Query:** Preview first 10 rows with all fields, limited to multiple accounts 33 | ``` 34 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 35 | WHERE accountid in ('111122223333','444455556666','123456789012') 36 | LIMIT 10; 37 | ``` 38 | 39 | **Query:** Preview first 10 rows with all fields, limited to a single region 40 | ``` 41 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 42 | WHERE region = 'us-east-1' 43 | LIMIT 10; 44 | ``` 45 | 46 | **Query:** Preview first 10 rows with all fields, limited to multiple regions 47 | ``` 48 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 49 | WHERE region in ('us-east-1','us-east-2','us-west-2') 50 | LIMIT 10; 51 | ``` 52 | 53 | **Query:** Preview first 10 rows with all fields, limited to a certain date range 54 | > NOTE: eventday format is 'YYYYMMDD' as a string 55 | ``` 56 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 57 | WHERE eventday >= '20230530' 58 | AND eventday <= '20230630' 59 | LIMIT 10; 60 | ``` 61 | 62 | **Query:** Preview first 10 rows with all fields, limited to the past 30 days (relative) 63 | ``` 64 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 65 | WHERE eventday >=
date_format(date_add('day',-30,current_timestamp), '%Y%m%d') 65 | LIMIT 10; 66 | ``` 67 | 68 | **Query:** Preview first 10 rows with all fields, limited by a combination of partition constraints 69 | 70 | > NOTE: narrowing the scope of the query as much as possible will improve performance and minimize cost 71 | ``` 72 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 73 | WHERE eventday >= '20230530' 74 | AND eventday <= '20230630' 75 | AND accountid = '111122223333' 76 | AND region in ('us-east-1','us-east-2','us-west-2') 77 | LIMIT 10; 78 | ``` 79 | ### VPC FLOW ANALYSIS EXAMPLES 80 | 81 | > NOTE: default partition constraints have been provided for each query, be sure to add the appropriate partition constraints to the WHERE clause as shown in the section above 82 | 83 | > DEFAULT partition constraints: 84 | ``` 85 | WHERE eventday >= '20230530' 86 | AND eventday <= '20230630' 87 | AND accountid = '111122223333' 88 | AND region in ('us-east-1','us-east-2','us-west-2') 89 | ``` 90 | 91 | **Query:** Get a list of source/destination IP pairs ordered by the number of records 92 | ``` 93 | SELECT region, src_endpoint.ip as src_ip, dst_endpoint.ip as dst_ip, count(*) as record_count FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 94 | WHERE eventday >= '20230530' 95 | AND eventday <= '20230630' 96 | AND accountid = '111122223333' 97 | AND region in ('us-east-1','us-east-2','us-west-2') 98 | GROUP BY region, src_endpoint.ip, dst_endpoint.ip 99 | ORDER BY record_count DESC 100 | ``` 101 | 102 | **Query:** Get a summary of records between a given source/destination IP pair, ordered by the total number of bytes 103 | 104 | ``` 105 | SELECT region, src_endpoint.ip as src_ip, dst_endpoint.ip as dst_ip, sum(traffic.bytes) as byte_count FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 106
| WHERE (src_endpoint.ip = '192.0.2.1' OR dst_endpoint.ip = '192.0.2.1') 107 | AND (src_endpoint.ip = '203.0.113.2' OR dst_endpoint.ip = '203.0.113.2') 108 | AND eventday >= '20230530' 109 | AND eventday <= '20230630' 110 | AND accountid = '111122223333' 111 | AND region in ('us-east-1','us-east-2','us-west-2') 112 | GROUP BY region, src_endpoint.ip, dst_endpoint.ip 113 | ORDER BY byte_count DESC 114 | ``` 115 | 116 | **Query:** Get a summary of the number of bytes sent from port 443 limited to a single instance 117 | > NOTE: for remote IPs this represents the amount of data downloaded from port 443 by the instance; for instance IPs this represents the amount of data downloaded by remote hosts from the instance on port 443 118 | 119 | ``` 120 | SELECT region, dst_endpoint.instance_uid as dst_instance_id, src_endpoint.ip as src_ip, src_endpoint.port as src_port, dst_endpoint.ip as dst_ip, sum(traffic.bytes) as byte_count FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 121 | WHERE dst_endpoint.instance_uid = 'i-000000000000000' 122 | AND src_endpoint.port = 443 123 | AND eventday >= '20230530' 124 | AND eventday <= '20230630' 125 | AND accountid = '111122223333' 126 | AND region in ('us-east-1','us-east-2','us-west-2') 127 | GROUP BY region, dst_endpoint.instance_uid, src_endpoint.ip, src_endpoint.port, dst_endpoint.ip 128 | ORDER BY byte_count DESC 129 | ``` 130 | 131 | **Query:** Get a summary with the number of bytes for each src_ip,src_port,dst_ip,dst_port quad across all records to or from a specific IP 132 | 133 | ``` 134 | SELECT src_endpoint.ip as src_ip, dst_endpoint.ip as dst_ip, src_endpoint.port as src_port, dst_endpoint.port as dst_port, sum(traffic.bytes) as byte_count FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 135 | WHERE (src_endpoint.ip = '192.0.2.1' OR dst_endpoint.ip = '192.0.2.1') 136 | AND
eventday >= '20230530' 137 | AND eventday <= '20230630' 138 | AND accountid = '111122223333' 139 | AND region in ('us-east-1','us-east-2','us-west-2') 140 | GROUP BY src_endpoint.ip, dst_endpoint.ip, src_endpoint.port, dst_endpoint.port 141 | ORDER BY byte_count DESC 142 | ``` 143 | 144 | **Query:** Get all flow records between two IPs showing connection_info.direction 145 | ``` 146 | SELECT from_unixtime(start_time/1000) AS start_time, 147 | from_unixtime(end_time/1000) AS end_time, 148 | src_endpoint.ip, 149 | dst_endpoint.ip, 150 | src_endpoint.port, 151 | dst_endpoint.port, 152 | traffic.packets, 153 | traffic.bytes, 154 | connection_info.direction, 155 | activity_name 156 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 157 | WHERE (src_endpoint.ip = '192.0.2.1' 158 | AND dst_endpoint.ip = '192.0.2.254') 159 | OR (src_endpoint.ip = '192.0.2.254' 160 | AND dst_endpoint.ip = '192.0.2.1') 161 | ORDER BY start_time ASC 162 | ``` 163 | 164 | **Query:** List when source IPs were first seen / last seen with a summary of destination IPs/instances/ports 165 | ``` 166 | SELECT src_endpoint.ip, 167 | from_unixtime(min(start_time)/1000) AS first_seen, 168 | from_unixtime(max(end_time)/1000) AS last_seen, 169 | array_agg(DISTINCT(dst_endpoint.ip)), 170 | array_agg(DISTINCT(dst_endpoint.instance_uid)), 171 | array_agg(DISTINCT(dst_endpoint.port)) 172 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 173 | WHERE dst_endpoint.port < 32768 -- skip ephemeral ports, since we're looking for inbound connections to service ports 174 | AND eventday >= '20230530' 175 | AND eventday <= '20230630' 176 | AND accountid = '111122223333' 177 | AND region in ('us-east-1','us-east-2','us-west-2') 178 | GROUP BY src_endpoint.ip 179 | ORDER BY first_seen ASC 180 | ``` 181 | 182 | 183 | **Query:** Transfer Report on Top 10 Internal IPs with large transfers, limited to
source addresses in network 192.0.2.0/24 184 | 185 | ``` 186 | SELECT vpcflow.eventday, vpcflow.src_endpoint.ip as src_endpoint_ip, vpcflow.dst_endpoint.ip as dst_endpoint_ip, sum(vpcflow.traffic.bytes) as byte_count 187 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" as vpcflow 188 | INNER JOIN (SELECT src_endpoint.ip as src_endpoint_ip, sum(traffic.bytes) as byte_count FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 189 | WHERE src_endpoint.ip <> '-' 190 | AND contains('192.0.2.0/24', cast(src_endpoint.ip as IPADDRESS)) 191 | AND eventday >= '20230530' 192 | AND eventday <= '20230630' 193 | AND accountid = '111122223333' 194 | AND region in ('us-east-1','us-east-2','us-west-2') 195 | GROUP BY src_endpoint.ip 196 | ORDER BY byte_count DESC 197 | LIMIT 10 ) as top_n 198 | ON top_n.src_endpoint_ip = vpcflow.src_endpoint.ip 199 | WHERE eventday >= '20230530' 200 | AND eventday <= '20230630' 201 | AND accountid = '111122223333' 202 | AND region in ('us-east-1','us-east-2','us-west-2') 203 | GROUP BY vpcflow.eventday, vpcflow.src_endpoint.ip, vpcflow.dst_endpoint.ip 204 | ORDER BY vpcflow.eventday ASC, vpcflow.src_endpoint.ip ASC, vpcflow.dst_endpoint.ip ASC, byte_count DESC 205 | ``` 206 | 207 | **Query:** Search for traffic between a private (RFC1918) IP address and a public (non-RFC1918) IP address 208 | > NOTE: this is an example of the IPADDRESS data type added in Athena engine v2 and the IP address contains function added in Athena engine v3.
Be sure that you've [enabled Athena engine v3](https://aws.amazon.com/blogs/big-data/upgrade-to-athena-engine-version-3-to-increase-query-performance-and-access-more-analytics-features/) 209 | 210 | ``` 211 | SELECT * 212 | FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 213 | WHERE src_endpoint.ip <> '-' 214 | AND dst_endpoint.ip <> '-' 215 | AND ( 216 | ( 217 | NOT ( 218 | contains( 219 | '10.0.0.0/8', 220 | (CAST(src_endpoint.ip AS IPADDRESS)) 221 | ) 222 | OR contains( 223 | '172.16.0.0/12', 224 | (CAST(src_endpoint.ip AS IPADDRESS)) 225 | ) 226 | OR contains( 227 | '192.168.0.0/16', 228 | (CAST(src_endpoint.ip AS IPADDRESS)) 229 | ) 230 | ) 231 | AND ( 232 | contains( 233 | '10.0.0.0/8', 234 | (CAST(dst_endpoint.ip AS IPADDRESS)) 235 | ) 236 | OR contains( 237 | '172.16.0.0/12', 238 | (CAST(dst_endpoint.ip AS IPADDRESS)) 239 | ) 240 | OR contains( 241 | '192.168.0.0/16', 242 | (CAST(dst_endpoint.ip AS IPADDRESS)) 243 | ) 244 | ) 245 | ) 246 | OR ( 247 | NOT ( 248 | contains( 249 | '10.0.0.0/8', 250 | (CAST(dst_endpoint.ip AS IPADDRESS)) 251 | ) 252 | OR contains( 253 | '172.16.0.0/12', 254 | (CAST(dst_endpoint.ip AS IPADDRESS)) 255 | ) 256 | OR contains( 257 | '192.168.0.0/16', 258 | (CAST(dst_endpoint.ip AS IPADDRESS)) 259 | ) 260 | ) 261 | AND ( 262 | contains( 263 | '10.0.0.0/8', 264 | (CAST(src_endpoint.ip AS IPADDRESS)) 265 | ) 266 | OR contains( 267 | '172.16.0.0/12', 268 | (CAST(src_endpoint.ip AS IPADDRESS)) 269 | ) 270 | OR contains( 271 | '192.168.0.0/16', 272 | (CAST(src_endpoint.ip AS IPADDRESS)) 273 | ) 274 | ) 275 | ) 276 | ) 277 | AND eventday >= '20230530' 278 | AND eventday <= '20230630' 279 | AND accountid = '111122223333' 280 | AND region in ('us-east-1','us-east-2','us-west-2') 281 | ``` 282 | 283 | 284 | **Query:** Search for all internal-to-internal VPC Flow records for the internal VPC Subnets in the private 172.16.0.0/12 address space 285 | > NOTE: this is an example of the
IPADDRESS data type added in Athena engine v2 and the IP address contains() function added in Athena engine v3. Be sure that you've [enabled Athena engine v3](https://aws.amazon.com/blogs/big-data/upgrade-to-athena-engine-version-3-to-increase-query-performance-and-access-more-analytics-features/) 286 | 287 | ``` 288 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 289 | WHERE src_endpoint.ip <> '-' 290 | AND dst_endpoint.ip <> '-' 291 | AND contains('172.16.0.0/12', cast(src_endpoint.ip as IPADDRESS)) 292 | AND contains('172.16.0.0/12', cast(dst_endpoint.ip as IPADDRESS)) 293 | AND eventday >= '20230530' 294 | AND eventday <= '20230630' 295 | AND accountid = '111122223333' 296 | AND region in ('us-east-1','us-east-2','us-west-1','us-west-2') 297 | ``` 298 | 299 | **Query:** Search for all VPC Flow records _except_ the internal-to-internal records for VPC Subnets in the private 172.16.0.0/12 address space. Useful to filter out internal VPC traffic and only show traffic to or from external IP Addresses. 300 | 301 | ``` 302 | SELECT * FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_vpc_flow_1_0" 303 | WHERE src_endpoint.ip <> '-' 304 | AND dst_endpoint.ip <> '-' 305 | AND NOT ( 306 | contains('172.16.0.0/12', cast(src_endpoint.ip as IPADDRESS)) 307 | AND contains('172.16.0.0/12', cast(dst_endpoint.ip as IPADDRESS)) 308 | ) 309 | AND eventday >= '20230530' 310 | AND eventday <= '20230630' 311 | AND accountid = '111122223333' 312 | AND region in ('us-east-1','us-east-2','us-west-1','us-west-2') 313 | ``` 314 | -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/cfn/Athena_IAM_setup.yml: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 | # SPDX-License-Identifier: Apache-2.0 3 | 4 | --- 5 | AWSTemplateFormatVersion: 2010-09-09 6 | Description: >- 7 | Athena IAM roles and policies initial setup 8 | Parameters: 9 | ParamLogSourceLocations: 10 | Description: "Full path(s) for logs Athena will query in the form '<bucket_name>/<prefix>/AWSLogs/' (comma separated, no spaces between values)" 11 | Type: String 12 | Default: 'bucket_name/prefix/AWSLogs/,bucket_name2/prefix2/AWSLogs/' 13 | ParamQueryOutputLocation: 14 | Description: "Full path for Athena output in the form '<bucket_name>/<prefix>/'" 15 | Type: String 16 | Default: 'query_history_bucket/optional_prefix/' 17 | ParamAllBucketNames: 18 | Type: String 19 | Description: 'The name of all buckets, including log buckets and Athena output bucket (comma separated, no spaces between values)' 20 | Default: 'log_bucket_1,log_bucket_2,output_bucket' 21 | Resources: 22 | ResourceAthenaSecurityAnalystRole: 23 | Type: AWS::IAM::Role 24 | Metadata: 25 | Comment: "TODO: Update the Assume Role Policy Document according to your organization's policies (e.g. from org, from account, using SSO or identity federation, etc.)" 26 | DeletionPolicy: Delete 27 | Properties: 28 | RoleName: AthenaSecurityAnalystRole 29 | AssumeRolePolicyDocument: 30 | Version: 2012-10-17 31 | Statement: 32 | # TODO: Replace this with the appropriate least-privilege AssumeRole policy for your organization 33 | # Reference: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_permissions-to-switch.html#roles-usingrole-createpolicy 34 | - Effect: Deny 35 | Principal: 36 | AWS: !Sub 'arn:aws:iam::${AWS::AccountId}:root' 37 | Action: 'sts:AssumeRole' 38 | Path: / 39 | ResourceAthenaSecurityAdminRole: 40 | Type: AWS::IAM::Role 41 | Metadata: 42 | Comment: "TODO: Update the Assume Role Policy Document according to your organization's policies (e.g.
from org, from account, using SSO or identity federation, etc.)" 43 | DeletionPolicy: Delete 44 | Properties: 45 | RoleName: AthenaSecurityAdminRole 46 | AssumeRolePolicyDocument: 47 | Version: 2012-10-17 48 | Statement: 49 | # TODO: Replace this with the appropriate least-privilege AssumeRole policy for your organization 50 | # Reference: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_permissions-to-switch.html#roles-usingrole-createpolicy 51 | - Effect: Deny 52 | Principal: 53 | AWS: !Sub 'arn:aws:iam::${AWS::AccountId}:root' 54 | Action: 'sts:AssumeRole' 55 | Path: / 56 | ResourceAthenaSecurityAnalystPolicy: 57 | Type: AWS::IAM::Policy 58 | Properties: 59 | PolicyName: AthenaSecurityAnalystPolicy 60 | PolicyDocument: 61 | Version: '2012-10-17' 62 | Statement: 63 | - Sid: SecurityNamedQueryFullAccess 64 | Effect: Allow 65 | Action: 66 | - athena:BatchGetNamedQuery 67 | - athena:CreateNamedQuery 68 | - athena:DeleteNamedQuery 69 | - athena:GetNamedQuery 70 | - athena:ListNamedQueries 71 | Resource: 72 | # Note: The first '*' in each resource below is to allow the actions above for Athena Workgroup resources in all regions, if only one region will be used you may specify the specific region or use ${AWS::Region} to specify the current region 73 | # The second '*' in each resource below is to allow the actions above for Athena Workgroup resources which start with 'Security' or 'security_' as a standard naming scheme, which allows flexibility if you want to use multiple workgroups. 74 | # If you only plan to use a single workgroup, you may specify the name here and replace 'Security*' or 'security_' with the explicit name(s) of the workgroups you plan to use.
75 | - !Sub 'arn:aws:athena:*:${AWS::AccountId}:workgroup/Security*' 76 | - !Sub 'arn:aws:athena:*:${AWS::AccountId}:workgroup/security_*' 77 | - Sid: SecurityWorkgroupReadOnly 78 | Effect: Allow 79 | Action: 80 | - athena:GetWorkGroup 81 | - athena:ListWorkGroups 82 | - athena:BatchGetQueryExecution 83 | - athena:GetQueryExecution 84 | - athena:GetQueryResults 85 | - athena:GetQueryResultsStream 86 | - athena:ListQueryExecutions 87 | - athena:ListTagsForResource 88 | - athena:StartQueryExecution 89 | - athena:StopQueryExecution 90 | Resource: 91 | # Note: The first '*' in each resource below is to allow the actions above for Athena Workgroup resources in all regions, if only one region will be used you may specify the specific region or use ${AWS::Region} to specify the current region 92 | # The second '*' in each resource below is to allow the actions above for Athena Workgroup resources which start with 'Security' or 'security_' as a standard naming scheme, which allows flexibility if you want to use multiple workgroups. 93 | # If you only plan to use a single workgroup, you may specify the name here and replace 'Security*' or 'security_' with the explicit name(s) of the workgroups you plan to use. 
94 | - !Sub 'arn:aws:athena:*:${AWS::AccountId}:workgroup/Security*' 95 | - !Sub 'arn:aws:athena:*:${AWS::AccountId}:workgroup/security_*' 96 | - Sid: SecurityAthenaDataCatalogReadOnly 97 | Effect: Allow 98 | Action: 99 | - athena:GetDataCatalog 100 | - athena:ListDataCatalogs 101 | - athena:GetDatabase 102 | - athena:ListDatabases 103 | - athena:GetTableMetadata 104 | - athena:ListTableMetadata 105 | Resource: 106 | # Note: The first '*' in each resource below is to allow the actions above for Athena Data Catalog resources in all regions, if only one region will be used you may specify the specific region or use ${AWS::Region} to specify the current region 107 | # The second '*' in each resource below is to allow the actions above for all Athena Data Catalog resources which start with 'Security' or 'security_' as a standard naming scheme, which allows flexibility if you want to use multiple data catalogs. 108 | # If you only plan to use a single data catalog, you may specify the name here and replace 'Security*' or 'security_' with the explicit name(s) of the data catalog you plan to use. 109 | - !Sub 'arn:aws:athena:*:${AWS::AccountId}:datacatalog/Security*' 110 | - !Sub 'arn:aws:athena:*:${AWS::AccountId}:datacatalog/security_*' 111 | - Sid: SecurityGlueDatabaseReadOnly 112 | Effect: Allow 113 | Action: 114 | - glue:GetDatabase 115 | - glue:GetDatabases 116 | Resource: 117 | # Note: The first '*' in each resource below is to allow the actions above for Glue Database resources in all regions, if only one region will be used you may specify the specific region or use ${AWS::Region} to specify the current region 118 | # The second '*' in each resource below is to allow the actions above for all Glue Database resources which start with 'Security' or 'security_' as a standard naming scheme, which allows flexibility if you want to use multiple databases. 
119 | # If you only plan to use a single database, you may specify the name here and replace 'Security*' or 'security_' with the explicit name(s) of the database you plan to use. 120 | - !Sub 'arn:aws:glue:*:${AWS::AccountId}:database/Security*' 121 | - !Sub 'arn:aws:glue:*:${AWS::AccountId}:database/security_*' 122 | - Sid: SecurityGlueTableReadOnly 123 | Effect: Allow 124 | Action: 125 | - glue:GetTable 126 | - glue:GetTables 127 | Resource: 128 | # Note: The first '*' in each resource below is to allow the actions above for Glue Table resources in all regions, if only one region will be used you may specify the specific region or use ${AWS::Region} to specify the current region 129 | # The second '*' in each resource below is to allow the actions above for all Glue Table resources that are associated with a Glue Database that starts with 'Security' or 'security_' as a standard naming scheme, which allows flexibility if you want to use multiple databases. 130 | # Note that Glue Table ARNs are in the format arn:aws:glue:region:account-id:table/database name/table name, so this will allow any table naming scheme as long as it is associated with a Glue Database starting with 'Security' or 'security_'. 131 | # If you only plan to use a single database, you may specify the name here and replace 'Security*' or 'security_' with the explicit name(s) of the database you plan to use. 132 | # You may also specify the full Database/Table name explicitly in the ARN if you would like to limit the actions to only those tables explicitly.
133 | - !Sub 'arn:aws:glue:*:${AWS::AccountId}:table/Security*' 134 | - !Sub 'arn:aws:glue:*:${AWS::AccountId}:table/security_*' 135 | - Sid: SecurityGluePartitionReadOnly 136 | Effect: Allow 137 | Action: 138 | - glue:BatchGetPartition 139 | - glue:GetPartition 140 | - glue:GetPartitions 141 | Resource: 142 | # Note: The first '*' in each resource below is to allow the actions above for Glue Database resources in all regions, if only one region will be used you may specify the specific region or use ${AWS::Region} to specify the current region 143 | # The second '*' in each resource below is to allow the actions above for all Glue Database resources which start with 'Security' or 'security_' as a standard naming scheme, which allows flexibility if you want to use multiple databases. 144 | # If you only plan to use a single database, you may specify the name here and replace 'Security*' or 'security_' with the explicit name(s) of the database you plan to use. 145 | - !Sub 'arn:aws:glue:*:${AWS::AccountId}:database/Security*' 146 | - !Sub 'arn:aws:glue:*:${AWS::AccountId}:database/security_*' 147 | - Sid: AthenaOutputBucketReadWrite 148 | Effect: Allow 149 | Action: 150 | - s3:AbortMultipartUpload 151 | - s3:GetBucketLocation 152 | - s3:GetObject 153 | - s3:ListBucket 154 | - s3:ListBucketMultipartUploads 155 | - s3:ListMultipartUploadParts 156 | - s3:PutObject 157 | Resource: 158 | # Note: The '*' in each resource below is to allow the actions above on all S3 Objects in the specified S3 Bucket. If you'd like to limit it to a specific path you could for example specify '/athena/*' 159 | - !Join [ '', ['arn:aws:s3:::', !Ref ParamQueryOutputLocation, '*']] 160 | - Sid: LogSourceBucketReadOnly 161 | Effect: Allow 162 | Action: 163 | - s3:GetObject 164 | - s3:ListBucket 165 | # Note: The '*' in each resources below is to allow the actions above on all S3 Objects in the specified S3 Buckets. 
If you'd like to limit it to a specific path you could for example specify '/AWSLogs/*' 166 | Resource: !Split ["," , !Join [ "*,arn:aws:s3:::" , !Split [",", !Join [ '', ['arn:aws:s3:::', !Ref ParamLogSourceLocations, '*']]]]] 167 | - Sid: ListLogAndOutputBuckets 168 | Effect: Allow 169 | Action: 170 | - s3:GetBucketLocation 171 | - s3:ListBucket 172 | Resource: !Split ["," , !Join [ ",arn:aws:s3:::" , !Split [",", !Join [ '', ['arn:aws:s3:::', !Ref ParamAllBucketNames]]]]] 173 | Roles: 174 | - !Ref ResourceAthenaSecurityAnalystRole 175 | ResourceAthenaSecurityAdminPolicy: 176 | Type: AWS::IAM::Policy 177 | Properties: 178 | PolicyName: AthenaSecurityAdminPolicy 179 | PolicyDocument: 180 | Version: '2012-10-17' 181 | Statement: 182 | - Sid: SecurityNamedQueryFullAccess 183 | Effect: Allow 184 | Action: 185 | - athena:BatchGetNamedQuery 186 | - athena:CreateNamedQuery 187 | - athena:DeleteNamedQuery 188 | - athena:GetNamedQuery 189 | - athena:ListNamedQueries 190 | Resource: 191 | # Note: The first '*' in each resource below is to allow the actions above for Athena Workgroup resources in all regions, if only one region will be used you may specify the specific region or use ${AWS::Region} to specify the current region 192 | # The second '*' in each resource below is to allow the actions above for Athena Workgroup resources which start with 'Security' or 'security_' as a standard naming scheme, which allows flexibility if you want to use multiple workgroups. 193 | # If you only plan to use a single workgroup, you may specify the name here and replace 'Security*' or 'security_' with the explicit name(s) of the workgroups you plan to use. 
194 | - !Sub 'arn:aws:athena:*:${AWS::AccountId}:workgroup/Security*' 195 | - !Sub 'arn:aws:athena:*:${AWS::AccountId}:workgroup/security_*' 196 | - Sid: SecurityWorkgroupFullAccess 197 | Effect: Allow 198 | Action: 199 | - athena:CreateWorkGroup 200 | - athena:DeleteWorkGroup 201 | - athena:GetWorkGroup 202 | - athena:ListWorkGroups 203 | - athena:UpdateWorkGroup 204 | - athena:BatchGetQueryExecution 205 | - athena:GetQueryExecution 206 | - athena:GetQueryResults 207 | - athena:GetQueryResultsStream 208 | - athena:ListQueryExecutions 209 | - athena:ListTagsForResource 210 | - athena:StartQueryExecution 211 | - athena:StopQueryExecution 212 | Resource: 213 | # Note: The first '*' in each resource below is to allow the actions above for Athena Workgroup resources in all regions, if only one region will be used you may specify the specific region or use ${AWS::Region} to specify the current region 214 | # The second '*' in each resource below is to allow the actions above for Athena Workgroup resources which start with 'Security' or 'security_' as a standard naming scheme, which allows flexibility if you want to use multiple workgroups. 215 | # If you only plan to use a single workgroup, you may specify the name here and replace 'Security*' or 'security_' with the explicit name(s) of the workgroups you plan to use. 
216 | - !Sub 'arn:aws:athena:*:${AWS::AccountId}:workgroup/Security*' 217 | - !Sub 'arn:aws:athena:*:${AWS::AccountId}:workgroup/security_*' 218 | - Sid: SecurityAthenaDataCatalogFullAccess 219 | Effect: Allow 220 | Action: 221 | - athena:CreateDataCatalog 222 | - athena:DeleteDataCatalog 223 | - athena:GetDataCatalog 224 | - athena:ListDataCatalogs 225 | - athena:UpdateDataCatalog 226 | - athena:GetDatabase 227 | - athena:ListDatabases 228 | - athena:GetTableMetadata 229 | - athena:ListTableMetadata 230 | Resource: 231 | # Note: The first '*' in each resource below is to allow the actions above for Athena Data Catalog resources in all regions, if only one region will be used you may specify the specific region or use ${AWS::Region} to specify the current region 232 | # The second '*' in each resource below is to allow the actions above for all Athena Data Catalog resources which start with 'Security' or 'security_' as a standard naming scheme, which allows flexibility if you want to use multiple data catalogs. 233 | # If you only plan to use a single data catalog, you may specify the name here and replace 'Security*' or 'security_' with the explicit name(s) of the data catalog you plan to use. 
234 | - !Sub 'arn:aws:athena:*:${AWS::AccountId}:datacatalog/Security*' 235 | - !Sub 'arn:aws:athena:*:${AWS::AccountId}:datacatalog/security_*' 236 | - Sid: SecurityGlueDatabaseFullAccess 237 | Effect: Allow 238 | Action: 239 | - glue:CreateDatabase 240 | - glue:DeleteDatabase 241 | - glue:GetDatabase 242 | - glue:GetDatabases 243 | - glue:UpdateDatabase 244 | Resource: 245 | # Note: The first '*' in each resource below is to allow the actions above for Glue Database resources in all regions, if only one region will be used you may specify the specific region or use ${AWS::Region} to specify the current region 246 | # The second '*' in each resource below is to allow the actions above for all Glue Database resources which start with 'Security' or 'security_' as a standard naming scheme, which allows flexibility if you want to use multiple databases. 247 | # If you only plan to use a single database, you may specify the name here and replace 'Security*' or 'security_' with the explicit name(s) of the database you plan to use. 248 | - !Sub 'arn:aws:glue:*:${AWS::AccountId}:database/Security*' 249 | - !Sub 'arn:aws:glue:*:${AWS::AccountId}:database/security_*' 250 | - Sid: SecurityGlueTableFullAccess 251 | Effect: Allow 252 | Action: 253 | - glue:BatchDeleteTable 254 | - glue:CreateTable 255 | - glue:DeleteTable 256 | - glue:GetTable 257 | - glue:GetTables 258 | - glue:UpdateTable 259 | Resource: 260 | # Note: The first '*' in each resource below is to allow the actions above for Glue Table resources in all regions, if only one region will be used you may specify the specific region or use ${AWS::Region} to specify the current region 261 | # The second '*' in each resource below is to allow the actions above for all Glue Table resources are associated with a Glue Database that starts with 'Security' or 'security_' as a standard naming scheme, which allows flexibility if you want to use multiple databases. 
262 | # Note that Glue Table ARNs are in the format arn:aws:glue:region:account-id:table/database name/table name, so this will allow any table naming scheme as long as it is associated with a Glue Database starting with 'Security' or 'security_'. 263 | # If you only plan to use a single database, you may specify the name here and replace 'Security*' or 'security_' with the explicit name(s) of the database you plan to use. 264 | # You may also specify the full Database/Table name explicitly in the ARN if you would like to limit the actions to only those tables explicitly. 265 | - !Sub 'arn:aws:glue:*:${AWS::AccountId}:table/Security*' 266 | - !Sub 'arn:aws:glue:*:${AWS::AccountId}:table/security_*' 267 | - Sid: SecurityGluePartitionReadWrite 268 | Effect: Allow 269 | Action: 270 | - glue:BatchCreatePartition 271 | - glue:BatchDeletePartition 272 | - glue:BatchGetPartition 273 | - glue:CreatePartition 274 | - glue:DeletePartition 275 | - glue:GetPartition 276 | - glue:GetPartitions 277 | - glue:UpdatePartition 278 | Resource: 279 | # Note: The first '*' in each resource below is to allow the actions above for Glue Database resources in all regions, if only one region will be used you may specify the specific region or use ${AWS::Region} to specify the current region 280 | # The second '*' in each resource below is to allow the actions above for all Glue Database resources which start with 'Security' or 'security_' as a standard naming scheme, which allows flexibility if you want to use multiple databases. 281 | # If you only plan to use a single database, you may specify the name here and replace 'Security*' or 'security_' with the explicit name(s) of the database you plan to use.
282 | - !Sub 'arn:aws:glue:*:${AWS::AccountId}:database/Security*' 283 | - !Sub 'arn:aws:glue:*:${AWS::AccountId}:database/security_*' 284 | - Sid: SecurityAthenaTagResources 285 | Effect: Allow 286 | Action: 287 | - athena:TagResource 288 | - athena:UntagResource 289 | Resource: 290 | # Note: The first '*' in each resource below is to allow the actions above for Athena Workgroup resources in all regions, if only one region will be used you may specify the specific region or use ${AWS::Region} to specify the current region 291 | # The second '*' in each resource below is to allow the actions above for Athena Workgroup resources which start with 'Security' or 'security_' as a standard naming scheme, which allows flexibility if you want to use multiple workgroups. 292 | # If you only plan to use a single workgroup, you may specify the name here and replace 'Security*' or 'security_' with the explicit name(s) of the workgroups you plan to use. 293 | - !Sub 'arn:aws:athena:*:${AWS::AccountId}:workgroup/Security*' 294 | - !Sub 'arn:aws:athena:*:${AWS::AccountId}:workgroup/security_*' 295 | # Note: The first '*' in each resource below is to allow the actions above for Athena Data Catalog resources in all regions, if only one region will be used you may specify the specific region or use ${AWS::Region} to specify the current region 296 | # The second '*' in each resource below is to allow the actions above for all Athena Data Catalog resources which start with 'Security' or 'security_' as a standard naming scheme, which allows flexibility if you want to use multiple data catalogs. 297 | # If you only plan to use a single data catalog, you may specify the name here and replace 'Security*' or 'security_' with the explicit name(s) of the data catalog you plan to use. 
298 | - !Sub 'arn:aws:athena:*:${AWS::AccountId}:datacatalog/Security*' 299 | - !Sub 'arn:aws:athena:*:${AWS::AccountId}:datacatalog/security_*' 300 | - Sid: AthenaOutputBucketReadWrite 301 | Effect: Allow 302 | Action: 303 | - s3:AbortMultipartUpload 304 | - s3:GetBucketLocation 305 | - s3:GetObject 306 | - s3:ListBucket 307 | - s3:ListBucketMultipartUploads 308 | - s3:ListMultipartUploadParts 309 | - s3:PutObject 310 | Resource: 311 | # Note: The '*' in each resource below is to allow the actions above on all S3 Objects in the specified S3 Bucket. If you'd like to limit it to a specific path you could for example specify '/athena/*' 312 | - !Join [ '', ['arn:aws:s3:::', !Ref ParamQueryOutputLocation, '*']] 313 | - Sid: LogSourceBucketReadOnly 314 | Effect: Allow 315 | Action: 316 | - s3:GetObject 317 | - s3:ListBucket 318 | # Note: The '*' in each resources below is to allow the actions above on all S3 Objects in the specified S3 Buckets. If you'd like to limit it to a specific path you could for example specify '/AWSLogs/*' 319 | Resource: !Split ["," , !Join [ "*,arn:aws:s3:::" , !Split [",", !Join [ '', ['arn:aws:s3:::', !Ref ParamLogSourceLocations, '*']]]]] 320 | - Sid: ListAllBuckets 321 | Effect: Allow 322 | Action: 323 | - s3:GetBucketLocation 324 | - s3:ListBucket 325 | - s3:ListAllMyBuckets 326 | Resource: 327 | # Note: The '*' in each resource below is to allow the actions above on all S3 Buckets in your account. 328 | # If you know all of the buckets you plan to use with Athena you may specify them explicitly here instead (e.g. 
arn:aws:s3:::BucketName) 329 | - 'arn:aws:s3:::*' 330 | Roles: 331 | - !Ref ResourceAthenaSecurityAdminRole 332 | Outputs: 333 | OutputAthenaSecurityAnalystRoleArn: 334 | Value: !GetAtt ResourceAthenaSecurityAnalystRole.Arn 335 | Description: Athena Analyst Role ARN 336 | OutputAthenaSecurityAdminRoleArn: 337 | Value: !GetAtt ResourceAthenaSecurityAdminRole.Arn 338 | Description: Athena Admin Role ARN 339 | 340 | -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/cfn/VPC_enable_flowlogs.yml: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: Apache-2.0 3 | 4 | --- 5 | AWSTemplateFormatVersion: '2010-09-09' 6 | Description: 'Publish flow logs to S3 with a custom format containing all available fields' 7 | Metadata: 8 | 'AWS::CloudFormation::Interface': 9 | ParameterGroups: 10 | - Label: 11 | default: 'Flow Logs Parameters' 12 | Parameters: 13 | - ParamResourceType 14 | - ParamTargetResourceId 15 | - ParamLogDestinationS3ARN 16 | - ParamTrafficType 17 | - ParamFlowlogsNameTag 18 | Parameters: 19 | ParamResourceType: 20 | Description: 'The type of resource for which to create the flow log. For example, if you specified a VPC ID for the ResourceId property, specify VPC for this property.' 
21 | Type: String 22 | Default: VPC 23 | AllowedValues: 24 | - VPC 25 | - Subnet 26 | - NetworkInterface 27 | ParamTargetResourceId: 28 | Description: 'The ID of the VPC, subnet, or network interface for which you want to create a flow log' 29 | Type: String 30 | Default: vpc-xxxxxx 31 | ParamLogDestinationS3ARN: 32 | Description: 'ARN of the S3 Bucket and optional path where logs will be delivered' 33 | Type: String 34 | Default: 'arn:aws:s3:::bucketname/optional_prefix/' 35 | ParamTrafficType: 36 | Description: 'The type of traffic to log' 37 | Type: String 38 | Default: ALL 39 | AllowedValues: 40 | - ACCEPT 41 | - REJECT 42 | - ALL 43 | ParamFlowlogsNameTag: 44 | Description: 'The value of the "Name" tag attached to the flow logs' 45 | Type: String 46 | Default: flowlogs_allTraffic_allFields_v5_s3 47 | Resources: 48 | ResourceFlowLog: 49 | Type: AWS::EC2::FlowLog 50 | Properties: 51 | LogDestinationType: s3 52 | LogDestination: !Ref ParamLogDestinationS3ARN 53 | ResourceId: !Ref ParamTargetResourceId 54 | ResourceType: !Ref ParamResourceType 55 | TrafficType: !Ref ParamTrafficType 56 | LogFormat: '${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status} ${vpc-id} ${subnet-id} ${instance-id} ${tcp-flags} ${type} ${pkt-srcaddr} ${pkt-dstaddr} ${region} ${az-id} ${sublocation-type} ${sublocation-id} ${pkt-src-aws-service} ${pkt-dst-aws-service} ${flow-direction} ${traffic-path}' 57 | MaxAggregationInterval: 60 58 | Tags: 59 | - Key: Name 60 | Value: !Ref ParamFlowlogsNameTag 61 | Outputs: 62 | OutputFlowLogId: 63 | Value: !Ref ResourceFlowLog 64 | Description: ID of the created flow log 65 | -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/docs/aws_security_analytics_bootstrap_deployment_guide.md: -------------------------------------------------------------------------------- 1 | # AWS Security Analytics 
Bootstrap Deployment Guide 2 | 3 | ## 1 Prerequisites 4 | ### 1.1 Verify AWS Service Logs are Enabled and Correctly Configured 5 | The first step is to verify that the expected logs have been enabled and are currently being archived directly to an Amazon S3 bucket. AWS Security Analytics Bootstrap currently requires AWS service logs to be sent directly to S3, unmodified. For additional assistance enabling AWS service logs, consider using [Assisted Log Enabler for AWS](https://github.com/awslabs/assisted-log-enabler-for-aws). Although AWS service log configuration and enablement are out of scope for this project, the following are a few suggestions: 6 | - Centralize all of your AWS service logs into a single S3 bucket or an S3 bucket per log type 7 | - Enable [AWS CloudTrail](https://docs.aws.amazon.com/cloudtrail/index.html) for all accounts 8 | - Enable [Amazon Virtual Private Cloud (VPC) Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html) and [Amazon Route 53 DNS resolver query logs](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resolver-query-logs.html) for all VPCs 9 | - Enable Amazon S3 data events in CloudTrail to monitor S3 object-level events (If a high volume of S3 data events is expected, data events can be enabled in a separate trail so they can be searched separately) 10 | - Enable VPC Flow Logs with a custom field configuration including [all available fields through v5](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html#flow-logs-fields).
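The flow log suggestion above can also be satisfied without CloudFormation; the following AWS CLI sketch mirrors the repository's `VPC_enable_flowlogs.yml` template (the VPC ID and S3 destination ARN are placeholders you'd replace with your own values):

```shell
# Enable VPC Flow Logs delivered to S3 with all fields through v5
# and the minimum (60 second) aggregation interval.
aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-1234567890abcdef0 \
  --traffic-type ALL \
  --log-destination-type s3 \
  --log-destination 'arn:aws:s3:::DOC-EXAMPLE-BUCKET/optional_prefix/' \
  --max-aggregation-interval 60 \
  --log-format '${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status} ${vpc-id} ${subnet-id} ${instance-id} ${tcp-flags} ${type} ${pkt-srcaddr} ${pkt-dstaddr} ${region} ${az-id} ${sublocation-type} ${sublocation-id} ${pkt-src-aws-service} ${pkt-dst-aws-service} ${flow-direction} ${traffic-path}'
```

Note the single quotes around `--log-format`: the `${...}` placeholders are flow log field names, not shell variables, so they must not be expanded by the shell.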
11 | 12 | 13 | **WARNING:** AWS Security Analytics Bootstrap expects VPC Flow Log fields to be in exactly this order (specifically for custom configurations): 14 | 15 | ```${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status} ${vpc-id} ${subnet-id} ${instance-id} ${tcp-flags} ${type} ${pkt-srcaddr} ${pkt-dstaddr} ${region} ${az-id} ${sublocation-type} ${sublocation-id} ${pkt-src-aws-service} ${pkt-dst-aws-service} ${flow-direction} ${traffic-path}``` 16 | 17 | AWS Security Analytics Bootstrap will still work for existing VPC Flow Log configurations which only include the default (v2) field configuration: 18 | 19 | ```${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status}``` 20 | 21 | as long as the fields are in the same order, the additional fields at the end will simply appear empty. If existing VPC Flow Logs that include fields in a different order need to be supported, then the VPC Flow Logs table fields would need to be updated to match that order in the [AWS Security Analytics Bootstrap Infrastructure CloudFormation Template](../cfn/Athena_infra_setup.yml) or the [VPC Flow Log Create Table Statement](../sql/ddl/create_tables/create_vpcflowlog_table.sql). 22 | 23 | ### 1.2 Determine Whether To Deploy In The Same Account Or Cross-account 24 | The next step is to determine which account AWS Security Analytics Bootstrap will be deployed in. If there's only a single AWS account then the choice is simple; however, in a multi-account environment (e.g. [AWS Control Tower](https://aws.amazon.com/controltower)) AWS Security Analytics Bootstrap can be deployed in a different account than the logs, such as the Audit or Security account.
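The field-order requirement above can be sanity-checked offline before creating tables. A minimal sketch (the sample `header` line is illustrative; in practice you'd take the first line of a delivered flow log file after decompressing it):

```shell
# Expected VPC Flow Log field order: v2 defaults first, v3-v5 extensions after.
expected='version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status vpc-id subnet-id instance-id tcp-flags type pkt-srcaddr pkt-dstaddr region az-id sublocation-type sublocation-id pkt-src-aws-service pkt-dst-aws-service flow-direction traffic-path'

# Header row from a delivered flow log file; a default v2 configuration is a
# prefix of the expected order, so it passes (trailing fields just read empty).
header='version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status'

case "$expected" in
  "$header"*) echo "field order OK" ;;
  *)          echo "field order MISMATCH - reorder the table DDL to match" ;;
esac
```

This prints `field order OK` for the default v2 header; any reordered or unknown field makes the prefix match fail, signaling that the create-table statement needs the same custom order.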
If AWS Security Analytics Bootstrap will be deployed in the same account as your logs then you can skip the rest of this section; otherwise, if AWS Security Analytics Bootstrap will be deployed in a different account than your logs are stored in, please complete the following steps: 25 | - Ensure that each of the S3 buckets containing the AWS service logs allows the Athena role(s)/user(s) to read the S3 bucket and its log objects from the account AWS Security Analytics Bootstrap will be deployed in. The Athena documentation includes some [example cross-account configurations](https://docs.aws.amazon.com/athena/latest/ug/cross-account-permissions.html) and the example below shows the minimum permissions currently required: 26 | ``` 27 | "Action": [ 28 | "s3:GetBucketLocation", 29 | "s3:GetObject", 30 | "s3:ListBucket" 31 | ], 32 | "Effect": "Allow", 33 | "Resource": [ 34 | "arn:aws:s3:::BUCKET_NAME", 35 | "arn:aws:s3:::BUCKET_NAME/*" 36 | ], 37 | "Principal": { 38 | "AWS": [ 39 | "arn:aws:iam::111122223333:role/ROLENAME" 40 | ] 41 | } 42 | ``` 43 | 44 | ### 1.3 Ensure Athena Role(s)/User(s) Can Access Encrypted S3 Buckets 45 | If the source *S3 buckets* that AWS service logs are stored in are encrypted using KMS, follow the [examples provided in the cross-account configuration documentation](https://docs.aws.amazon.com/athena/latest/ug/cross-account-permissions.html) to ensure that the Athena role(s)/user(s) have the appropriate KMS permissions to decrypt objects from the encrypted S3 buckets. AWS IAM principal(s) that will be used to submit Athena queries will need to have permissions for `kms:Decrypt` and `kms:DescribeKey` in their IAM policy, and the associated KMS key policy will need to grant them the same access. 46 | 47 | ### 1.4 Ensure Athena Role(s)/User(s) Can Decrypt Encrypted AWS Service Logs 48 | Some AWS service logs can be configured to be encrypted using KMS at an object level in addition to S3 bucket encryption (e.g.
[AWS CloudTrail](https://docs.aws.amazon.com/kms/latest/developerguide/logging-using-cloudtrail.html)). AWS IAM principal(s) that will be used to submit Athena queries will need to have permissions for `kms:Decrypt` and `kms:DescribeKey` in their IAM policy, and the associated KMS key policy will need to grant them the same access. 49 | 50 | ### 1.5 Ensure That The Service Log Objects Are Owned By The S3 Bucket Owner 51 | For cross-account deployments, the AWS service log objects stored in S3 must belong to the bucket owner (i.e. the same account as the bucket is in). By default, even with the `bucket-owner-full-control` ACL applied, objects written by another AWS principal (e.g. the logging service) will still be owned by that principal. While the bucket owner will have full control, they will not be able to delegate access to another account (such as cross-account to an Athena user) unless they have ownership of the object. More about object ownership can be found in this [Amazon S3 documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/about-object-ownership.html). To configure the bucket owner as the owner of all objects going forward, enable `bucket owner preferred` as described in this [S3 documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/about-object-ownership.html#enable-object-ownership). To update the ownership of objects already stored in the S3 bucket, the `bucket owner preferred` setting should be enabled and then each object will need to be overwritten (e.g. recursively copy objects in the bucket to the same path in the same bucket, similar to what's described in this [knowledge center article](https://aws.amazon.com/premiumsupport/knowledge-center/s3-object-change-anonymous-ownership/)). Full instructions to update existing object ownership are out of scope for this project; please contact AWS Customer Support if you need assistance.
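The recursive self-copy described above can be scripted rather than run object by object in the console. Below is a minimal sketch, assuming boto3 (or any client exposing the same `list_objects_v2`/`copy_object` interface, which is why the client is passed in) and that `bucket owner preferred` has already been enabled on the bucket; the bucket and prefix values are placeholders:

```python
def reset_object_ownership(s3_client, bucket, prefix=""):
    """Copy every object under prefix onto itself so the bucket owner takes
    ownership (requires 'bucket owner preferred' to be enabled first).
    Returns the number of objects rewritten."""
    copied = 0
    token = None
    while True:
        kwargs = {"Bucket": bucket, "Prefix": prefix}
        if token:
            kwargs["ContinuationToken"] = token
        page = s3_client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            # S3 rejects a self-copy unless something changes, so force
            # MetadataDirective=REPLACE; the ACL keeps the owner in full control.
            s3_client.copy_object(
                Bucket=bucket,
                Key=obj["Key"],
                CopySource={"Bucket": bucket, "Key": obj["Key"]},
                MetadataDirective="REPLACE",
                ACL="bucket-owner-full-control",
            )
            copied += 1
        if not page.get("IsTruncated"):
            return copied
        token = page.get("NextContinuationToken")
```

With boto3 this would be invoked as `reset_object_ownership(boto3.client("s3"), "my_central_log_bucket", "AWSLogs/")`; test it against a small prefix first, since every matching object is rewritten.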
52 | 53 | ### 1.6 Determine Whether To Create A New S3 Bucket For The Athena Query Output Or Use An Existing S3 Bucket 54 | Amazon Athena automatically stores [query results and metadata information for each query that runs](https://docs.aws.amazon.com/athena/latest/ug/querying.html) in a query result location that you can specify in Amazon S3. An existing S3 bucket can be used or a new S3 bucket can be created to store the resulting Athena output files. If an existing S3 bucket is used to store the Athena output files, it's recommended to create a unique prefix (e.g. /athena ) to keep the resulting output better organized and to enable specific access policies if desired. It is also recommended that the output S3 bucket is in the same region as AWS Security Analytics Bootstrap will be deployed, if possible, to provide the best output performance. 55 | 56 | ### 1.7 Document All Required Parameters 57 | In preparation to deploy AWS Security Analytics Bootstrap, document the following values which will be required as parameters to deploy the CloudFormation stack(s): 58 | 59 | **AWS Security Analytics Bootstrap IAM CloudFormation Template (optional)** 60 | 61 | Parameter | Description | Example 62 | |-----|-----|-----| 63 | AWS Source Log Location(s) | Full path(s) for logs Athena will query in the form '<bucket_name>/<optional_prefix>/AWSLogs/' (comma separated, no spaces between values) | `bucket_name/prefix/AWSLogs/,bucket_name2/prefix2/AWSLogs/` | 64 | Athena Output Location | Full path for Athena output in the form '<bucket_name>/<optional_prefix>/' | `query_history_bucket/optional_prefix/` 65 | All S3 Bucket Names | The names of all buckets, including log buckets and the Athena output bucket (comma separated, no spaces between values) | `log_bucket_1,log_bucket_2,output_bucket` 66 | 67 | **AWS Security Analytics Bootstrap CloudFormation Infrastructure Template** 68 | 69 | Parameter | Description | Example 70 | |-----|-----|-----| 71 | Athena Workgroup Name | Name of the initial Athena Security Analysis workgroup to create | `SecurityAnalysis` 72 | Athena Workgroup Description | Description of the initial Athena Security Analysis workgroup | `Security Analysis Athena Workgroup` 73 | Athena Output Location | S3 path to write all query results, store 45-day query history, and created tables via queries | `s3://query_history_bucket/optional_prefix/` 74 | Glue Database Name | Name of the Glue database to create, which will contain all security analysis tables created by this template (**cannot contain hyphen**) |`security_analysis` 75 | Enable CloudTrail Glue Table | Do you want to create and enable a table for CloudTrail? | `Yes` 76 | CloudTrail Glue Table Name | Name of the CloudTrail Glue table to create | `cloudtrail` 77 | CloudTrail Source Location | S3 base path of CloudTrail logs to be included in the CloudTrail table (must end with /AWSLogs/ or /AWSLogs/<organization_id>/ if you're using an organization trail) | `s3://<bucket_name>/<optional_prefix>/AWSLogs/` 78 | CloudTrail Projection Event Start Date | Start date for CloudTrail logs (replace <YYYY>/<MM>/<DD> with the first date of your logs, example: 2020/11/30) | `<YYYY>/<MM>/<DD>` 79 | CloudTrail Account List | Account(s) to include in the CloudTrail log table in a comma separated list with NO SPACES (example: "0123456789,0123456788,0123456777"); note that all accounts must be logging to the same source, with contents in the specified CloudTrail Source Location under /AWSLogs/{account_number}/CloudTrail | `0123456789,0123456788,0123456777` 80 | CloudTrail Region List | Regions to include in the CloudTrail log table in a comma separated list with NO SPACES; Include all regions for full coverage even if there are no logs currently in that region | `us-east-2,us-east-1,us-west-1,us-west-2,af-south-1,ap-east-1,ap-south-1,ap-northeast-3,ap-northeast-2,ap-southeast-1,ap-southeast-2,ap-northeast-1,ca-central-1,cn-north-1,cn-northwest-1,eu-central-1,eu-west-1,eu-west-2,eu-south-1,eu-west-3,eu-north-1,me-south-1,sa-east-1` 81 | Enable VPC Flow Log Glue Table | Do you want to create and enable a table for VPC Flow Logs? | `Yes` 82 | VPC Flow Log Glue Table Name | Name of the VPC Flow Log Glue table to create | `vpcflow` 83 | VPC Flow Log Source Location | S3 base path of VPC Flow Logs to be included in the VPC flow table (must end with /AWSLogs/) | `s3://<bucket_name>/<optional_prefix>/AWSLogs/` 84 | VPC Flow Log Projection Event Start Date | Start date for VPC Flow Logs (replace <YYYY>/<MM>/<DD> with the first date of your logs, example: 2020/11/30) | `<YYYY>/<MM>/<DD>` 85 | VPC Flow Log Account List | Account(s) to include in the VPC Flow Log table in a comma separated list with NO SPACES (example: "0123456789,0123456788,0123456777"); note that all accounts must be logging to the same source, with contents in {ParamVPCFlowSource}/AWSLogs/{account_number}/vpcflowlogs | `0123456789,0123456788,0123456777` 86 | VPC Flow Log Region List | Regions to include in the VPC Flow Log table in a comma separated list with NO SPACES; Include all regions for full coverage even if there are no logs currently in that region | `us-east-2,us-east-1,us-west-1,us-west-2,af-south-1,ap-east-1,ap-south-1,ap-northeast-3,ap-northeast-2,ap-southeast-1,ap-southeast-2,ap-northeast-1,ca-central-1,cn-north-1,cn-northwest-1,eu-central-1,eu-west-1,eu-west-2,eu-south-1,eu-west-3,eu-north-1,me-south-1,sa-east-1` 87 | Enable Route53 DNS Resolver Logs Glue Table | Do you want to create and enable a table for Route53 DNS Resolver Logs? | `Yes` 88 | Route53 DNS Resolver Logs Glue Table Name | Name of the Route53 DNS Resolver Logs Glue table to create | `r53dns` 89 | Route53 DNS Resolver Logs Source Location | S3 base path of Route53 DNS Resolver logs to be included in the Route53 DNS Resolver table (must end with /AWSLogs/) | `s3://<bucket_name>/<optional_prefix>/AWSLogs/` 90 | Route53 DNS Resolver Logs Projection Event Start Date | Start date for Route53 DNS Resolver logs (replace <YYYY>/<MM>/<DD> with the first date of your logs, example: 2020/11/30) | `<YYYY>/<MM>/<DD>` 91 | Route53 DNS Resolver Logs Account List | Account(s) to include in the Route53 DNS Resolver Logs table in a comma separated list with NO SPACES (example: "0123456789,0123456788,0123456777"); note that all accounts must be logging to the same source, with contents in the specified Route53 DNS Resolver Logs Source Location under /AWSLogs/{account_number}/vpcdnsquerylogs | `0123456789,0123456788,0123456777` 92 | Route53 DNS Resolver Logs VPC List | VPC IDs to include in the Route53 DNS Resolver log table in a comma separated list with NO SPACES; Include all VPC IDs for full coverage even if there are no logs currently in that VPC | `<vpc_id_1>,<vpc_id_2>,...` 93 | 94 | ## 2 Deploy AWS Security Analytics Bootstrap IAM CloudFormation Template (optional) 95 | 96 | To deploy Amazon Athena admin and user roles following the principle of least privilege, the [AWS Security Analytics Bootstrap IAM CloudFormation Template](../../AthenaBootstrap/cfn/Athena_IAM_setup.yml) has been provided. This step is optional, as AWS customers who already have appropriate roles and policies to manage Athena and query AWS service logs for their organization may continue to use those roles with AWS Security Analytics Bootstrap. For customers who would like to create their own IAM policies, documentation on [Identity and Access Management in Athena](https://docs.aws.amazon.com/athena/latest/ug/security-iam-athena.html) is provided in the Amazon Athena service documentation. Customers creating their own policies may want to review the admin and user policies provided in the [AWS Security Analytics Bootstrap IAM CloudFormation Template](../../AthenaBootstrap/cfn/Athena_IAM_setup.yml) as a reference, which aim to provide policies with the least privilege required.
97 | 98 | ### 2.1 Update The AthenaAdmin And AthenaUser AssumeRole Policies In The CloudFormation Template 99 | 100 | Before deploying the [AWS Security Analytics Bootstrap IAM CloudFormation Template](../../AthenaBootstrap/cfn/Athena_IAM_setup.yml), the template must first be modified with the appropriate AssumeRole policies for the AthenaAdmin and AthenaUser roles, marked in the template by the comment `# TODO: Replace this with the appropriate least-privilege AssumeRole policy for your organization`. By default, each role denies all AssumeRole attempts; this statement should be replaced before the template is deployed with the appropriate least-privilege policy for the account in which it will be deployed. For reference, please review the [AWS Identity and Access Management documentation on AssumeRole policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_permissions-to-switch.html#roles-usingrole-createpolicy). 101 | 102 | ### 2.2 Further Restrict Privilege In CloudFormation Template As Desired 103 | 104 | Currently the policies in the [AWS Security Analytics Bootstrap IAM CloudFormation Template](../../AthenaBootstrap/cfn/Athena_IAM_setup.yml) restrict certain actions to resources with the naming convention `Security*` or `security_*`. This limits the roles' access to resources within these naming conventions in the event that there are other Amazon Athena or AWS Glue resources in the same account. Both naming conventions are supported for flexibility, but strictly speaking only one is needed; if desired, one of them can be removed or another naming convention can be used in its place. Keep in mind when naming resources to always follow the chosen naming convention to ensure these roles will have the appropriate access.
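As a reference for section 2.1, a least-privilege AssumeRole (trust) policy usually names the exact principals allowed to assume the role, and can additionally require MFA on the session. The sketch below assembles such a policy document in Python; the account ID and role name are hypothetical placeholders, not values from this project:

```python
import json

def build_assume_role_policy(principal_arns, require_mfa=True):
    """Return a trust policy allowing only the named principals to assume
    the role, optionally requiring MFA on the calling session."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": list(principal_arns)},
        "Action": "sts:AssumeRole",
    }
    if require_mfa:
        statement["Condition"] = {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
    return {"Version": "2012-10-17", "Statement": [statement]}

# Example: allow a single (hypothetical) operator role to assume the Athena role
policy = build_assume_role_policy(
    ["arn:aws:iam::111122223333:role/SecurityOperators"]
)
print(json.dumps(policy, indent=2))
```

The resulting JSON would replace the default deny-all statement in the role's `AssumeRolePolicyDocument`; tighten or loosen the `Condition` block to match your organization's requirements.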
105 | 106 | ### 2.3 Create The AWS Security Analytics Bootstrap IAM CloudFormation Stack 107 | 108 | The updated [AWS Security Analytics Bootstrap IAM CloudFormation Template](../../AthenaBootstrap/cfn/Athena_IAM_setup.yml) can now be deployed to the target account via the [console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-using-console.html), the [CLI](https://docs.aws.amazon.com/cli/latest/reference/cloudformation/deploy/index.html), or as part of a custom change management pipeline (e.g. [Customizations for AWS Control Tower](https://aws.amazon.com/solutions/implementations/customizations-for-aws-control-tower/)). The parameters documented in section 1.7 above will need to be provided; be sure to follow the formatting requirements exactly as specified for each parameter. 109 | 110 | ## 3 Test The AWS Security Analytics Bootstrap IAM Roles 111 | 112 | Once the AWS Security Analytics Bootstrap IAM CloudFormation Stack is deployed, test to ensure that it's possible to assume the roles `AthenaSecurityAdminRole` and `AthenaSecurityAnalystRole` according to the updated AssumeRole policy. While the Glue and Athena resources don't exist yet at this point, you may test to ensure that both the admin and user roles are able to access and read from the S3 buckets specified. If the roles aren't able to read from the specified S3 buckets then the Athena queries won't work either; review the prerequisites again to ensure that the user has access to any KMS keys required and that the user has access to the bucket and objects via bucket policy and object ownership.
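Both stacks are strict about parameter formatting: list parameters must be comma separated with no spaces, and a stray space typically only surfaces later as a partition projection or syntax error. A small pre-flight check can catch this before any stack is created (a sketch; these helper functions are illustrative, not part of the project):

```python
def check_csv_param(value):
    """Comma-separated list parameters must contain no spaces at all."""
    if " " in value:
        raise ValueError(f"parameter contains spaces: {value!r}")
    items = value.split(",")
    if any(not item for item in items):
        raise ValueError(f"parameter contains an empty item: {value!r}")
    return items

def check_account_list(value):
    """Account lists must additionally contain only numeric account IDs."""
    accounts = check_csv_param(value)
    for acct in accounts:
        if not acct.isdigit():
            raise ValueError(f"not a numeric AWS account ID: {acct!r}")
    return accounts
```

For example, `check_account_list("0123456789,0123456788")` passes, while `check_account_list("0123456789, 0123456788")` raises before any deployment time is wasted.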
113 | 114 | ## 4 Deploy AWS Security Analytics Bootstrap Infrastructure CloudFormation Stack 115 | With all of the prerequisites met, the [AWS Security Analytics Bootstrap Infrastructure CloudFormation Template](../../AthenaBootstrap/cfn/Athena_infra_setup.yml) can now be deployed to the target account via the [console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-using-console.html), the [CLI](https://docs.aws.amazon.com/cli/latest/reference/cloudformation/deploy/index.html), or as part of a custom change management pipeline (e.g. [Customizations for AWS Control Tower](https://aws.amazon.com/solutions/implementations/customizations-for-aws-control-tower/)). The parameters documented in section 1.7 above will need to be provided; be sure to follow the formatting requirements exactly as specified for each parameter. Also note that this stack will deploy the infrastructure in the same region that the stack is created in, and it's recommended, if possible, to create the stack in the same region as the source log and output S3 buckets for the best performance. 116 | 117 | ## 5 Testing the AWS Security Analytics Bootstrap Infrastructure 118 | Once the AWS Security Analytics Bootstrap Infrastructure CloudFormation Stack is deployed, test to ensure that the environment is working as expected. Test the following with the intended Athena admin and analyst user(s)/role(s): 119 | 120 | ### 5.1 Setting the Athena Workgroup 121 | The Athena Workgroup will determine where and how Athena output results are stored, track the Athena query history of all users/roles using the current workgroup, and organize available saved queries.
122 | 123 | - In the AWS Console access the `Athena` Service Console 124 | - In the `Athena` Service Console click on `Workgroup` (in the top bar) 125 | - In the `Workgroups` view, click on the name of the workgroup (`SecurityAnalysis` by default or as specified in the CloudFormation Stack parameters) 126 | - With the Athena Workgroup selected click `View details` 127 | - Verify that the `Query result location` is the desired location, that encryption is enabled under `Encrypt query results` (e.g. SSE_S3), and that `Override client-side settings` is Enabled 128 | - Navigate back to the `Workgroups` view, and click on the name of the workgroup (`SecurityAnalysis` by default or as specified in the CloudFormation Stack parameters) 129 | - With the Athena Workgroup selected click `Switch workgroup`; this will set this workgroup as the active workgroup. Note that Athena may ask for acknowledgment to confirm that the user's query output is being logged to the specified S3 location. 130 | 131 | ### 5.2 Setting the Glue Database 132 | The Glue Database will determine what Glue tables are visible, and will be the default database for all queries which don't explicitly specify a database name. For this reason it's recommended to ensure that users have selected the appropriate database prior to running queries. This may need to be reset periodically if the user navigates away from the `Query editor` view. 133 | 134 | - In the `Athena` Service Console click on `Query editor` (in the top bar) 135 | - On the left sidebar select the database from the drop down menu (`security_analysis` by default or as specified in the CloudFormation Stack parameters) 136 | - Confirm that the expected AWS service log tables appear in the `Tables` section of the sidebar 137 | 138 | ### 5.3 Testing the Glue Tables 139 | The Glue Tables will be the resource which Athena queries are typically run against.
When Athena queries specify Glue Tables in the `FROM` clause, these table definitions determine how the data is interpreted at the time of the query. It is important to understand that these tables determine the *representation* of the data in the query; they do not change anything about the underlying data itself. 140 | 141 | - In the `Athena` Service Console click on `Query editor` 142 | - On the left sidebar select the database from the drop down menu (`security_analysis` by default or as specified in the CloudFormation Stack parameters) 143 | - Confirm that the expected AWS service log tables appear in the `Tables` section of the sidebar 144 | - For each table in the side bar: 145 | - Click on the 3-dots to the right of the table name and select `Preview table` 146 | - Verify that the preview executes successfully (10 rows should print out in the `Results` section) 147 | 148 | If any of the table results don't appear, the prerequisites will need to be reviewed again to ensure that the user has access to any KMS keys required and that the user has access to the bucket and objects via bucket policy and object ownership. If the prerequisites are met and the user has the ability to read the respective logs from the source S3 bucket, then review the parameters provided to the AWS Security Analytics Bootstrap Infrastructure CloudFormation Stack (specifically the source locations) to ensure they match the target log configuration. 149 | 150 | ### 5.4 Testing the Saved Queries 151 | Demo queries have been provided in the created Athena Workgroup under `Saved queries`. It's recommended to use narrow partition constraints where possible, otherwise Athena will default to scanning all of the data in the table, leading to higher cost and longer query times. Demo queries have default partition constraints for account, region, and date which will need to be updated or removed according to the desired query parameters.
Additional saved queries can be added by any workgroup user by selecting `Save as` under the query editor window. 152 | 153 | - In the `Athena` Service Console click on `Saved queries` 154 | - Click on a query name or description. This will open the `Query editor` view with the selected query in the query editor section 155 | - On the left sidebar select the database from the drop down menu (`security_analysis` by default or as specified in the CloudFormation Stack parameters) 156 | - Review and update the query (specifically the `account_partition`, `region_partition`, and `date_partition` conditions) 157 | - When ready, click the `Run query` button under the query editor section 158 | - Ensure that the query completes successfully and review the results in the `Results` section 159 | 160 | ## 6 Using AWS Security Analytics Bootstrap Infrastructure 161 | Once the AWS Security Analytics Bootstrap Infrastructure is deployed, the Athena environment can be used to run queries against the respective tables created.
It's beyond the scope of this project to explain how to use Athena, but there are a few other resources listed below which may be helpful: 162 | 163 | ### Resources from AWS 164 | - [[Doc] Running SQL Queries Using Amazon Athena](https://docs.aws.amazon.com/athena/latest/ug/querying-athena-tables.html) 165 | - [[Doc] Querying AWS Service Logs](https://docs.aws.amazon.com/athena/latest/ug/querying-AWS-service-logs.html) 166 | - [[Blog] Analyze Security, Compliance, and Operational Activity Using AWS CloudTrail and Amazon Athena](https://aws.amazon.com/blogs/big-data/aws-cloudtrail-and-amazon-athena-dive-deep-to-analyze-security-compliance-and-operational-activity/) 167 | - [[Blog] Athena Performance Tips](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/) 168 | - [[Q&A] CTAS Bucketing Guidance](https://aws.amazon.com/premiumsupport/knowledge-center/set-file-number-size-ctas-athena/) 169 | 170 | ### Third Party Resources 171 | - [[Guide] The Athena Guide](https://the.athena.guide/) 172 | - [[Doc] Presto Documentation (current)](https://prestodb.io/docs/current/) 173 | - [[Book] Presto The Definitive Guide (e-book)](https://www.starburst.io/wp-content/uploads/2020/04/OReilly-Presto-The-Definitive-Guide.pdf) 174 | -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/docs/img/architecture/MultiAccount.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awslabs/aws-security-analytics-bootstrap/f7b8795c25bd26c594277e042938e905076313da/AWSSecurityAnalyticsBootstrap/docs/img/architecture/MultiAccount.png -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/docs/img/architecture/SingleAccount.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/awslabs/aws-security-analytics-bootstrap/f7b8795c25bd26c594277e042938e905076313da/AWSSecurityAnalyticsBootstrap/docs/img/architecture/SingleAccount.png -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/docs/img/architecture/SingleAccount_BootstrapOnly.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awslabs/aws-security-analytics-bootstrap/f7b8795c25bd26c594277e042938e905076313da/AWSSecurityAnalyticsBootstrap/docs/img/architecture/SingleAccount_BootstrapOnly.png -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/sql/ddl/create_tables/create_cloudtrail_table.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: Apache-2.0 4 | 5 | TODO: optionally update the table name "cloudtrail" to the name you'd like to use for the CloudTrail table 6 | */ 7 | CREATE EXTERNAL TABLE cloudtrail ( 8 | eventversion STRING, 9 | useridentity STRUCT< 10 | type:STRING, 11 | principalid:STRING, 12 | arn:STRING, 13 | accountid:STRING, 14 | invokedby:STRING, 15 | accesskeyid:STRING, 16 | userName:STRING, 17 | sessioncontext:STRUCT< 18 | attributes:STRUCT< 19 | mfaauthenticated:STRING, 20 | creationdate:STRING>, 21 | sessionissuer:STRUCT< 22 | type:STRING, 23 | principalId:STRING, 24 | arn:STRING, 25 | accountId:STRING, 26 | userName:STRING>>>, 27 | eventtime STRING, 28 | eventsource STRING, 29 | eventname STRING, 30 | awsregion STRING, 31 | sourceipaddress STRING, 32 | useragent STRING, 33 | errorcode STRING, 34 | errormessage STRING, 35 | requestparameters STRING, 36 | responseelements STRING, 37 | additionaleventdata STRING, 38 | requestid STRING, 39 | eventid STRING, 40 | resources ARRAY<STRUCT< 41 | arn:STRING, 42 | accountid:STRING, 43 | type:STRING>>, 44 | eventtype STRING, 45 | apiversion
STRING, 46 | readonly STRING, 47 | recipientaccountid STRING, 48 | serviceeventdetails STRING, 49 | sharedeventid STRING, 50 | vpcendpointid STRING 51 | ) 52 | PARTITIONED BY 53 | ( 54 | date_partition STRING, 55 | region_partition STRING, 56 | account_partition STRING 57 | ) 58 | ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde' 59 | STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat' 60 | OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 61 | /* 62 | TODO: replace bucket_name and optional_prefix in LOCATION value, 63 | if there's no prefix then remove the extra / 64 | example: s3://my_central_log_bucket/AWSLogs/ or s3://my_central_log_bucket/PROD/AWSLogs/ 65 | */ 66 | LOCATION 's3://<bucket_name>/<optional_prefix>/AWSLogs/' 67 | TBLPROPERTIES 68 | ( 69 | "projection.enabled" = "true", 70 | "projection.date_partition.type" = "date", 71 | /* TODO: replace <YYYY>/<MM>/<DD>
with the first date of your logs, example: 2020/11/30 */ 72 | "projection.date_partition.range" = "<YYYY>/<MM>/<DD>
,NOW", 73 | "projection.date_partition.format" = "yyyy/MM/dd", 74 | "projection.date_partition.interval" = "1", 75 | "projection.date_partition.interval.unit" = "DAYS", 76 | "projection.region_partition.type" = "enum", 77 | "projection.region_partition.values" = "us-east-2,us-east-1,us-west-1,us-west-2,af-south-1,ap-east-1,ap-south-1,ap-northeast-3,ap-northeast-2,ap-southeast-1,ap-southeast-2,ap-northeast-1,ca-central-1,cn-north-1,cn-northwest-1,eu-central-1,eu-west-1,eu-west-2,eu-south-1,eu-west-3,eu-north-1,me-south-1,sa-east-1", 78 | "projection.account_partition.type" = "enum", 79 | /* 80 | TODO: replace values in projection.account_partition.values with the list of AWS account numbers that you want to include in this table 81 | example: "0123456789,0123456788,0123456777" 82 | note: do not use any spaces, separate the values with a comma only (including spaces will cause a syntax error) 83 | if there is only one account, include it by itself with no comma, for example: "0123456789" 84 | */ 85 | "projection.account_partition.values" = "<account_number_1>,<account_number_2>,...", 86 | /* 87 | TODO: Same as LOCATION, replace bucket_name and optional_prefix in storage.location.template value, 88 | if there's no prefix then remove the extra / 89 | example: s3://my_central_log_bucket/AWSLogs/... or s3://my_central_log_bucket/PROD/AWSLogs/... 90 | NOTE: do not change parameters that look like ${...}, those are template variables, only replace values in angle brackets <...> 91 | */ 92 | "storage.location.template" = "s3://<bucket_name>/<optional_prefix>/AWSLogs/${account_partition}/CloudTrail/${region_partition}/${date_partition}" 93 | ); -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/sql/ddl/create_tables/create_dnsquerylog_table.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
3 | SPDX-License-Identifier: Apache-2.0 4 | 5 | TODO: optionally update the table name "dns" to the name you'd like to use for the DNS resolver log table 6 | */ 7 | 8 | CREATE EXTERNAL TABLE IF NOT EXISTS r53dns ( 9 | version FLOAT, 10 | account_id STRING, 11 | region STRING, 12 | vpc_id STRING, 13 | query_timestamp STRING, 14 | query_name STRING, 15 | query_type STRING, 16 | query_class STRING, 17 | rcode STRING, 18 | answers ARRAY<STRING>, 19 | srcaddr STRING, 20 | srcport INT, 21 | transport STRING, 22 | srcids STRING 23 | ) 24 | PARTITIONED BY 25 | ( 26 | account_partition STRING, 27 | vpc_partition STRING, 28 | date_partition STRING 29 | ) 30 | ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 31 | STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' 32 | OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 33 | /* 34 | TODO: replace bucket_name and optional_prefix in LOCATION value, 35 | if there's no prefix then remove the extra / 36 | example: s3://my_central_log_bucket/AWSLogs/ or s3://my_central_log_bucket/PROD/AWSLogs/ 37 | */ 38 | LOCATION 's3://<bucket_name>/<optional_prefix>/AWSLogs/' 39 | TBLPROPERTIES 40 | ( 41 | "projection.enabled" = "true", 42 | "projection.date_partition.type" = "date", 43 | /* TODO: replace <YYYY>/<MM>/<DD>
with the first date of your logs, example: 2020/11/30 */ 44 | "projection.date_partition.range" = "<YYYY>/<MM>/<DD>
,NOW", 45 | "projection.date_partition.format" = "yyyy/MM/dd", 46 | "projection.date_partition.interval" = "1", 47 | "projection.date_partition.interval.unit" = "DAYS", 48 | "projection.vpc_partition.type" = "enum", 49 | /* 50 | TODO: replace values in projection.vpc_partition.values with the list of VPC IDs that you want to include in this table 51 | example: "vpc-00000001,vpc-00000002,vpc-00000003" 52 | note: do not use any spaces, separate the values with a comma only (including spaces will cause a syntax error) 53 | if there is only one VPC ID, include it by itself with no comma, for example: "vpc-00000001" 54 | */ 55 | "projection.vpc_partition.values" = "<vpc_id_1>,<vpc_id_2>,...", 56 | "projection.account_partition.type" = "enum", 57 | /* 58 | TODO: replace values in projection.account_partition.values with the list of AWS account numbers that you want to include in this table 59 | example: "0123456789,0123456788,0123456777" 60 | note: do not use any spaces, separate the values with a comma only (including spaces will cause a syntax error) 61 | if there is only one account, include it by itself with no comma, for example: "0123456789" 62 | */ 63 | "projection.account_partition.values" = "<account_number_1>,<account_number_2>,...", 64 | /* 65 | TODO: Same as LOCATION, replace bucket_name and optional_prefix in storage.location.template value, 66 | if there's no prefix then remove the extra / 67 | example: s3://my_central_log_bucket/AWSLogs/... or s3://my_central_log_bucket/PROD/AWSLogs/...
68 | NOTE: do not change parameters that look like ${...}, those are template variables, only replace values in angle brackets <...> 69 | */ 70 | "storage.location.template" = "s3://<bucket_name>/<optional_prefix>/AWSLogs/${account_partition}/vpcdnsquerylogs/${vpc_partition}/${date_partition}" 71 | ); -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/sql/ddl/create_tables/create_vpcflowlog_table.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: Apache-2.0 4 | 5 | Creates a VPC flow table for VPC flow logs delivered directly to an S3 bucket 6 | Includes partitioning configuration for multi-account, multi-region deployment as well as date in YYYY/MM/DD format 7 | Includes all available fields, including non-default v3 and v4 fields. 8 | NOTE: VPC flow fields must be generated in the order listed; if they were configured in a different order you'll need to adjust the order they're listed in the DDL below 9 | */ 10 | 11 | /* 12 | TODO: optionally update the table name "vpcflow" to the name you'd like to use for the VPC flow log table 13 | */ 14 | CREATE EXTERNAL TABLE IF NOT EXISTS vpcflow ( 15 | /* 16 | TODO: verify that VPC flow logs were configured to be generated in this order, if they weren't you'll need to rearrange the order below to match the order in which they were generated 17 | TIP: download a sample and check the first line of each log file for the field order, 18 | don't worry if the field names don't match exactly with the names in the top line of the log, Athena will import them based on their order 19 | NOTE: These v2 fields are in the default format and usually don't need to be adjusted, pay closer attention to the order of the v3/v4 fields below as there is no default ordering of those fields 20 | */ 21 | version INT, 22 | account STRING, 23 | interfaceid STRING,
24 | sourceaddress STRING, 25 | destinationaddress STRING, 26 | sourceport INT, 27 | destinationport INT, 28 | protocol INT, 29 | numpackets INT, 30 | numbytes BIGINT, 31 | starttime INT, 32 | endtime INT, 33 | action STRING, 34 | logstatus STRING, 35 | /* 36 | NOTE: start of non-default v3 and v4 fields 37 | don't worry if your source logs don't all include these fields, they'll just show as blank 38 | TODO: If your VPC flow logs include these fields, be sure the order they're listed below matches the order they're configured in the flow log format 39 | */ 40 | vpcid STRING, 41 | subnetid STRING, 42 | instanceid STRING, 43 | tcpflags SMALLINT, 44 | type STRING, 45 | pktsrcaddr STRING, 46 | pktdstaddr STRING, 47 | region STRING, 48 | azid STRING, 49 | sublocationtype STRING, 50 | sublocationid STRING, 51 | pktsrcawsservice STRING, 52 | pktdstawsservice STRING, 53 | flowdirection STRING, 54 | trafficpath STRING 55 | ) 56 | PARTITIONED BY 57 | ( 58 | date_partition STRING, 59 | region_partition STRING, 60 | account_partition STRING 61 | ) 62 | ROW FORMAT DELIMITED 63 | FIELDS TERMINATED BY ' ' 64 | /* 65 | TODO: replace bucket_name and optional_prefix in LOCATION value, 66 | if there's no prefix then remove the extra / 67 | example: s3://my_central_log_bucket/AWSLogs/ or s3://my_central_log_bucket/PROD/AWSLogs/ 68 | */ 69 | LOCATION 's3://<bucket_name>/<optional_prefix>/AWSLogs/' 70 | TBLPROPERTIES 71 | ( 72 | "skip.header.line.count"="1", 73 | "projection.enabled" = "true", 74 | "projection.date_partition.type" = "date", 75 | /* TODO: replace <YYYY>/<MM>/<DD>
with the first date of your logs, example: 2020/11/30 */ 76 | "projection.date_partition.range" = "<YYYY>/<MM>/<DD>
,NOW", 77 | "projection.date_partition.format" = "yyyy/MM/dd", 78 | "projection.date_partition.interval" = "1", 79 | "projection.date_partition.interval.unit" = "DAYS", 80 | "projection.region_partition.type" = "enum", 81 | "projection.region_partition.values" = "us-east-2,us-east-1,us-west-1,us-west-2,af-south-1,ap-east-1,ap-south-1,ap-northeast-3,ap-northeast-2,ap-southeast-1,ap-southeast-2,ap-northeast-1,ca-central-1,cn-north-1,cn-northwest-1,eu-central-1,eu-west-1,eu-west-2,eu-south-1,eu-west-3,eu-north-1,me-south-1,sa-east-1", 82 | "projection.account_partition.type" = "enum", 83 | /* 84 | TODO: replace values in projection.account_partition.values with the list of AWS account numbers that you want to include in this table 85 | example: "0123456789,0123456788,0123456777" 86 | note: do not use any spaces, separate the values with a comma only (including spaces will cause a syntax error) 87 | if there is only one account, include it by itself with no comma, for example: "0123456789" 88 | */ 89 | "projection.account_partition.values" = "<account_number_1>,<account_number_2>,...", 90 | /* 91 | TODO: Same as LOCATION, replace bucket_name and optional_prefix in storage.location.template value, 92 | if there's no prefix then remove the extra / 93 | example: s3://my_central_log_bucket/AWSLogs/... or s3://my_central_log_bucket/PROD/AWSLogs/... 94 | NOTE: do not change parameters that look like ${...}, those are template variables, only replace values in angle brackets <...> 95 | */ 96 | "storage.location.template" = "s3://<bucket_name>/<optional_prefix>/AWSLogs/${account_partition}/vpcflowlogs/${region_partition}/${date_partition}" 97 | ); -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/sql/dml/analytics/cloudtrail/cloudtrail_demo_queries.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
3 | SPDX-License-Identifier: Apache-2.0 4 | */ 5 | 6 | -- PREVIEW TABLE 7 | -- preview first 10 rows with all fields, quick way to verify everything is set up correctly 8 | 9 | SELECT * from cloudtrail 10 | LIMIT 10; 11 | 12 | -- PARTITION TESTS 13 | /* NOTE: if there are no constraints on a partition (account, region, or date) then by default ALL data will be scanned 14 | this could lead to a costly query, so always consider using at least one partition constraint. 15 | 16 | Note that this is the case even if you have other constraints in a query (e.g. sourceipaddress = '192.0.2.1'), 17 | only constraints using partition fields (date_partition, region_partition, account_partition) 18 | will limit the amount of data scanned. 19 | */ 20 | 21 | -- preview first 10 rows with all fields, limited to a single account 22 | SELECT * from cloudtrail 23 | WHERE account_partition = '111122223333' 24 | LIMIT 10; 25 | 26 | -- preview first 10 rows with all fields, limited to multiple accounts 27 | SELECT * from cloudtrail 28 | WHERE account_partition in ('111122223333','444455556666','123456789012') 29 | LIMIT 10; 30 | 31 | -- preview first 10 rows with all fields, limited to a single region 32 | SELECT * from cloudtrail 33 | WHERE region_partition = 'us-east-1' 34 | LIMIT 10; 35 | 36 | -- preview first 10 rows with all fields, limited to multiple regions 37 | SELECT * from cloudtrail 38 | WHERE region_partition in ('us-east-1','us-east-2','us-west-2') 39 | LIMIT 10; 40 | 41 | -- NOTE: date_partition format is 'YYYY/MM/DD' as a string 42 | -- preview first 10 rows with all fields, limited to a certain date range 43 | SELECT * from cloudtrail 44 | WHERE date_partition >= '2021/07/01' 45 | AND date_partition <= '2021/07/31' 46 | LIMIT 10; 47 | 48 | -- preview first 10 rows with all fields, limited to the past 30 days (relative) 49 | SELECT * from cloudtrail 50 | WHERE date_partition >= date_format(date_add('day',-30,current_timestamp), '%Y/%m/%d') 51 | LIMIT 10; 52 | 53 | -- preview first
10 rows with all fields, limited by a combination of partition constraints 54 | -- NOTE: narrowing the scope of the query as much as possible will improve performance and minimize cost 55 | SELECT * from cloudtrail 56 | WHERE date_partition >= '2021/07/01' 57 | AND date_partition <= '2021/07/31' 58 | AND account_partition = '111122223333' 59 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 60 | LIMIT 10; 61 | 62 | -- ANALYSIS EXAMPLES 63 | -- NOTE: default partition constraints have been provided for each query, 64 | -- be sure to add the appropriate partition constraints to the WHERE clause as shown above 65 | /* 66 | DEFAULT partition constraints: 67 | WHERE date_partition >= '2021/07/01' 68 | AND date_partition <= '2021/07/31' 69 | AND account_partition = '111122223333' 70 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 71 | 72 | Be sure to modify or remove these to fit the scope of your intended analysis 73 | */ 74 | 75 | -- Summary of event counts by Region (e.g.
where is the most activity) 76 | SELECT region_partition, count(*) as eventcount FROM cloudtrail 77 | WHERE date_partition >= '2021/07/01' 78 | AND date_partition <= '2021/07/31' 79 | AND account_partition = '111122223333' 80 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 81 | GROUP BY region_partition 82 | ORDER BY eventcount DESC 83 | 84 | -- Summary of event count by Region and EventName, ordered by event count (descending) for each region 85 | -- Quick way to identify top EventNames seen in each region 86 | SELECT region_partition, eventname, count(*) as eventcount FROM cloudtrail 87 | WHERE date_partition >= '2021/07/01' 88 | AND date_partition <= '2021/07/31' 89 | AND account_partition = '111122223333' 90 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 91 | GROUP BY region_partition, eventname 92 | ORDER BY region_partition, eventcount DESC 93 | 94 | -- User login summary, via AssumeRole or ConsoleLogin 95 | -- includes a list of all source IPs for each user 96 | SELECT useridentity.arn, eventname, array_agg(DISTINCT(sourceipaddress) ORDER BY sourceipaddress) AS sourceips FROM cloudtrail 97 | WHERE useridentity.arn IS NOT NULL 98 | AND (eventname = 'AssumeRole' OR eventname = 'ConsoleLogin') 99 | AND date_partition >= '2021/07/01' 100 | AND date_partition <= '2021/07/31' 101 | AND account_partition = '111122223333' 102 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 103 | GROUP BY useridentity.arn, eventname 104 | ORDER BY eventname 105 | 106 | -- User Activity Summary 107 | -- filter high volume read-only GET/LIST/DESCRIBE calls 108 | SELECT useridentity.arn, array_agg(DISTINCT(eventname)) AS eventnames, 109 | array_agg(DISTINCT(sourceipaddress) ORDER BY sourceipaddress) AS sourceips, 110 | array_agg(DISTINCT(useragent) ORDER BY useragent) AS useragents FROM cloudtrail 111 | WHERE eventname <> 'AssumeRole' 112 | AND eventname NOT LIKE 'Get%' 113 | AND eventname NOT LIKE
'List%' 114 | AND eventname NOT LIKE 'Describe%' 115 | AND date_partition >= '2021/07/01' 116 | AND date_partition <= '2021/07/31' 117 | AND account_partition = '111122223333' 118 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 119 | GROUP BY useridentity.arn 120 | 121 | -- User Activity Summary, including username 122 | -- filter high volume read-only GET/LIST/DESCRIBE calls 123 | -- same as above, but will include the ARN and, for IAM Users, the username of the principal 124 | SELECT useridentity.arn, useridentity.username, 125 | array_agg(DISTINCT(eventname) ORDER BY eventname) AS eventnames, 126 | array_agg(DISTINCT(sourceipaddress) ORDER BY sourceipaddress) AS sourceips, 127 | array_agg(DISTINCT(useragent) ORDER BY useragent) AS useragents FROM cloudtrail 128 | WHERE eventname <> 'AssumeRole' 129 | AND eventname NOT LIKE 'Get%' 130 | AND eventname NOT LIKE 'List%' 131 | AND eventname NOT LIKE 'Describe%' 132 | AND date_partition >= '2021/07/01' 133 | AND date_partition <= '2021/07/31' 134 | AND account_partition = '111122223333' 135 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 136 | GROUP BY useridentity.arn, useridentity.principalid, useridentity.username 137 | 138 | -- IAM change summary 139 | -- * filter read-only GET/LIST/DESCRIBE 140 | -- * filter unsuccessful calls 141 | SELECT eventtime, useridentity.arn, useridentity.username, eventname, requestparameters 142 | FROM cloudtrail 143 | WHERE eventsource = 'iam.amazonaws.com' 144 | AND eventname NOT LIKE 'Get%' 145 | AND eventname NOT LIKE 'List%' 146 | AND eventname NOT LIKE 'Describe%' 147 | AND errorcode IS NULL 148 | AND date_partition >= '2021/07/01' 149 | AND date_partition <= '2021/07/31' 150 | AND account_partition = '111122223333' 151 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 152 | ORDER BY account_partition, eventtime 153 | 154 | -- Access Key creations with extract of username and keyid 155 | -- *
filter unsuccessful calls 156 | SELECT eventtime, useridentity.arn, useridentity.username, eventname, 157 | JSON_EXTRACT_SCALAR(JSON_EXTRACT(responseelements, '$.accessKey'), '$.userName') AS userName, 158 | JSON_EXTRACT_SCALAR(JSON_EXTRACT(responseelements, '$.accessKey'), '$.accessKeyId') AS accessKey 159 | FROM cloudtrail 160 | WHERE eventname = 'CreateAccessKey' 161 | AND errorcode IS NULL 162 | AND date_partition >= '2021/07/01' 163 | AND date_partition <= '2021/07/31' 164 | AND account_partition = '111122223333' 165 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 166 | ORDER BY account_partition, eventtime 167 | 168 | -- Password changes with extract of username 169 | -- * filter unsuccessful calls 170 | SELECT eventtime, useridentity.arn, useridentity.username, eventname, 171 | JSON_EXTRACT_SCALAR(requestparameters, '$.userName') AS "username with password modified" 172 | FROM cloudtrail 173 | WHERE eventname IN ('UpdateLoginProfile', 'CreateLoginProfile') 174 | AND errorcode IS NULL 175 | AND date_partition >= '2021/07/01' 176 | AND date_partition <= '2021/07/31' 177 | AND account_partition = '111122223333' 178 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 179 | ORDER BY account_partition, eventtime 180 | 181 | -- Identify API events made from a public IP (i.e.
a non-RFC1918 source IP address) 182 | -- NOTE: this is an example of using the new IPADDRESS data type; as a string, a string comparison would not correctly compare IP addresses 183 | SELECT * 184 | FROM cloudtrail 185 | WHERE regexp_like(sourceipaddress, '^\d') 186 | AND NOT ( (CAST(sourceipaddress AS IPADDRESS) > IPADDRESS '10.0.0.0' 187 | AND CAST(sourceipaddress AS IPADDRESS) < IPADDRESS '10.255.255.255') 188 | OR (CAST(sourceipaddress AS IPADDRESS) > IPADDRESS '172.16.0.0' 189 | AND CAST(sourceipaddress AS IPADDRESS) < IPADDRESS '172.31.255.255') 190 | OR (CAST(sourceipaddress AS IPADDRESS) > IPADDRESS '192.168.0.0' 191 | AND CAST(sourceipaddress AS IPADDRESS) < IPADDRESS '192.168.255.255')) 192 | AND date_partition >= '2021/07/01' 193 | AND date_partition <= '2021/07/31' 194 | AND account_partition = '111122223333' 195 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 196 | 197 | -- Create optimized ORC columnar format table for a single account and region for the past 90 days 198 | -- NOTE: single query limit is 100 partitions, to add additional accounts, regions, or days use the following INSERT INTO method 199 | -- Reference: https://docs.aws.amazon.com/athena/latest/ug/ctas-insert-into.html 200 | CREATE TABLE cloudtrail_orc 201 | WITH (format = 'ORC', orc_compression = 'SNAPPY', partitioned_by = ARRAY['account_partition','region_partition','date_partition'] ) AS 202 | SELECT eventversion, 203 | useridentity, 204 | eventtime, 205 | eventsource, 206 | eventname, 207 | awsregion, 208 | sourceipaddress, 209 | useragent, 210 | errorcode, 211 | errormessage, 212 | requestparameters, 213 | responseelements, 214 | additionaleventdata, 215 | requestid, 216 | eventid, 217 | resources, 218 | eventtype, 219 | apiversion, 220 | readonly, 221 | recipientaccountid, 222 | serviceeventdetails, 223 | sharedeventid, 224 | vpcendpointid, 225 | account_partition, 226 | region_partition, 227 | date_partition 228 | FROM cloudtrail 229 | WHERE
account_partition = '111122223333' 230 | AND region_partition = 'us-east-1' 231 | AND date_partition >= date_format(date_add('day',-90,current_timestamp), '%Y/%m/%d') 232 | 233 | -- Add optimized ORC columnar format table for a single account and region for the past 90 days 234 | -- NOTE: single query limit is 100 partitions, to add additional accounts, regions, or days keep using this INSERT INTO method 235 | -- Reference: https://docs.aws.amazon.com/athena/latest/ug/ctas-insert-into.html 236 | INSERT INTO cloudtrail_orc 237 | SELECT eventversion, 238 | useridentity, 239 | eventtime, 240 | eventsource, 241 | eventname, 242 | awsregion, 243 | sourceipaddress, 244 | useragent, 245 | errorcode, 246 | errormessage, 247 | requestparameters, 248 | responseelements, 249 | additionaleventdata, 250 | requestid, 251 | eventid, 252 | resources, 253 | eventtype, 254 | apiversion, 255 | readonly, 256 | recipientaccountid, 257 | serviceeventdetails, 258 | sharedeventid, 259 | vpcendpointid, 260 | account_partition, 261 | region_partition, 262 | date_partition 263 | FROM cloudtrail 264 | WHERE account_partition = '111122223333' 265 | AND region_partition = 'us-east-2' 266 | AND date_partition >= date_format(date_add('day',-90,current_timestamp), '%Y/%m/%d') -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/sql/dml/analytics/dns/dns_demo_queries.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
3 | SPDX-License-Identifier: Apache-2.0 4 | */ 5 | 6 | -- PREVIEW TABLE 7 | -- preview first 10 rows with all fields, quick way to verify everything is set up correctly 8 | 9 | SELECT * from r53dns 10 | LIMIT 10; 11 | 12 | -- PARTITION TESTS 13 | /* NOTE: if there are no constraints on a partition (account, vpc, or date) then by default ALL data will be scanned 14 | this could lead to a costly query, so always consider using at least one partition constraint. 15 | 16 | Note that this is the case even if you have other constraints in a query (e.g. sourceaddress = '192.0.2.1'), 17 | only constraints using partition fields (date_partition, vpc_partition, account_partition) 18 | will limit the amount of data scanned. 19 | */ 20 | 21 | -- preview first 10 rows with all fields, limited to a single account 22 | SELECT * from r53dns 23 | WHERE account_partition = '111122223333' 24 | LIMIT 10; 25 | 26 | -- preview first 10 rows with all fields, limited to multiple accounts 27 | SELECT * from r53dns 28 | WHERE account_partition in ('111122223333','444455556666','123456789012') 29 | LIMIT 10; 30 | 31 | -- preview first 10 rows with all fields, limited to a single vpc 32 | SELECT * from r53dns 33 | WHERE vpc_partition = 'vpc-00000001' 34 | LIMIT 10; 35 | 36 | -- preview first 10 rows with all fields, limited to multiple vpcs 37 | SELECT * from r53dns 38 | WHERE vpc_partition in ('vpc-00000001','vpc-00000002','vpc-00000003') 39 | LIMIT 10; 40 | 41 | -- NOTE: date_partition format is 'YYYY/MM/DD' as a string 42 | -- preview first 10 rows with all fields, limited to a certain date range 43 | SELECT * from r53dns 44 | WHERE date_partition >= '2020/07/01' 45 | AND date_partition <= '2020/07/31' 46 | LIMIT 10; 47 | 48 | -- preview first 10 rows with all fields, limited to the past 30 days (relative) 49 | SELECT * from r53dns 50 | WHERE date_partition >= date_format(date_add('day',-30,current_timestamp), '%Y/%m/%d') 51 | LIMIT 10; 52 | 53 | -- preview first 10 rows with all fields,
limited by a combination of partition constraints 54 | -- NOTE: narrowing the scope of the query as much as possible will improve performance and minimize cost 55 | SELECT * from r53dns 56 | WHERE date_partition >= '2020/07/01' 57 | AND date_partition <= '2020/07/31' 58 | AND account_partition = '111122223333' 59 | AND vpc_partition in ('vpc-00000001','vpc-00000002','vpc-00000003') 60 | LIMIT 10; 61 | 62 | -- ANALYSIS EXAMPLES 63 | 64 | -- Sort queries by requestor instance count 65 | SELECT query_name, query_type, array_distinct(filter(array_agg(srcids), q -> q.instance IS NOT NULL)) as instances, 66 | cardinality(array_distinct(filter(array_agg(srcids), q -> q.instance IS NOT NULL))) as instance_count 67 | FROM r53dns 68 | WHERE date_partition >= '2020/07/01' 69 | AND date_partition <= '2020/07/31' 70 | AND account_partition = '111122223333' 71 | AND vpc_partition in ('vpc-00000001','vpc-00000002','vpc-00000003') 72 | GROUP BY query_name, query_type 73 | ORDER by instance_count DESC; 74 | 75 | -- Summary with count of each time a name was queried 76 | SELECT query_name, query_type, count(*) as query_count FROM r53dns 77 | WHERE date_partition >= '2020/07/01' 78 | AND date_partition <= '2020/07/31' 79 | AND account_partition = '111122223333' 80 | AND vpc_partition in ('vpc-00000001','vpc-00000002','vpc-00000003') 81 | GROUP BY query_name, query_type 82 | ORDER BY query_count DESC; 83 | 84 | -- Summary with count of each time a name was queried, excluding AAAA records 85 | SELECT query_name, query_type, count(*) as query_count FROM r53dns 86 | WHERE query_type <> 'AAAA' 87 | AND date_partition >= '2020/07/01' 88 | AND date_partition <= '2020/07/31' 89 | AND account_partition = '111122223333' 90 | AND vpc_partition in ('vpc-00000001','vpc-00000002','vpc-00000003') 91 | GROUP BY query_name, query_type 92 | ORDER BY query_count DESC; 93 | 94 | -- Summary with count of each time a name was queried, excluding AAAA records 95 | -- split out TLD and SLD (note: doesn't properly handle
TLDs containing a '.' (e.g. .com.br)) 96 | SELECT element_at(split(query_name,'.'),-2) AS tld, 97 | element_at(split(query_name,'.'),-3) AS sld, 98 | query_name, query_type, 99 | count(*) AS query_count 100 | FROM r53dns 101 | WHERE query_type <> 'AAAA' 102 | AND date_partition >= '2020/07/01' 103 | AND date_partition <= '2020/07/31' 104 | AND account_partition = '111122223333' 105 | AND vpc_partition in ('vpc-00000001','vpc-00000002','vpc-00000003') 106 | GROUP BY query_name, query_type 107 | ORDER BY query_count DESC; 108 | 109 | -- Get records that resolve to a specific IP 110 | SELECT * FROM r53dns 111 | WHERE contains(transform(answers, x-> x.rdata), '203.0.113.2') 112 | AND date_partition >= '2020/07/01' 113 | AND date_partition <= '2020/07/31' 114 | AND account_partition = '111122223333' 115 | AND vpc_partition in ('vpc-00000001','vpc-00000002','vpc-00000003'); 116 | -------------------------------------------------------------------------------- /AWSSecurityAnalyticsBootstrap/sql/dml/analytics/vpcflow/vpcflow_demo_queries.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: Apache-2.0 4 | */ 5 | 6 | -- PREVIEW TABLE 7 | -- preview first 10 rows with all fields, quick way to verify everything is set up correctly 8 | 9 | SELECT * from vpcflow 10 | LIMIT 10; 11 | 12 | -- PARTITION TESTS 13 | /* NOTE: if there are no constraints on a partition (account, region, or date) then by default ALL data will be scanned 14 | this could lead to a costly query, so always consider using at least one partition constraint. 15 | 16 | Note that this is the case even if you have other constraints in a query (e.g. sourceaddress = '192.0.2.1'), 17 | only constraints using partition fields (date_partition, region_partition, account_partition) 18 | will limit the amount of data scanned.
19 | */ 20 | 21 | -- preview first 10 rows with all fields, limited to a single account 22 | SELECT * from vpcflow 23 | WHERE account_partition = '111122223333' 24 | LIMIT 10; 25 | 26 | -- preview first 10 rows with all fields, limited to multiple accounts 27 | SELECT * from vpcflow 28 | WHERE account_partition in ('111122223333','444455556666','123456789012') 29 | LIMIT 10; 30 | 31 | -- preview first 10 rows with all fields, limited to a single region 32 | SELECT * from vpcflow 33 | WHERE region_partition = 'us-east-1' 34 | LIMIT 10; 35 | 36 | -- preview first 10 rows with all fields, limited to multiple regions 37 | SELECT * from vpcflow 38 | WHERE region_partition in ('us-east-1','us-east-2','us-west-2') 39 | LIMIT 10; 40 | 41 | -- NOTE: date_partition format is 'YYYY/MM/DD' as a string 42 | -- preview first 10 rows with all fields, limited to a certain date range 43 | SELECT * from vpcflow 44 | WHERE date_partition >= '2020/07/01' 45 | AND date_partition <= '2020/07/31' 46 | LIMIT 10; 47 | 48 | -- preview first 10 rows with all fields, limited to the past 30 days (relative) 49 | SELECT * from vpcflow 50 | WHERE date_partition >= date_format(date_add('day',-30,current_timestamp), '%Y/%m/%d') 51 | LIMIT 10; 52 | 53 | -- preview first 10 rows with all fields, limited by a combination of partition constraints 54 | -- NOTE: narrowing the scope of the query as much as possible will improve performance and minimize cost 55 | SELECT * from vpcflow 56 | WHERE date_partition >= '2020/07/01' 57 | AND date_partition <= '2020/07/31' 58 | AND account_partition = '111122223333' 59 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 60 | LIMIT 10; 61 | 62 | -- ANALYSIS EXAMPLES 63 | -- NOTE: default partition constraints have been provided for each query, 64 | -- be sure to add the appropriate partition constraints to the WHERE clause as shown above 65 | /* 66 | DEFAULT partition constraints: 67 | WHERE date_partition >= '2020/07/01' 68 | AND
date_partition <= '2020/07/31' 69 | AND account_partition = '111122223333' 70 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 71 | 72 | Be sure to modify or remove these to fit the scope of your intended analysis 73 | */ 74 | 75 | -- Get a list of source/destination IP pairs ordered by the number of records 76 | SELECT region, instanceid, sourceaddress, destinationaddress, count(*) as record_count FROM vpcflow 77 | WHERE date_partition >= '2020/07/01' 78 | AND date_partition <= '2020/07/31' 79 | AND account_partition = '111122223333' 80 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 81 | GROUP BY region, instanceid, sourceaddress, destinationaddress 82 | ORDER BY record_count DESC 83 | 84 | -- Get a summary of records between a given source/destination IP pair, ordered by the total number of bytes 85 | SELECT region, instanceid, sourceaddress, destinationaddress, sum(numbytes) as byte_count FROM vpcflow 86 | WHERE (sourceaddress = '192.0.2.1' OR destinationaddress = '192.0.2.1') 87 | AND (sourceaddress = '203.0.113.2' OR destinationaddress = '203.0.113.2') 88 | AND date_partition >= '2020/07/01' 89 | AND date_partition <= '2020/07/31' 90 | AND account_partition = '111122223333' 91 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 92 | GROUP BY region, instanceid, sourceaddress, destinationaddress 93 | ORDER BY byte_count DESC 94 | 95 | -- Get a summary of the number of bytes sent from port 443 limited to a single instance 96 | -- NOTE: for remote IPs this represents the amount of data downloaded from port 443 by the instance, 97 | -- for instance IPs this represents the amount of data downloaded by remote hosts from the instance on port 443 98 | SELECT region, instanceid, sourceaddress, sourceport, destinationaddress, sum(numbytes) as byte_count FROM vpcflow 99 | WHERE instanceid = 'i-000000000000000' 100 | AND sourceport = 443 101 | AND date_partition >= '2020/07/01' 102 | AND
date_partition <= '2020/07/31' 103 | AND account_partition = '111122223333' 104 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 105 | GROUP BY region, instanceid, sourceaddress, sourceport, destinationaddress 106 | ORDER BY byte_count DESC 107 | 108 | -- Get a summary of the number of bytes sent to port 443 limited to a single instance 109 | -- NOTE: for remote IPs this represents the amount of data uploaded to port 443 by the instance, 110 | -- for instance IPs this represents the amount of data uploaded by remote hosts to the instance on port 443 111 | SELECT region, instanceid, sourceaddress, destinationaddress, destinationport, sum(numbytes) as byte_count FROM vpcflow 112 | WHERE instanceid = 'i-000000000000000' 113 | AND destinationport = 443 114 | AND date_partition >= '2020/07/01' 115 | AND date_partition <= '2020/07/31' 116 | AND account_partition = '111122223333' 117 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 118 | GROUP BY region, instanceid, sourceaddress, destinationaddress, destinationport 119 | ORDER BY byte_count DESC 120 | 121 | -- Get a summary with the number of bytes for each src_ip,src_port,dst_ip,dst_port quad across all records to or from a specific IP 122 | SELECT sourceaddress, destinationaddress, sourceport, destinationport, sum(numbytes) as byte_count FROM vpcflow 123 | WHERE (sourceaddress = '192.0.2.1' OR destinationaddress = '192.0.2.1') 124 | AND date_partition >= '2020/07/01' 125 | AND date_partition <= '2020/07/31' 126 | AND account_partition = '111122223333' 127 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 128 | GROUP BY sourceaddress, destinationaddress, sourceport, destinationport 129 | ORDER BY byte_count DESC 130 | 131 | -- Get all flow records between two IPs showing flowdirection (requires the v5 flow-direction field to be enabled) 132 | SELECT from_unixtime(starttime) AS start_time, 133 | from_unixtime(endtime) AS end_time, 134 | interfaceid, 135 | sourceaddress, 136 | destinationaddress, 137 | sourceport, 138 | destinationport, 139 | numpackets, 140 | numbytes, 141 | flowdirection, 142 | action 143 | FROM vpcflow 144 | WHERE (sourceaddress = '192.0.2.1' 145 | AND destinationaddress = '192.0.2.254') 146 | OR (sourceaddress = '192.0.2.254' 147 | AND destinationaddress = '192.0.2.1') 148 | ORDER BY starttime ASC 149 | 150 | -- List when source ips were first seen / last seen with a summary of destination ip/instances/ports 151 | SELECT sourceaddress, 152 | min(starttime) AS first_seen, 153 | max(endtime) AS last_seen, 154 | array_agg(DISTINCT(destinationaddress)), 155 | array_agg(DISTINCT(instanceid)), 156 | array_agg(DISTINCT(destinationport)) 157 | FROM vpcflow 158 | WHERE destinationport < 32768 -- skip ephemeral ports, since we're looking for inbound connections to service ports 159 | AND date_partition >= '2020/07/01' 160 | AND date_partition <= '2020/07/31' 161 | AND account_partition = '111122223333' 162 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 163 | GROUP BY sourceaddress 164 | ORDER by first_seen ASC 165 | 166 | 167 | -- Daily Transfer Report on Top 10 Internal IPs with large transfers, limited to source addresses in network 192.0.2.0/24 168 | SELECT vpcflow.date_partition, vpcflow.sourceaddress, vpcflow.destinationaddress, sum(vpcflow.numbytes) as byte_count 169 | FROM vpcflow 170 | INNER JOIN (SELECT sourceaddress, sum(numbytes) as byte_count FROM vpcflow 171 | WHERE sourceaddress like '192.0.2.%' 172 | AND date_partition >= '2020/07/01' 173 | AND date_partition <= '2020/07/31' 174 | AND account_partition = '111122223333' 175 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 176 | GROUP BY sourceaddress 177 | ORDER BY byte_count DESC 178 | LIMIT 10) as top_n 179 | ON top_n.sourceaddress = vpcflow.sourceaddress 180 | WHERE date_partition >= '2020/07/01' 181 | AND date_partition <= '2020/07/31' 182 | AND account_partition = '111122223333' 183 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 184 | GROUP BY vpcflow.date_partition, vpcflow.sourceaddress, vpcflow.destinationaddress 185 | ORDER BY vpcflow.date_partition ASC, vpcflow.sourceaddress ASC, vpcflow.destinationaddress ASC, byte_count DESC 186 | 187 | -- Search for traffic between a private (RFC1918) IP address and a public (non-RFC1918) IP address 188 | -- NOTE: this is an example of using the new IPADDRESS data type; as a string, a string comparison would not correctly compare IP addresses 189 | SELECT * 190 | FROM vpcflow 191 | WHERE logstatus = 'OK' -- required to filter out '-' in the src/dst IP fields 192 | AND ( ( 193 | NOT ( (CAST(sourceaddress AS IPADDRESS) > IPADDRESS '10.0.0.0' 194 | AND CAST(sourceaddress AS IPADDRESS) < IPADDRESS '10.255.255.255') 195 | OR (CAST(sourceaddress AS IPADDRESS) > IPADDRESS '172.16.0.0' 196 | AND CAST(sourceaddress AS IPADDRESS) < IPADDRESS '172.31.255.255') 197 | OR (CAST(sourceaddress AS IPADDRESS) > IPADDRESS '192.168.0.0' 198 | AND CAST(sourceaddress AS IPADDRESS) < IPADDRESS '192.168.255.255')) 199 | 200 | AND ( (CAST(destinationaddress AS IPADDRESS) > IPADDRESS '10.0.0.0' 201 | AND CAST(destinationaddress AS IPADDRESS) < IPADDRESS '10.255.255.255') 202 | OR (CAST(destinationaddress AS IPADDRESS) > IPADDRESS '172.16.0.0' 203 | AND CAST(destinationaddress AS IPADDRESS) < IPADDRESS '172.31.255.255') 204 | OR (CAST(destinationaddress AS IPADDRESS) > IPADDRESS '192.168.0.0' 205 | AND CAST(destinationaddress AS IPADDRESS) < IPADDRESS '192.168.255.255')) 206 | ) 207 | 208 | OR ( 209 | NOT ( (CAST(destinationaddress AS IPADDRESS) > IPADDRESS '10.0.0.0' 210 | AND CAST(destinationaddress AS IPADDRESS) < IPADDRESS '10.255.255.255') 211 | OR (CAST(destinationaddress AS IPADDRESS) > IPADDRESS '172.16.0.0' 212 | AND CAST(destinationaddress AS IPADDRESS) < IPADDRESS '172.31.255.255') 213 | OR (CAST(destinationaddress AS
IPADDRESS) > IPADDRESS '192.168.0.0' 214 | AND CAST(destinationaddress AS IPADDRESS) < IPADDRESS '192.168.255.255')) 215 | 216 | AND ( (CAST(sourceaddress AS IPADDRESS) > IPADDRESS '10.0.0.0' 217 | AND CAST(sourceaddress AS IPADDRESS) < IPADDRESS '10.255.255.255') 218 | OR (CAST(sourceaddress AS IPADDRESS) > IPADDRESS '172.16.0.0' 219 | AND CAST(sourceaddress AS IPADDRESS) < IPADDRESS '172.31.255.255') 220 | OR (CAST(sourceaddress AS IPADDRESS) > IPADDRESS '192.168.0.0' 221 | AND CAST(sourceaddress AS IPADDRESS) < IPADDRESS '192.168.255.255')) 222 | ) 223 | ) 224 | AND date_partition >= '2020/07/01' 225 | AND date_partition <= '2020/07/31' 226 | AND account_partition = '111122223333' 227 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 228 | 229 | -- ******************************************************************************************* 230 | -- Queries Enabled by Amazon Athena v3 231 | -- 232 | -- NOTE: Queries in the section below require Athena Engine version 3 or later to be enabled. 233 | -- If you're using the latest version of AWS Security Analytics Bootstrap, 234 | -- these queries are already supported.
If you need to update your environment to 235 | -- use the latest Amazon Athena engine version, review these instructions: 236 | -- https://docs.aws.amazon.com/athena/latest/ug/engine-versions-changing.html 237 | -- ******************************************************************************************* 238 | 239 | -- Search for all internal-to-internal VPC Flow records for the internal VPC Subnets in the private 172.16.0.0/12 address space 240 | SELECT * FROM vpcflow 241 | WHERE logstatus = 'OK' -- required to filter out '-' in the src/dst IP fields 242 | AND contains('172.16.0.0/12', cast(sourceaddress as IPADDRESS)) 243 | AND contains('172.16.0.0/12', cast(destinationaddress as IPADDRESS)) 244 | AND date_partition >= '2020/07/01' 245 | AND date_partition <= '2020/07/31' 246 | AND account_partition = '111122223333' 247 | AND region_partition in ('us-east-1','us-east-2','us-west-2') 248 | 249 | -- Search for all VPC Flow records _except_ the internal-to-internal records for VPC Subnets in the private 172.16.0.0/12 address space 250 | -- e.g.
only return VPC Flow records between internal and external IP addresses 251 | SELECT * FROM vpcflow 252 | WHERE logstatus = 'OK' -- required to filter out '-' in the src/dst IP fields 253 | AND NOT ( 254 | contains('172.16.0.0/12', cast(sourceaddress as IPADDRESS)) 255 | AND contains('172.16.0.0/12', cast(destinationaddress as IPADDRESS)) 256 | ) 257 | AND date_partition >= '2020/07/01' 258 | AND date_partition <= '2020/07/31' 259 | AND account_partition = '111122223333' 260 | AND region_partition in ('us-east-1','us-east-2','us-west-2') -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Changelog - AWS Security Analytics Bootstrap 2 | 3 | ## [1.1.3] - 2023-06-12 4 | - Added two new example IAM queries for Amazon Security Lake OCSF formatted CloudTrail 5 | - Updated all Amazon Security Lake example queries to the GA schema and OCSF format (resolves issue #13) 6 | - Split Amazon Security Lake example queries for CloudTrail by event type (e.g.
management, lambda, s3) 7 | - Added Amazon Security Lake example queries for CloudTrail Lambda data events 8 | - Minor: moved preview-formatted example queries to a new branch AmazonSecurityLake_Preview for reference 9 | - Minor: updated start and end time to be the same for all Amazon Security Lake OCSF example queries (resolves issue #11) 10 | - Minor: added Security Hub Amazon Security Lake queries to all query list 11 | 12 | ## [1.1.2] - 2023-02-02 13 | - Added Amazon Security Lake OCSF formatted queries for Security Hub (contributed by Charles Roberts) 14 | 15 | ## [1.1.1] - 2023-01-10 16 | - Converted demo queries into Amazon Security Lake OCSF formatted queries 17 | 18 | ## [1.1.0] - 2022-10-17 19 | - Updated the Amazon Athena engine to v3 (latest) [link](https://docs.aws.amazon.com/athena/latest/ug/engine-versions-reference-0003.html) 20 | - Added new demo VPC Flow log queries for Athena engine v3 21 | 22 | ## [1.0.0] - 2021-07-02 23 | Initial Release under Apache License Version 2.0 24 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
25 | 26 | ### New Features 27 | #### CloudFormation Templates 28 | - CloudFormation template to deploy a ready-to-use AWS Security Analytics Environment 29 | - Ready to deploy, parameterized with walkthrough comments 30 | - Athena Workgroup 31 | - Encrypted output to specified location 32 | - Demo named queries 33 | - Glue Database 34 | - Glue Tables 35 | - AWS CloudTrail 36 | - partitioned by account, region, and date with dynamic partition projection configuration 37 | - Amazon Virtual Private Cloud (VPC) Flow Logs 38 | - partitioned by account, region, and date with dynamic partition projection configuration 39 | - Amazon Route53 DNS Resolver Logs 40 | - partitioned by account, VPC-id, and date with dynamic partition projection configuration 41 | - CloudFormation template to deploy IAM admin and user roles 42 | - Provides IAM policy examples to start using Athena following the principle of least privilege 43 | - CloudFormation template to enable VPC Flow Logs with all available fields (v2-v5) for a specified VPC, Subnet, or ENI 44 | #### CREATE TABLE SQL statements 45 | - Enables ad hoc creation of Glue Tables via Athena SQL statement: 46 | - AWS CloudTrail 47 | - partitioned by account, region, and date with dynamic partition projection configuration 48 | - Amazon Virtual Private Cloud (VPC) Flow Logs 49 | - partitioned by account, region, and date with dynamic partition projection configuration 50 | - Amazon Route53 DNS Resolver Logs 51 | - partitioned by account, VPC-id, and date with dynamic partition projection configuration 52 | #### Demo Athena Queries 53 | - AWS CloudTrail 54 | - Amazon Virtual Private Cloud (VPC) Flow Logs 55 | - Amazon Route53 DNS Resolver Logs 56 | 57 | ### Added 58 | - Deployment Guide 59 | - README 60 | - Misc required project files 61 | 62 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of
Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. 
You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute to. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project, we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
60 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 
40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. 
Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. 
You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 
123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. 
In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | -------------------------------------------------------------------------------- /NOTICE: -------------------------------------------------------------------------------- 1 | AWS Security Analytics Bootstrap 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
3 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # AWS Security Analytics Bootstrap 2 | 3 | AWS Security Analytics Bootstrap enables customers to perform security investigations on AWS service logs by providing an Amazon Athena analysis environment that's quick to deploy, ready to use, and easy to maintain. 4 | 5 | If you're ready to deploy AWS Security Analytics Bootstrap, jump directly to [Deploying AWS Security Analytics Bootstrap](#deploying-aws-security-analytics-bootstrap) 6 | 7 | ## AWS Security Analytics Bootstrap Use Cases 8 | AWS Security Analytics Bootstrap is for Amazon Web Services (AWS) customers who need a quick method to set up Athena and perform investigations on AWS service logs archived in Amazon S3 buckets. 9 | 10 | AWS Security Analytics Bootstrap is designed to be ready to use for common security investigation use-cases, quick to deploy, and easy to maintain. AWS Security Analytics Bootstrap provides AWS CloudFormation templates to quickly create a fully configured Athena analysis environment including an Amazon Athena Workgroup, AWS Glue Databases, AWS Glue Tables, and Demo Athena Queries. AWS Security Analytics Bootstrap supports common security investigation requirements including partitioning and searches across multiple accounts, regions, and dates. AWS Security Analytics Bootstrap uses [Partition Projection with Amazon Athena](https://docs.aws.amazon.com/athena/latest/ug/partition-projection.html) to provide dynamic partitioning across accounts, regions, and dates without any additional infrastructure, code, or frequent maintenance. Partitioning AWS service log data by account, region, and/or date enables AWS customers to create targeted queries, reducing their cost and query times.
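As an illustration of the partition pruning described above, the following sketch of an Athena query restricts each partition column so that only the matching S3 prefixes are read (the `cloudtrail` table and partition column names follow the tables created by the Bootstrap templates; the account ID and date range are placeholders to replace with your own values):

```sql
-- Illustrative only: summarize CloudTrail API activity for a single
-- account and region over one month. Because account_partition,
-- region_partition, and date_partition are partition columns resolved
-- via partition projection, Athena scans only the matching S3 prefixes
-- rather than the full table.
SELECT eventsource, eventname, count(*) AS event_count
FROM cloudtrail
WHERE account_partition = '111122223333'   -- placeholder account ID
  AND region_partition = 'us-east-1'
  AND date_partition >= '2020/07/01'       -- placeholder date range
  AND date_partition <= '2020/07/31'
GROUP BY eventsource, eventname
ORDER BY event_count DESC
LIMIT 25
```

Omitting any of the three partition predicates widens the set of S3 prefixes Athena must enumerate, which is why the demo queries in this project always constrain account, region, and date.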
11 | 12 | ### AWS Security Analytics Bootstrap Is Commonly Used When AWS Customers: 13 | - Want to search AWS service logs natively in AWS 14 | - Currently have no SIEM available/accessible 15 | - Need to search logs beyond SIEM retention period 16 | - Don't have the required AWS service logs indexed in the SIEM 17 | - Need to investigate an AWS account that hasn't been centralizing its logs 18 | 19 | ## AWS Service Logs Currently Supported by AWS Security Analytics Bootstrap 20 | AWS Service Log | Event Type 21 | ------|------| 22 | [AWS CloudTrail](https://docs.aws.amazon.com/cloudtrail/index.html) | AWS API Events (Management and Data Events) 23 | [Amazon Virtual Private Cloud (VPC) Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html) | Network Events 24 | [Amazon Route 53 DNS resolver query logs](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resolver-query-logs.html) | DNS Resolution Events 25 | 26 | *NOTE:* We will be adding support for additional AWS Service Logs commonly used in security investigations; please feel free to submit or upvote your requests in `Issues` 27 | 28 | ## AWS Security Analytics Bootstrap Architecture Overview 29 | Single Account Deployment | Cross-Account Deployment 30 | :-------------------------:|:-------------------------: 31 | ![Athena Bootstrap Single Account Architecture](AWSSecurityAnalyticsBootstrap/docs/img/architecture/SingleAccount.png "Athena Bootstrap Single Account Architecture") | ![Athena Bootstrap Cross-Account Architecture](AWSSecurityAnalyticsBootstrap/docs/img/architecture/MultiAccount.png "Athena Bootstrap Cross-Account Architecture") 32 | 33 | ## Deploying AWS Security Analytics Bootstrap 34 | 35 | ### Prerequisites 36 | - AWS service logs (e.g.
AWS CloudTrail, Amazon VPC Flow Logs, Amazon Route 53 Resolver Query Logs) must be delivered to Amazon S3 buckets unmodified in their native format 37 | - For [cross-account deployments](https://docs.aws.amazon.com/athena/latest/ug/cross-account-permissions.html), bucket policies must be in place and objects need to be owned by the bucket account owner to enable cross-account access 38 | - For logs encrypted via AWS KMS, the AWS IAM principal(s) that will be used to submit Athena queries will need to have permissions for `kms:Decrypt` and `kms:DescribeKey` in their IAM policy, and the KMS key policy will need to grant them the same access 39 | 40 | ### Getting Started 41 | 42 | 43 | The [Athena Infrastructure CloudFormation Template](AWSSecurityAnalyticsBootstrap/cfn/Athena_infra_setup.yml) will deploy a fully functional security analytics environment including: 44 | 45 | Resource | Notes 46 | ---------|-------- 47 | **Athena Workgroup** | - Configured to provide encrypted output to a specified S3 location
- Includes pre-configured demo queries as Named Queries 48 | **Glue Database** | - Contains associated Glue Tables 49 | **Glue Tables** | Standardized table schemas with dynamic partitions for account, region, and date for:
- CloudTrail Logs
- VPC Flow Logs
- Route53 DNS Resolver Logs 50 | 51 | **Deployment time:** ~10 minutes 52 | 53 | Comments are provided in the CloudFormation Parameters section to assist with the parameters required for deployment, and a detailed walk-through of the deployment process is provided in the [AWS Security Analytics Bootstrap Deployment Guide](AWSSecurityAnalyticsBootstrap/docs/aws_security_analytics_bootstrap_deployment_guide.md). 54 | 55 | ---- 56 | ## AWS Security Analytics Bootstrap Resources 57 | *Note:* The [Athena Infrastructure CloudFormation Template](AWSSecurityAnalyticsBootstrap/cfn/Athena_infra_setup.yml) can be deployed by itself or in combination with any of the additional resources depending on customers' use case(s) and requirements. 58 | 59 | | Resource Type | Resource | Resource Provides | Cleanup/Removal Notes | 60 | | :--- | :--- | :--- | :--- | 61 | | AWS CloudFormation Template | [Athena Infrastructure CloudFormation Template](AWSSecurityAnalyticsBootstrap/cfn/Athena_infra_setup.yml) | Creates the ready-to-use Athena security analytics environment including: Athena Workgroup, Glue Database, Glue Tables, and demo Named Queries. Comments are provided in the CloudFormation Parameters section to walk customers through deployment, or customers can review the [AWS Security Analytics Bootstrap Deployment Guide](AWSSecurityAnalyticsBootstrap/docs/aws_security_analytics_bootstrap_deployment_guide.md) for more detail. | All resources created by this template will be deleted when the CloudFormation Stack is deleted. This will not affect the source log data. | 62 | | AWS CloudFormation Template | [IAM Roles and Policies for Athena Admin and Athena Analyst](AWSSecurityAnalyticsBootstrap/cfn/Athena_IAM_setup.yml) | Creates IAM Roles and Policies for Athena Admin and Athena Analyst roles designed according to least-privilege principles | All resources created by this template will be deleted when the CloudFormation Stack is deleted.
| 63 | | AWS CloudFormation Template | [Enable flow logs](AWSSecurityAnalyticsBootstrap/cfn/VPC_enable_flowlogs.yml) | Enables VPC Flow Logs for the specified VPC, Subnet, or ENI with [all fields through v5](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html#flow-logs-fields) in the order expected by Athena Bootstrap | The VPC Flow log configuration will be deleted when the CloudFormation Stack is deleted. Any logs created will need to be deleted separately from the target S3 bucket if desired. | 64 | | CREATE TABLE SQL Statement | [AWS CloudTrail Table Schema](AWSSecurityAnalyticsBootstrap/sql/ddl/create_tables/create_cloudtrail_table.sql) | Creates a Glue Table for CloudTrail Logs partitioned by account, region, and date via Athena SQL query statement. This table is also created by the [Athena Infrastructure CloudFormation Template](AWSSecurityAnalyticsBootstrap/cfn/Athena_infra_setup.yml); this SQL statement can be used to create a table in an existing Athena environment for ad hoc deployment use cases. "TODO" comments are included above sections which need to be updated with customers' environment details. | The table can be deleted with the Athena query statement `DROP TABLE ` (e.g. `DROP TABLE cloudtrail`) | 65 | | CREATE TABLE SQL Statement | [Amazon VPC Flow Logs Table Schema](AWSSecurityAnalyticsBootstrap/sql/ddl/create_tables/create_vpcflowlog_table.sql) | Creates a Glue Table for VPC Flow Logs partitioned by account, region, and date via Athena SQL query statement. This table is also created by the [Athena Infrastructure CloudFormation Template](AWSSecurityAnalyticsBootstrap/cfn/Athena_infra_setup.yml); this SQL statement can be used to create a table in an existing Athena environment for ad hoc deployment use cases. "TODO" comments are included above sections which need to be updated with customers' environment details. | The table can be deleted with the Athena query statement `DROP TABLE
` (e.g. `DROP TABLE vpcflowlogs`) | 66 | | CREATE TABLE SQL Statement | [Amazon Route 53 Resolver Query Logs Table Schema](AWSSecurityAnalyticsBootstrap/sql/ddl/create_tables/create_dnsquerylog_table.sql) | Creates a Glue Table for Route 53 DNS Resolver Logs partitioned by account, VPC ID, and date via Athena SQL query statement. This table is also created by the [Athena Infrastructure CloudFormation Template](AWSSecurityAnalyticsBootstrap/cfn/Athena_infra_setup.yml); this SQL statement can be used to create a table in an existing Athena environment for ad hoc deployment use cases. "TODO" comments are included above sections which need to be updated with customers' environment details. | The table can be deleted with the Athena query statement `DROP TABLE
` (e.g. `DROP TABLE r53dns`) | 67 | | Demo Athena Queries | [AWS CloudTrail Demo Queries](AWSSecurityAnalyticsBootstrap/sql/dml/analytics/cloudtrail/cloudtrail_demo_queries.sql) | Demo Athena queries for CloudTrail Logs. These queries are also created in the Athena Workgroup as Named Queries by the [Athena Infrastructure CloudFormation Template](AWSSecurityAnalyticsBootstrap/cfn/Athena_infra_setup.yml) | N/A - No resources created | 68 | | Demo Athena Queries | [Amazon VPC Flow Log Demo Queries](AWSSecurityAnalyticsBootstrap/sql/dml/analytics/vpcflow/vpcflow_demo_queries.sql) | Demo Athena queries for VPC Flow Logs. These queries are also created in the Athena Workgroup as Named Queries by the [Athena Infrastructure CloudFormation Template](AWSSecurityAnalyticsBootstrap/cfn/Athena_infra_setup.yml) | N/A - No resources created | 69 | | Demo Athena Queries | [Amazon Route 53 Resolver Query Log Demo Queries](AWSSecurityAnalyticsBootstrap/sql/dml/analytics/dns/dns_demo_queries.sql) | Demo Athena queries for Route 53 DNS Resolver Logs. These queries are also created in the Athena Workgroup as Named Queries by the [Athena Infrastructure CloudFormation Template](AWSSecurityAnalyticsBootstrap/cfn/Athena_infra_setup.yml) | N/A - No resources created | 70 | 71 | ## Currently Out of Scope 72 | - How data is provided to Amazon S3 buckets (e.g. configuration of logs) 73 | - Optimization of the underlying data sources (e.g.
merging small files, converting to Parquet/ORC columnar formats) 74 | - Bucket policy updates for cross-account deployments 75 | - AWS KMS policy updates 76 | 77 | ## Cost 78 | By installing AWS Security Analytics Bootstrap, AWS customers may incur charges from the following services: 79 | 80 | - Amazon Athena: https://aws.amazon.com/athena/pricing/ 81 | - Amazon S3: https://aws.amazon.com/s3/pricing/ 82 | - AWS KMS: https://aws.amazon.com/kms/pricing/ 83 | 84 | ## Related Resources 85 | 86 | ### Resources from AWS 87 | - [[Tool] Assisted Log Enabler for AWS](https://github.com/awslabs/assisted-log-enabler-for-aws) 88 | - [[Tool] Athena Glue Service Logs](https://github.com/awslabs/athena-glue-service-logs) 89 | - [[Service] Amazon Detective](https://aws.amazon.com/detective/) 90 | - [[Doc] Running SQL Queries Using Amazon Athena](https://docs.aws.amazon.com/athena/latest/ug/querying-athena-tables.html) 91 | - [[Doc] Querying AWS Service Logs](https://docs.aws.amazon.com/athena/latest/ug/querying-AWS-service-logs.html) 92 | - [[Blog] Analyze Security, Compliance, and Operational Activity Using AWS CloudTrail and Amazon Athena](https://aws.amazon.com/blogs/big-data/) 93 | - [[Blog] Athena Performance Tips](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/) 94 | - [[Q&A] CTAS Bucketing Guidance](https://aws.amazon.com/premiumsupport/knowledge-center/set-file-number-size-ctas-athena/) 95 | - [[Blog] Amazon Route 53 Resolver Query Logs](https://aws.amazon.com/blogs/aws/log-your-vpc-dns-queries-with-route-53-resolver-query-logs) 96 | - [[Doc] Configure Amazon Route 53 Resolver Query Logs](https://console.aws.amazon.com/route53resolver/home/query-logging#/query-logging/configure) (choose Amazon S3 destination) 97 | - [[Doc] Security Reference Architecture: Log Archive Account](https://docs.aws.amazon.com/prescriptive-guidance/latest/security-reference-architecture/log-archive.html) 98 | 99 | ### Third Party
Resources 100 | - [[Guide] The Athena Guide](https://the.athena.guide/) 101 | - [[Doc] Presto Documentation (current)](https://prestodb.io/docs/current/) 102 | - [[Book] Presto The Definitive Guide (e-book)](https://www.starburst.io/wp-content/uploads/2020/04/OReilly-Presto-The-Definitive-Guide.pdf) 103 | 104 | ## Contributors 105 | AWS Security Analytics Bootstrap stands on the shoulders of countless giants and has benefited from the assistance of *MANY* collaborators and contributors. Thanks to everyone who has helped or inspired the project so far, and thanks in advance for any future contributions. 106 | 107 | Many thanks for your contributions: 108 | - Aaron Lennon 109 | - Anna McAbee 110 | - Bohan Li 111 | - Brian Andrzejewski 112 | - Brian Poole 113 | - Casey Reniker 115 | - Clayton Darnell 116 | - Cydney Stude 117 | - Freddy Kasprzykowski 118 | - Jason Hurst 119 | - Jonathon Poling 120 | - Joshua McKiddy 121 | - Justin Fry 122 | - Kyle Dickinson 123 | - Luis Maldonado 124 | - Marc Luescher 125 | - Matt Gurr 126 | - Matt Helgen 127 | - Matthew Harvey 128 | - Pathik Shah 129 | - Ravi Sankar Prasad Kadiri 130 | - Richard Billington 131 | - Ross Warren 132 | - Srinivas Ananda Babu 133 | - Theo Tolv 134 | 135 | ## Security 136 | 137 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. 138 | 139 | ## License 140 | 141 | This project is licensed under the Apache-2.0 License. 142 | 143 | --------------------------------------------------------------------------------