├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── Dockerfile ├── LICENSE ├── README.md ├── minimal_IAM_policy.json ├── requirements.txt └── src ├── account_analyser.py ├── requirements.txt ├── service_analyser.py ├── service_specific_analysers ├── cloudhsm_analyser.py ├── dax_analyser.py ├── dms_analyser.py ├── docdb_analyser.py ├── dx_analyser.py ├── efs_analyser.py ├── elasticache_analyser.py ├── fsx_analyser.py ├── globalaccelerator_analyser.py ├── lambda_analyser.py ├── memorydb_analyser.py ├── opensearch_analyser.py ├── rds_analyser.py ├── redshift_analyser.py ├── sgw_analyser.py └── vpce_analyser.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | src/__pycache__ 2 | src/service_specific_analysers/__pycache__ 3 | src/output/ 4 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. 
Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM python:3.9 2 | 3 | RUN apt-get update -y 4 | #RUN apt-get install -y python-pip python-dev build-essential 5 | 6 | COPY ./src/requirements.txt /src/ 7 | 8 | WORKDIR /src 9 | RUN pip install --no-cache-dir -r requirements.txt 10 | 11 | COPY ./src/*.py /src/ 12 | COPY ./src/service_specific_analysers/ /src/service_specific_analysers/ 13 | 14 | ENTRYPOINT ["python3", "./account_analyser.py"] 15 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software is furnished to do so. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 15 | 16 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Fault Tolerance Analyser 2 | 3 | ## __Table of Contents__ 4 | 1. [Description](#1-description) 5 | 2. [Motivation](#2-motivation) 6 | 3. 
[Permissions needed to run the tool](#3-permissions-needed-to-run-the-tool) 7 | 4. [Installation](#4-installation) 8 | 5. [Running the tool using Python directly](#5-running-the-tool-using-python-directly) 9 | 6. [Running the tool as a Docker container](#6-running-the-tool-as-a-docker-container) 10 | 7. [Functional Design](#7-functional-design) 11 | 7.1 [VPC Endpoints](#71-vpc-endpoints) 12 | 7.2 [Database Migration Service](#72-database-migration-service) 13 | 7.3 [DocumentDB Clusters](#73-documentdb) 14 | 7.4 [Storage Gateway](#74-storage-gateway) 15 | 7.5 [Elastic File System](#75-elastic-file-system) 16 | 7.6 [Opensearch](#76-opensearch) 17 | 7.7 [FSX](#77-fsx) 18 | 7.8 [Lambda](#78-lambda) 19 | 7.9 [Elasticache](#79-elasticache) 20 | 7.10 [Memory DB](#710-memory-db) 21 | 7.11 [DynamoDB Accelerator](#711-dynamodb-accelerator) 22 | 7.12 [Global Accelerator](#712-global-accelerator) 23 | 7.13 [Relational Database Service](#713-relational-database-service) 24 | 7.14 [Direct Connect](#714-direct-connect) 25 | 7.15 [Cloud HSM](#715-cloud-hsm) 26 | 7.16 [Redshift](#716-redshift) 27 | 8. [Non-Functional Design](#8-non-functional-design) 28 | 9. [Security](#9-security) 29 | 10. [Contributing](#10-contributing) 30 | 11. [Frequently Asked Questions (FAQ)](#11-frequently-asked-questions-faq) 31 | 12. [License](#12-license) 32 | 33 | ## __1. Description__ 34 | A tool to generate a list of potential fault tolerance issues across different services. Please note that these are only *potential* issues. 35 | 36 | There are a number of circumstances in which the finding may not pose a problem including, development workloads, cost, or not viewing this workload as business critical in the event of an AZ impacting event. 37 | 38 | The output is a csv file created locally and also uploaded to an S3 bucket (if provided). 39 | 40 | ## __2. Motivation__ 41 | The intent is to help customers check their workloads for any components with potential fault tolerance issues. 42 | 43 | ## __3. Permissions needed to run the tool__ 44 | 45 | You can run the script on an EC2 with an instance role, or on your own machine with the credentials exported using the usual AWS env variables (as below) or with a profile configured using `aws configure` CLI command 46 | 47 | ``` 48 | export AWS_ACCESS_KEY_ID=abc 49 | export AWS_SECRET_ACCESS_KEY=def 50 | export AWS_SESSION_TOKEN=ghi 51 | ``` 52 | 53 | These credentials are needed as the code will invoke AWS APIs to get information about different AWS services. Except when pushing the output file to the S3 bucket, assume_role API call if a role is passed in, all other operations are "read-only". Here are the list of APIs invoked: 54 | 55 | ``` 56 | #APIs invoked for common functionality like getting account information, list of regions, etc. 
57 | STS.get_caller_identity 58 | STS.assume_role 59 | EC2.describe_regions 60 | Organizations.describe_account 61 | S3.put_object 62 | 63 | #APIs invoked for service specific fault tolerance analysis 64 | Lambda.list_functions 65 | StorageGateway.list_gateways 66 | OpenSearchService.list_domain_names 67 | OpenSearchService.describe_domains 68 | OpenSearchService.describe_domain 69 | ElastiCache.describe_cache_clusters 70 | ElastiCache.describe_replication_groups 71 | EFS.describe_file_systems 72 | DirectConnect.describe_connections 73 | DirectConnect.describe_virtual_interfaces 74 | FSx.describe_file_systems 75 | MemoryDB.describe_clusters 76 | DAX.describe_clusters 77 | DatabaseMigrationService.describe_replication_instances 78 | RDS.describe_db_instances 79 | RDS.describe_db_clusters 80 | EC2.describe_vpc_endpoints 81 | EC2.describe_instances 82 | ``` 83 | 84 | You can also provide an IAM role that the profile provided above can assume. 85 | 86 | If you want the least privileged policy to run this, the minimal permissions needed can be seen in minimal_IAM_policy.json. While most of the policy uses "*" for the resource (because the tool needs to look at all resources of a specific type), it is a good practice to specify the account id and a specific bucket name. So please replace all occurrences of `account_id`, `bucket_name` and `output_folder_name` with the appropriate values. If you are passing an event bus ARN to the tool to post events to the bus, then make sure you use the last section in minimal_IAM_policy.json after modifying `account_id`, `event-bus-region` and `event_bus_name`. If the event bus is in a different account from the one the tool runs in, then make sure the resource policy on the event bus allows events to be posted from the account the tool runs in. Refer to the [Example policy to send custom events in a different bus](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-event-bus-perms.html#eb-event-bus-example-policy-cross-account-custom-bus-source). 87 | 88 | The minimal IAM policy is written in a way that you can remove sections for the resource types that you do not want the tool to look at in your account. If, say, you do not want to run this tool for Direct Connect, you can remove the section with the sid `DirectConnect`. 89 | 90 | In the IAM policy provided, some SIDs have the suffix "ThatSupportAllResources", which means that the API calls included in that section work on all resources by default and cannot be scoped to specific resources. So a "*" there does not go against the best practice that wildcards should not be used in IAM policies. 91 | 92 | Sections with SIDs that have the suffix "ThatRequireWildcardResources" (used for DynamoDB Accelerator and Direct Connect) cover API calls where using a wildcard is unavoidable. 93 | 94 | In all other cases, the region and the resource name/id are wildcards, as the tool needs to work across multiple regions and needs to look at all resources. 95 | 96 | ## __4. Installation__ 97 | 1. You will need Python 3.10+ for this tool. If you do not have Python installed, please install it from here: 98 | https://www.python.org/ 99 | 100 | 2. Clone the repo and install the dependencies with the following command: 101 | ``` 102 | pip install -r requirements.txt 103 | ``` 104 | 105 | 3. Once this is set up, you can run the tool as described in the next section. 106 | 107 | ## __5. 
Running the tool using Python directly__ 108 | Here is a simple example commmand on how you can run the script 109 | 110 | ``` 111 | cd src 112 | python3 account_analyser.py \ 113 | --regions us-east-1 \ 114 | --services lambda opensearch docdb rds \ 115 | --truncate-output 116 | ``` 117 | 118 | In the command above, the script is run for the us-east-1 region, and looks at the services Lambda, Opensearch, Document DB and RDS. It generates the csv file and writes it to the output sub folder in the folder it is run. The truncate-output option ensures that if there is any existing file it is truncated before the findings are added. 119 | 120 | Once the script finishes, check the subfolder output/ and you will see 2 files like below. 121 | 122 | ``` 123 | ls output/ 124 | 125 | Fault_Tolerance_Findings_2022_11_21_17_09_19.csv 126 | Fault_Tolerance_Findings_2022_11_21_17_09_19_run_report.csv 127 | ``` 128 | 129 | The output will look like this. This shows all the findings. 130 | 131 | ``` 132 | service,region,account_id,account_name,payer_account_id,payer_account_name,resource_arn,resource_name,resource_id,potential_issue,engine,message,timestamp 133 | lambda,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:lambda:us-east-1:123456789101:function:test1z,test1z,,True,,VPC Enabled lambda in only one subnet,2022_11_29_16_20_43_+0000 134 | lambda,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:lambda:us-east-1:123456789101:function:test2az,test2az,,False,,VPC Enabled lambda in more than one subnet,2022_11_29_16_20_43_+0000 135 | docdb,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:cluster:docdb-2022-07-08-13-05-30,docdb-2022-07-08-13-05-30,cluster-JKL,True,,Single AZ Doc DB Cluster,2022_11_29_16_20_43_+0000 136 | docdb,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:cluster:docdb-2022-07-19-09-35-14,docdb-2022-07-19-09-35-14,cluster-GHI,True,,Single AZ Doc DB Cluster,2022_11_29_16_20_43_+0000 137 | docdb,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:cluster:docdb-2022-11-10-12-43-07,docdb-2022-11-10-12-43-07,cluster-DEF,True,,Single AZ Doc DB Cluster,2022_11_29_16_20_43_+0000 138 | docdb,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:cluster:docdb-2022-11-10-12-44-23,docdb-2022-11-10-12-44-23,cluster-ABC,True,,Single AZ Doc DB Cluster,2022_11_29_16_20_43_+0000 139 | opensearch,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:es:us-east-1:123456789101:domain/test4,test4,123456789101/test4,True,,Single AZ domain,2022_11_29_16_20_44_+0000 140 | opensearch,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:es:us-east-1:123456789101:domain/test5,test5,123456789101/test5,True,,Single AZ domain,2022_11_29_16_20_44_+0000 141 | opensearch,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:es:us-east-1:123456789101:domain/test2,test2,123456789101/test2,False,,Multi AZ domain,2022_11_29_16_20_44_+0000 142 | opensearch,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:es:us-east-1:123456789101:domain/test3,test3,123456789101/test3,True,,Single AZ domain,2022_11_29_16_20_44_+0000 143 | opensearch,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:es:us-east-1:123456789101:domain/test6,test6,123456789101/test6,True,,Single AZ 
domain,2022_11_29_16_20_44_+0000 144 | opensearch,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:es:us-east-1:123456789101:domain/test1,test1,123456789101/test1,True,,Single AZ domain,2022_11_29_16_20_44_+0000 145 | rds,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:db:database-3,database-3,,True,sqlserver-ex,RDS Instance has MultiAZ disabled,2022_11_29_16_20_44_+0000 146 | rds,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:cluster:auroraclustersingleaz,auroraclustersingleaz,,True,aurora-mysql,DB Cluster has MultiAZ disabled,2022_11_29_16_20_44_+0000 147 | rds,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:cluster:aurora-mysql-multiaz,aurora-mysql-multiaz,,False,aurora-mysql,DB Cluster has MultiAZ enabled,2022_11_29_16_20_44_+0000 148 | rds,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:cluster:database-4,database-4,,False,postgres,DB Cluster has MultiAZ enabled,2022_11_29_16_20_44_+0000 149 | rds,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:cluster:mysql-cluster,mysql-cluster,,False,mysql,DB Cluster has MultiAZ enabled,2022_11_29_16_20_44_+0000 150 | ``` 151 | 152 | The run report will look like this. This gives an idea of how long each service+region combination took. 153 | ``` 154 | account_id,region,service,result,error_message,start_time,end_time,runtime_in_seconds 155 | 625787456381,us-east-1,opensearch,Success,,2022_11_29_16_20_42_+0000,2022_11_29_16_20_43_+0000,1.05 156 | 625787456381,us-east-1,lambda,Success,,2022_11_29_16_20_42_+0000,2022_11_29_16_20_43_+0000,1.12 157 | 625787456381,us-east-1,docdb,Success,,2022_11_29_16_20_42_+0000,2022_11_29_16_20_44_+0000,1.74 158 | 625787456381,us-east-1,rds,Success,,2022_11_29_16_20_42_+0000,2022_11_29_16_20_45_+0000,2.62 159 | 625787456381,Overall,Overall,N/A,N/A,2022_11_29_16_20_42_+0000,2022_11_29_16_20_45_+0000,2.68 160 | ``` 161 | 162 | The same files will also be pushed to an S3 bucket if you provide a bucket name as a command line argument. When you provide a bucket, please make sure the bucket is properly secured as the output from this tool will be written to that bucket, and it could contain sensitive information (like names of RDS instances or other configuration detail) that you might not want to share widely. 163 | 164 | 165 | Use the option --help to look at all the options. Here are the options. 166 | 167 | ``` 168 | python3 account_analyser.py --help 169 | usage: account_analyser.py -s {vpce,dms,docdb,sgw,efs,opensearch,fsx,lambda,elasticache,dax,globalaccelerator,rds,memorydb,dx,ALL} 170 | [{vpce,dms,docdb,sgw,efs,opensearch,fsx,lambda,elasticache,dax,globalaccelerator,rds,memorydb,dx,ALL} ...] -r REGIONS [REGIONS ...] 
[-h] 171 | [-m MAX_CONCURRENT_THREADS] [-o OUTPUT_FOLDER_NAME] [-b BUCKET_NAME] [--event-bus-arn EVENT_BUS_ARN] [--aws-profile AWS_PROFILE_NAME] 172 | [--aws-assume-role AWS_ASSUME_ROLE_NAME] [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--single-threaded] [--truncate-output] [--filename-with-accountid] 173 | [--report-only-issues] 174 | 175 | Generate fault tolerance findings for different services 176 | 177 | Required arguments: 178 | -s {vpce,dms,docdb,sgw,efs,opensearch,fsx,lambda,elasticache,dax,globalaccelerator,rds,memorydb,dx,ALL} [{vpce,dms,docdb,sgw,efs,opensearch,fsx,lambda,elasticache,dax,globalaccelerator,rds,memorydb,dx,cloudhsm,ALL} ...], --services {vpce,dms,docdb,sgw,efs,opensearch,fsx,lambda,elasticache,dax,globalaccelerator,rds,memorydb,dx,cloudhsm,ALL} [{vpce,dms,docdb,sgw,efs,opensearch,fsx,lambda,elasticache,dax,globalaccelerator,rds,memorydb,dx,cloudhsm,ALL} ...] 179 | Indicate which service(s) you want to fetch fault tolerance findings for. Options are ['vpce', 'dms', 'docdb', 'sgw', 'efs', 'opensearch', 'fsx', 'lambda', 'elasticache', 'dax', 180 | 'globalaccelerator', 'rds', 'memorydb', 'dx', 'cloudhsm']. Use 'ALL' for all services 181 | -r REGIONS [REGIONS ...], --regions REGIONS [REGIONS ...] 182 | Indicate which region(s) you want to fetch fault tolerance findings for. Use "ALL" for all approved regions 183 | 184 | Optional arguments: 185 | -h, --help show this message and exit 186 | -m MAX_CONCURRENT_THREADS, --max-concurrent-threads MAX_CONCURRENT_THREADS 187 | Maximum number of threads that will be running at any given time. Default is 20 188 | -o OUTPUT_FOLDER_NAME, --output OUTPUT_FOLDER_NAME 189 | Name of the folder where findings output csv file and the run report csv file will be written. If it does not exist, it will be created. If a bucket name is also provided, then 190 | the folder will be looked for under the bucket, and if not present, will be created If a bucket name is not provided, then this folder will be expected under the directory in 191 | which the script is ran. In case a bucket is provided, the files will be generated in this folder first and then pushed to the bucket Please ensure there is a forward slash '/' 192 | at the end of the folder path Output file name will be of the format Fault_Tolerance_Findings___.csv. Example: 193 | Fault_Tolerance_Findings_123456789101_TestAccount_2022_11_01.csv If you do not use the --filename-with-accountid option, the output file name will be of the format: 194 | Fault_Tolerance_Findings_.csv. Example: Fault_Tolerance_Findings_2022_11_01.csv 195 | -b BUCKET_NAME, --bucket BUCKET_NAME 196 | Name of the bucket where findings output csv file and the run report csv file will be uploaded to 197 | --event-bus-arn EVENT_BUS_ARN 198 | ARN of the event bus in AWS Eventbridge to which findings will be published. 199 | --aws-profile AWS_PROFILE_NAME 200 | Use this option if you want to pass in an AWS profile already congigured for the CLI 201 | --aws-assume-role AWS_ASSUME_ROLE_NAME 202 | Use this option if you want the aws profile to assume a role before querying Org related information 203 | --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL} 204 | Log level. Needs to be one of the following: 'DEBUG','INFO','WARNING','ERROR','CRITICAL' 205 | --single-threaded Use this option to specify that the service+region level information gathering threads should not run in parallel. Default is False, which means the script uses multi-threading 206 | by default. 
Same effect as setting max-running-threads to 1 207 | --truncate-output Use this flag to make sure that if the output file already exists, the file is truncated. Default is False. Useful if you are invoking this script to refresh findings within 208 | the same day (on a different day, the output file will have a different file name) 209 | --filename-with-accountid 210 | Use this flag to include account id in the output file name. By default this is False, meaning, account id will not be in the file name. The default mode is useful if you are 211 | running the script for more than one account, and want all the accounts' findings to be in the same output file. 212 | --report-only-issues Use this flag to report only findings that are potential issues. Resources that have no identified issues will not appear in the final csv file. Default is to report all 213 | findings. 214 | 215 | 216 | ``` 217 | 218 | ## __6. Running the tool as a Docker container__ 219 | 220 | Instead of installing Python and the dependencies, you can just use the Docker file and run the tool as a container. Here is how to do it. 221 | 222 | 1. Clone the repo and bulid the image by running the command `docker build . -t fault_tolerance_analyser` 223 | 224 | 2. If you are using an AWS profile use the following command (note how the credentials file is mapped into the container so that the container will have access to the credentials). Also note that the second volume being mapped is the folder into which the output file to be written. 225 | 226 | ``` 227 | docker run \ 228 | -v $HOME/.aws/credentials:/root/.aws/credentials:ro \ 229 | -v $PWD/src/output/:/src/output/:rw \ 230 | fault_tolerance_analyser \ 231 | --regions us-east-1 \ 232 | --services lambda opensearch docdb rds \ 233 | --truncate-output 234 | ``` 235 | 236 | 3. If you are using AWS credentials exported as env variables you can run it as below. You can remove AWS_SESSION_TOKEN if you are using long term credentials 237 | 238 | ``` 239 | docker run \ 240 | -v $PWD/src/output/:/src/output/:rw \ 241 | -e AWS_ACCESS_KEY_ID \ 242 | -e AWS_SECRET_ACCESS_KEY \ 243 | -e AWS_SESSION_TOKEN \ 244 | fault_tolerance_analyser \ 245 | --regions us-east-1 \ 246 | --services lambda opensearch docdb rds \ 247 | --truncate-output 248 | ``` 249 | 250 | 4. If you are running on an EC2 machine with an IAM role associated with the machine, then you can just run it without env variables or credential files as below. 251 | 252 | ``` 253 | docker run \ 254 | -v $PWD/src/output/:/src/output/:rw \ 255 | fault_tolerance_analyser \ 256 | --regions us-east-1 \ 257 | --services lambda opensearch docdb rds \ 258 | --truncate-output 259 | ``` 260 | 261 | ## __7. Functional Design__ 262 | 263 | ### 7.1 VPC Endpoints 264 | It is a best practice to make sure that VPC Interface Endpoints have ENIs in more than one subnet. If a VPC endpoint has an ENI in only a single subnet, this tool will flag that as a potential issue. You cannot create VPC Endpoints in 2 different subnets in the same AZ. So, for the purpose of VPC endpoints, having multiple subnets implies multiple AZs. 265 | 266 | ### 7.2 Database Migration Service 267 | If the DMS Replication Instance is not configured with at least 2 instances in different availability zones, then it will be flagged as a potential issue. 
268 | 269 | Reference: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReplicationInstance.html 270 | 271 | ### 7.3 DocumentDB 272 | If a DocumentDB cluster does not have a replica in a different AZ, it will be flagged as a potential issue. 273 | 274 | Reference: https://docs.aws.amazon.com/documentdb/latest/developerguide/failover.html 275 | 276 | ### 7.4 Storage Gateway 277 | Storage Gateway, when deployed on AWS, runs on a single Amazon EC2 instance. It is therefore a single point of availability failure for any application expecting highly available access to application storage. Such storage gateways are flagged by this check as a potential issue. 278 | 279 | Customers running Storage Gateway as a mechanism for providing file-based application storage that requires high availability should consider migrating those workloads to Amazon EFS, FSx, or other storage services that can provide higher-availability architectures than Storage Gateway. 280 | 281 | ### 7.5 Elastic File System 282 | This check flags both of the following scenarios as potential issues: 283 | 1. Running an "EFS One Zone" deployment. 284 | 2. Running an "EFS Standard" class deployment with a mount target in only one AZ. 285 | 286 | Customers that have chosen to deploy a One Zone class of storage should ensure these are not mission-critical workloads that require high availability, and that the choice was made deliberately. 287 | 288 | Customers running a Standard class EFS deployment, where multi-AZ replication is provided by the service, may still have only a single mount target to access their file systems. If an availability issue were to occur in that Availability Zone, they would lose access to the EFS deployment, even though other AZs/subnets were unaffected. 289 | 290 | ### 7.6 Opensearch 291 | Any single-node domain, as well as any OpenSearch domain whose nodes are all deployed within the same Availability Zone, is flagged as a potential issue by this tool. 292 | 293 | ### 7.7 FSx 294 | Any FSx for Windows File Server file system deployed as Single-AZ is flagged as a potential issue by this tool. 295 | 296 | Customers have the option to choose a Multi-AZ or Single-AZ deployment when creating their file server deployment. 297 | 298 | ### 7.8 Lambda 299 | Any Lambda function that is configured to execute in only a single Availability Zone is flagged as a potential issue. 300 | Reference: https://docs.aws.amazon.com/lambda/latest/dg/security-resilience.html 301 | 302 | ### 7.9 Elasticache 303 | The following clusters are flagged as potential Single AZ issues: 304 | 305 | 1. All Memcached clusters - Data is not replicated between Memcached cluster nodes. Even if a customer has deployed nodes across multiple Availability Zones, the data on any node that loses availability (because of the host or its AZ) becomes unavailable as well. 306 | 307 | 2. Redis clusters - The following clusters are flagged as an issue (see the sketch after this list for a rough illustration): 308 | 2.1 Any single-node clusters. 309 | 2.2 Any "Cluster Mode" disabled clusters. 310 | 2.3 Any "Cluster Mode" enabled clusters with at least one node group having all of its nodes in the same AZ. 311 | 2.4 "Cluster Mode" enabled clusters with Auto Failover disabled. 312 | 2.5 "Cluster Mode" enabled clusters having shards with no replicas. 
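The repository's `elasticache_analyser.py` (not shown in this excerpt) contains the actual detection logic. As a rough illustration of how the Redis replication-group conditions above (roughly 2.3, 2.4 and 2.5) could be detected with boto3, here is a minimal sketch; the function name and finding messages are illustrative only, and standalone cache clusters and the Memcached check (which the tool covers via `describe_cache_clusters`) are not included.

```python
import boto3

def redis_replication_group_findings(region):
    #Illustrative sketch only - not the tool's actual implementation.
    elasticache = boto3.client("elasticache", region_name=region)
    findings = []
    paginator = elasticache.get_paginator("describe_replication_groups")
    for page in paginator.paginate():
        for group in page["ReplicationGroups"]:
            group_id = group["ReplicationGroupId"]
            #Auto Failover disabled (condition 2.4)
            if group.get("AutomaticFailover") != "enabled":
                findings.append(f"{group_id}: automatic failover is not enabled")
            for node_group in group.get("NodeGroups", []):
                members = node_group.get("NodeGroupMembers", [])
                azs = {m.get("PreferredAvailabilityZone") for m in members}
                #A shard with a single node has no replicas (condition 2.5)
                if len(members) < 2:
                    findings.append(f"{group_id}/{node_group.get('NodeGroupId')}: shard has no replicas")
                #All nodes of a shard placed in one AZ (condition 2.3)
                elif len(azs) == 1:
                    findings.append(f"{group_id}/{node_group.get('NodeGroupId')}: all nodes in a single AZ {azs.pop()}")
    return findings
```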
313 | 314 | Reference: https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/Replication.Redis-RedisCluster.html 315 | 316 | ### 7.10 Memory DB 317 | Any Memory DB cluster that has a single node in a shard is flagged as a potential issue by this tool. 318 | 319 | ### 7.11 DynamoDB Accelerator 320 | Any single-node clusters, as well as DAX clusters with multiple nodes all deployed within the same Availability Zone would be flagged as being a potential issue by this tool. 321 | 322 | ### 7.12 Global Accelerator 323 | Any "Standard" Global accelerators that are configured to target endpoints consisting only of EC2 instances in a single Availability Zone are flagged by this tool. "Custom Routing" Global Accelerators are not covered. 324 | 325 | ### 7.13 Relational Database Service 326 | Any single AZ RDS Instance or Cluster is flagged as a potential issue by this tool. 327 | 328 | ### 7.14 Direct Connect 329 | The following scenarios are flagged as potential issue by this tool: 330 | 1. Any region with a single Direct Connect connection. 331 | 2. Any region where there is more than one direct connection, but all of them use the same location. 332 | 3. Any Virtual Gateway with only one VIF 333 | 4. Any Virtual Gateway with more than one VIF but all of the VIFs on the same direct connect Connection. 334 | 335 | ### 7.15 Cloud HSM 336 | The following scenarios are flagged as potential issue by this tool: 337 | 1. Any cluster with a single HSM. 338 | 2. Any cluster with multiple HSMs all of which are in a single AZ. 339 | 340 | ### 7.16 Redshift 341 | Any Redshift cluster with all its nodes in a single AZ will be flagged as a potential issue by this tool. 342 | 343 | ## __8. Non-Functional Design__ 344 | 345 | There are two main classes: 346 | 347 | ### ServiceAnalyser 348 | The ServiceAnalyser is an abstract class from which all the service specific analysers are inherited. The service specific analysers contain the logic to identify potential issues for a given region. 349 | 350 | ### AccountAnalyser 351 | An object of this class is initiated as part of the "main" functionality. This loops through all the services and regions and instantiates the service specific analyser for each region+service combination and triggers the method to gather the findings in that service specific analyser. Once the findings are received, it writes it to a file. 352 | 353 | The AccountAnalyser logic can run either in multi-threaded or single-threaded mode. In multi-threaded mode, the analyser for each service+region combination runs in a separate thread. This is the default mode. This saves a lot of time as there are 14 analysers running making API calls and that too across multiple regions. 354 | 355 | In multi-threaded mode, care is taken to ensure that when writing the findings to an output file, multiple threads do not try to do it at the same time (with the help of a lock). 356 | 357 | When all the analysers are run, the output file is uploaded to an S3 bucket, if provided. 358 | 359 | ## __9. Security__ 360 | 361 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. 362 | 363 | ## __10. Contributing__ 364 | 365 | See [CONTRIBUTING](CONTRIBUTING.md#contributing-via-pull-requests) for more information. 366 | 367 | ## __11. Frequently Asked Questions (FAQ)__ 368 | 369 | ### What is the purpose of the Fault Tolerance tool? 
370 | * The Fault Tolerance tool is designed to identify potential single points of failure across different AWS services (see the list of supported services in [Functional Design](#7-functional-design)) in your account. By detecting resources that could cause disruption to the system if they were to fail, the tool helps customers build more fault tolerant workloads in AWS. This is in line with the guidance provided by the [Well-Architected Framework - Reliability Pillar - Use fault isolation to protect your workload](https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/use-fault-isolation-to-protect-your-workload.html), which emphasizes the use of fault isolation to protect workloads. 371 | 372 | ### How does the Fault Tolerance tool work? 373 | * The Fault Tolerance tool analyzes your AWS account and generates a report that highlights potential fault tolerance issues across various AWS services. This report allows customers to identify resources where a single failure could disrupt the system and take steps to add redundancy to prevent downtime. 374 | 375 | ### What permissions are needed to run the tool? 376 | * For the permissions the tool requires, please see the detailed instructions in the [Permissions needed to run the tool](#3-permissions-needed-to-run-the-tool) section. 377 | 378 | ### How can I install and run the Fault Tolerance Analyser tool? 379 | * To install the tool, please follow the detailed instructions in the [Installation](#4-installation) section. 380 | 381 | ### What are the potential fault tolerance issues identified by the tool? 382 | * The Fault Tolerance Analyser tool checks various AWS services for potential issues. Some examples of issues identified include VPC endpoints with ENIs in a single subnet, DMS replication instances without multi-AZ configuration, DocumentDB clusters without replicas in different AZs, Lambda functions executing in a single AZ, and more. The [Functional Design](#7-functional-design) section provides a detailed explanation of the risks for each supported service analyzed. 383 | 384 | ### Can the tool be run as a Docker container? 385 | * Yes, the Fault Tolerance Analyser tool can be run as a Docker container. The [Running the tool as a Docker container](#6-running-the-tool-as-a-docker-container) section provides instructions on how to build the Docker image using the provided Dockerfile. You can then run the tool as a container, either with an AWS profile, AWS credentials exported as environment variables, or on an EC2 machine with an associated IAM role. The necessary volume mappings and command examples are provided in the [Running the tool as a Docker container](#6-running-the-tool-as-a-docker-container) section. 386 | 387 | ### Does the tool support uploading the generated report to an S3 bucket? 388 | * Yes, the tool supports uploading the generated CSV report to an S3 bucket. Customers can provide the name of the bucket as a command-line argument when running the tool. Once the analysis is completed, the tool will upload the report file to the specified S3 bucket, allowing customers to centralize the findings and access them from a secure and controlled location. Use the `-b BUCKET_NAME` or `--bucket BUCKET_NAME` flag when running the tool to specify the S3 bucket the reports should be uploaded to. 389 | 390 | ### Can I integrate the tool with my internal ticketing systems to track findings? 391 | * Yes, the tool supports sending findings to an Amazon Eventbridge event bus. 
This allows for integrating with any other system easily. You can use the `--event-bus-arn` option to provide the ARN of the event bus. 392 | 393 | ### How is this tool different from Trusted Advisor and Resilience Hub? 394 | * The fault tolerance tool described here is a fully open-source tool, released under the MIT license, designed to generate a list of potential fault tolerance issues specific to different AWS services. It focuses on identifying potential issues related to fault tolerance and provides a detailed report that helps customers assess the fault tolerance of their workloads. Customers have the ability to customize and deploy the tool as per their requirements, making it a flexible and adaptable solution. 395 | * [AWS Trusted Advisor](https://aws.amazon.com/premiumsupport/technology/trusted-advisor/) is a service that draws upon best practices learned from serving hundreds of thousands of AWS customers. It inspects your AWS environment and provides recommendations when opportunities exist to save money, improve system availability and performance, or close security gaps. It specifically operates across five unique areas: Security, Performance, Cost Optimization, Fault Tolerance, and AWS Service Quotas. 396 | * [AWS Resilience Hub](https://aws.amazon.com/resilience-hub/) offers a centralized location to define, validate, and track the resiliency of your AWS applications. It helps protect your applications from disruptions, reduces recovery costs, and optimizes business continuity. You can describe your applications using AWS CloudFormation, Terraform state files, AWS Resource Groups, or choose from applications already defined in AWS Service Catalog AppRegistry. 397 | 398 | 399 | ### What is the open-source nature of this tool, and can customers contribute to its development? 400 | * This tool is built using some open-source technologies and follows an open-source development approach. The code for this tool is hosted on a public repository (aws-samples), allowing customers to access and contribute to its development. Customers can submit bug reports, feature requests, and even contribute enhancements or additional service-specific analyzers. Open-source collaboration promotes transparency and encourages community involvement in improving the tool's functionality. Check the [Contributing](#10-contributing) section to learn more. 401 | 402 | ## __12. License__ 403 | 404 | This library is licensed under the MIT-0 License. See the LICENSE file. 
405 | -------------------------------------------------------------------------------- /minimal_IAM_policy.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": "2012-10-17", 3 | "Statement": [ 4 | { 5 | "Sid": "VPCEThatSupportAllResources", 6 | "Effect": "Allow", 7 | "Action": [ 8 | "ec2:DescribeVpcEndpoints" 9 | ], 10 | "Resource": "*" 11 | }, 12 | { 13 | "Sid": "LambdaThatSupportAllResources", 14 | "Effect": "Allow", 15 | "Action": [ 16 | "lambda:ListFunctions" 17 | ], 18 | "Resource": "*" 19 | }, 20 | { 21 | "Sid": "FSXThatSupportAllResources", 22 | "Effect": "Allow", 23 | "Action": [ 24 | "fsx:DescribeFileSystems" 25 | ], 26 | "Resource": "*" 27 | }, 28 | { 29 | "Sid": "DMSThatSupportAllResources", 30 | "Effect": "Allow", 31 | "Action": [ 32 | "dms:DescribeReplicationInstances", 33 | "dms:DescribeReplicationTasks" 34 | ], 35 | "Resource": "*" 36 | }, 37 | { 38 | "Sid": "SGWThatSupportAllResources", 39 | "Effect": "Allow", 40 | "Action": [ 41 | "storagegateway:ListGateways" 42 | ], 43 | "Resource": "*" 44 | }, 45 | { 46 | "Sid": "CommonAPIsThatSupportAllResources", 47 | "Effect": "Allow", 48 | "Action": [ 49 | "sts:GetCallerIdentity", 50 | "ec2:DescribeRegions", 51 | "organizations:DescribeOrganization" 52 | ], 53 | "Resource": "*" 54 | }, 55 | { 56 | "Sid": "CommonAPIs", 57 | "Effect": "Allow", 58 | "Action": [ 59 | "organizations:DescribeAccount" 60 | ], 61 | "Resource": [ 62 | "arn:aws:organizations:::account/o-*/*" 63 | ] 64 | }, 65 | { 66 | "Sid": "DAXThatRequireWildcardResources", 67 | "Effect": "Allow", 68 | "Action": [ 69 | "dax:DescribeClusters" 70 | ], 71 | "Resource": [ 72 | "*" 73 | ] 74 | }, 75 | { 76 | "Sid": "DXThatRequireWildcardResources", 77 | "Effect": "Allow", 78 | "Action": [ 79 | "directconnect:DescribeConnections", 80 | "directconnect:DescribeVirtualInterfaces" 81 | ], 82 | "Resource": [ 83 | "*" 84 | ] 85 | }, 86 | { 87 | "Sid": "Elasticache", 88 | "Effect": "Allow", 89 | "Action": [ 90 | "elasticache:DescribeReplicationGroups", 91 | "elasticache:DescribeCacheClusters" 92 | ], 93 | "Resource": [ 94 | "arn:aws:elasticache:*::replicationgroup:*", 95 | "arn:aws:elasticache:*::cluster:*" 96 | ] 97 | }, 98 | { 99 | "Sid": "MemoryDB", 100 | "Effect": "Allow", 101 | "Action": [ 102 | "memorydb:DescribeClusters" 103 | ], 104 | "Resource": [ 105 | "arn:aws:memorydb:*::cluster/*" 106 | ] 107 | }, 108 | { 109 | "Sid": "RDSAndDocumentDB", 110 | "Effect": "Allow", 111 | "Action": [ 112 | "rds:DescribeDBInstances", 113 | "rds:DescribeDBClusters" 114 | ], 115 | "Resource": [ 116 | "arn:aws:rds:*::db:*", 117 | "arn:aws:rds:*::cluster:*" 118 | ] 119 | }, 120 | { 121 | "Sid": "Opensearch", 122 | "Effect": "Allow", 123 | "Action": [ 124 | "es:DescribeDomain", 125 | "es:DescribeDomains" 126 | ], 127 | "Resource": [ 128 | "arn:aws:es:*::domain/*" 129 | ] 130 | }, 131 | { 132 | "Sid": "OpensearchThatSupportAllResources", 133 | "Effect": "Allow", 134 | "Action": [ 135 | "es:ListDomainNames" 136 | ], 137 | "Resource": [ 138 | "*" 139 | ] 140 | }, 141 | { 142 | "Sid": "AGA", 143 | "Effect": "Allow", 144 | "Action": [ 145 | "globalaccelerator:ListEndpointGroups", 146 | "globalaccelerator:ListListeners" 147 | ], 148 | "Resource": [ 149 | "arn:aws:globalaccelerator:::accelerator/*", 150 | "arn:aws:globalaccelerator:::accelerator/*/listener/*" 151 | ] 152 | }, 153 | { 154 | "Sid": "AGAThatSupportAllResources", 155 | "Effect": "Allow", 156 | "Action": [ 157 | "ec2:DescribeInstances", 158 | "globalaccelerator:ListAccelerators" 159 | ], 160 
| "Resource": [ 161 | "*" 162 | ] 163 | }, 164 | { 165 | "Sid": "EFS", 166 | "Effect": "Allow", 167 | "Action": [ 168 | "elasticfilesystem:DescribeFileSystems" 169 | ], 170 | "Resource": [ 171 | "arn:aws:elasticfilesystem:*::file-system/*" 172 | ] 173 | }, 174 | { 175 | "Sid": "CloudHSMThatSupportAllResources", 176 | "Effect": "Allow", 177 | "Action": [ 178 | "cloudhsm:DescribeClusters" 179 | ], 180 | "Resource": [ 181 | "arn:aws:elasticfilesystem:*::file-system/*" 182 | ] 183 | }, 184 | { 185 | "Sid": "S3", 186 | "Effect": "Allow", 187 | "Action": [ 188 | "s3:PutObject" 189 | ], 190 | "Resource": [ 191 | "arn:aws:s3::://*" 192 | ] 193 | }, 194 | { 195 | "Sid": "EventBusPermissions", 196 | "Effect": "Allow", 197 | "Action": [ 198 | "events:PutEvents" 199 | ], 200 | "Resource": [ 201 | "arn:aws:events:::event-bus/" 202 | ] 203 | } 204 | ] 205 | } 206 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | boto3==1.26.124 2 | botocore==1.29.124 3 | -------------------------------------------------------------------------------- /src/account_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import threading 5 | import csv 6 | import time 7 | import logging 8 | import utils 9 | import boto3 10 | import botocore 11 | import datetime 12 | import utils 13 | import os 14 | 15 | from service_specific_analysers.vpce_analyser import VPCEAnalyser 16 | from service_specific_analysers.docdb_analyser import DocDBAnalyser 17 | from service_specific_analysers.dms_analyser import DMSAnalyser 18 | from service_specific_analysers.sgw_analyser import SGWAnalyser 19 | from service_specific_analysers.efs_analyser import EFSAnalyser 20 | from service_specific_analysers.opensearch_analyser import OpensearchAnalyser 21 | from service_specific_analysers.fsx_analyser import FSXAnalyser 22 | from service_specific_analysers.lambda_analyser import LambdaAnalyser 23 | from service_specific_analysers.elasticache_analyser import ElasticacheAnalyser 24 | from service_specific_analysers.dax_analyser import DAXAnalyser 25 | from service_specific_analysers.globalaccelerator_analyser import GlobalAcceleratorAnalyser 26 | from service_specific_analysers.rds_analyser import RDSAnalyser 27 | from service_specific_analysers.memorydb_analyser import MemoryDBAnalyser 28 | from service_specific_analysers.dx_analyser import DXAnalyser 29 | from service_specific_analysers.cloudhsm_analyser import CloudHSMAnalyser 30 | from service_specific_analysers.redshift_analyser import RedshiftAnalyser 31 | 32 | from collections import namedtuple 33 | 34 | class AccountAnalyser(): 35 | 36 | analyser_classes = {} 37 | analyser_classes['vpce'] = VPCEAnalyser 38 | analyser_classes['docdb'] = DocDBAnalyser 39 | analyser_classes['dms'] = DMSAnalyser 40 | analyser_classes['sgw'] = SGWAnalyser 41 | analyser_classes['efs'] = EFSAnalyser 42 | analyser_classes['opensearch'] = OpensearchAnalyser 43 | analyser_classes['fsx'] = FSXAnalyser 44 | analyser_classes['lambda'] = LambdaAnalyser 45 | analyser_classes['elasticache'] = ElasticacheAnalyser 46 | analyser_classes['dax'] = DAXAnalyser 47 | analyser_classes['globalaccelerator'] = GlobalAcceleratorAnalyser 48 | analyser_classes['rds'] = RDSAnalyser 49 | analyser_classes['memorydb'] = MemoryDBAnalyser 50 | 
analyser_classes['dx'] = DXAnalyser 51 | analyser_classes['cloudhsm'] = CloudHSMAnalyser 52 | analyser_classes['redshift'] = RedshiftAnalyser 53 | 54 | def __init__ (self): 55 | #self.services = services 56 | #self.regions = regions 57 | self.lock = threading.Lock() 58 | self.threads = [] 59 | self.account_name = '' 60 | self.payer_account_id = '' 61 | self.payer_account_name = '' 62 | self.run_report = [] 63 | 64 | utils.get_config_info() 65 | 66 | self.account_id = utils.config_info.account_id 67 | self.thread_limiter = threading.BoundedSemaphore(utils.config_info.max_concurrent_threads) 68 | 69 | #Write out an empty csv file with the headers 70 | self.keys = [ 71 | 'service', 72 | 'region', 73 | 'account_id', 74 | 'account_name', 75 | 'payer_account_id', 76 | 'payer_account_name', 77 | 'resource_arn', 78 | 'resource_name', 79 | 'resource_id', 80 | 'potential_issue', 81 | 'engine', #Used for Elasticache, Memory DB and RDS 82 | 'message', 83 | 'timestamp' 84 | ] 85 | 86 | self.get_account_level_information() 87 | 88 | curr_time = datetime.datetime.now() 89 | tm = curr_time.strftime("%Y_%m_%d") 90 | 91 | #Build output file names, either with or without the account id based on the config information 92 | if utils.config_info.filename_with_accountid: 93 | self.output_file_name = f"Fault_Tolerance_Findings_{self.account_id}_{self.account_name}_{tm}.csv" 94 | self.run_report_file_name = f"Fault_Tolerance_Findings_{self.account_id}_{self.account_name}_{tm}_run_report.csv" 95 | else: 96 | self.output_file_name = f"Fault_Tolerance_Findings_{tm}.csv" 97 | self.run_report_file_name = f"Fault_Tolerance_Findings_{tm}_run_report.csv" 98 | 99 | self.output_file_full_path = f"{utils.config_info.output_folder_name}{self.output_file_name}" 100 | self.run_report_file_full_path = f"{utils.config_info.output_folder_name}{self.run_report_file_name}" 101 | 102 | self.create_or_truncate_file = False 103 | 104 | if utils.config_info.truncate_output: 105 | self.create_or_truncate_file = True #If truncate mode is set to True, then create/truncate file 106 | else: 107 | if not os.path.isfile(self.output_file_full_path): 108 | self.create_or_truncate_file = True #If truncate mode is set to False but file does not already exist, then create the file 109 | 110 | #If the folder does not exist, create it. 
111 | os.makedirs(os.path.dirname(utils.config_info.output_folder_name), exist_ok=True) 112 | 113 | if self.create_or_truncate_file: #If create or truncate file is true then open the file in 'w' mode and write the header 114 | with open(self.output_file_full_path, 'w', newline='') as output_file: 115 | dict_writer = csv.DictWriter(output_file, self.keys) 116 | dict_writer.writeheader() 117 | 118 | def get_findings(self): 119 | start = datetime.datetime.now().astimezone() 120 | 121 | for region in utils.config_info.regions: 122 | for service in utils.config_info.services: 123 | analyser = self.analyser_classes[service](account_analyser = self, region = region) 124 | if utils.config_info.single_threaded: 125 | analyser.get_and_write_findings() 126 | else: 127 | t = threading.Thread(target = analyser.get_and_write_findings, name = f"{service}+{region}") 128 | self.threads.append(t) 129 | t.start() 130 | 131 | #If running in multi threaded mode wait for all threads to finish 132 | if not utils.config_info.single_threaded: 133 | for t in self.threads: 134 | t.join() 135 | 136 | end = datetime.datetime.now().astimezone() 137 | 138 | self.run_report.append( 139 | { 140 | 'account_id' : self.account_id, 141 | 'region' : 'Overall', 142 | 'service' : 'Overall', 143 | 'result' : 'N/A', 144 | 'error_message' : 'N/A', 145 | 'start_time' : start.strftime("%Y_%m_%d_%H_%M_%S%z"), 146 | 'end_time' : end.strftime("%Y_%m_%d_%H_%M_%S%z"), 147 | 'runtime_in_seconds' : round((end-start).total_seconds(), 2) 148 | } 149 | ) 150 | 151 | logging.info(f"Total time taken for the account {self.account_id} is {end-start} seconds") 152 | self.write_run_report() 153 | 154 | if utils.config_info.bucket_name: 155 | self.push_files_to_s3() 156 | 157 | def write_run_report(self): 158 | run_report_keys = self.run_report[0].keys() 159 | if self.create_or_truncate_file: #Same behaviour as the findings output file. If a new findings file is created or it is truncated, then create or truncate the run_report too. 
160 | file_open_mode = 'w' 161 | else: 162 | file_open_mode = 'a+' 163 | with open(self.run_report_file_full_path, file_open_mode, newline='') as output_file: 164 | dict_writer = csv.DictWriter(output_file, run_report_keys) 165 | if self.create_or_truncate_file: 166 | dict_writer.writeheader() 167 | dict_writer.writerows(self.run_report) 168 | 169 | def push_files_to_s3(self): 170 | session = utils.get_aws_session(session_name = 'UploadFilesToS3') 171 | s3 = session.client("s3") 172 | try: 173 | response = s3.upload_file(self.output_file_full_path, utils.config_info.bucket_name, utils.config_info.output_folder_name+self.output_file_name) 174 | logging.info(f"Uploaded output file {utils.config_info.output_folder_name+self.output_file_name} to bucket {utils.config_info.bucket_name}") 175 | 176 | response = s3.upload_file(self.run_report_file_full_path, utils.config_info.bucket_name, utils.config_info.output_folder_name+self.run_report_file_name) 177 | logging.info(f"Uploaded run report file {utils.config_info.output_folder_name+self.output_file_name} to bucket {utils.config_info.bucket_name}") 178 | 179 | except botocore.exceptions.ClientError as error: 180 | logging.error(error) 181 | 182 | def get_account_level_information(self): 183 | session = utils.get_aws_session(session_name = 'InitialAccountInfoGathering') 184 | org = session.client("organizations") 185 | try: 186 | acct_info = org.describe_account(AccountId = self.account_id) 187 | self.account_name = acct_info["Account"]["Name"] 188 | except botocore.exceptions.ClientError as error: 189 | if error.response['Error']['Code'] == 'AWSOrganizationsNotInUseException': 190 | logging.info(f"Account {self.account_id} is not part of an AWS Organization") 191 | self.account_name = '' 192 | self.payer_account_id = 'N/A' 193 | self.payer_account_name = 'N/A' 194 | return 195 | else: 196 | raise error 197 | 198 | org_info = org.describe_organization() 199 | self.payer_account_id = org_info["Organization"]["MasterAccountId"] 200 | 201 | payer_account_info = org.describe_account(AccountId = self.payer_account_id) 202 | self.payer_account_name = payer_account_info["Account"]["Name"] 203 | 204 | if __name__ == "__main__": 205 | #Create an instance of the Account level analyser and trigger the get_findings function. 206 | ara = AccountAnalyser() 207 | ara.get_findings() 208 | -------------------------------------------------------------------------------- /src/requirements.txt: -------------------------------------------------------------------------------- 1 | boto3==1.26.13 2 | botocore==1.29.13 3 | -------------------------------------------------------------------------------- /src/service_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0 3 | 4 | from abc import ABCMeta, abstractmethod 5 | import utils 6 | from collections import namedtuple 7 | import botocore 8 | import time 9 | import logging 10 | import datetime 11 | import csv 12 | import json 13 | 14 | class ServiceAnalyser(metaclass = ABCMeta): 15 | 16 | def __init__ (self, account_analyser, region, service): 17 | self.service = service 18 | self.region = region 19 | self.account_analyser = account_analyser 20 | self.findings = [] 21 | self.session = None 22 | 23 | def get_aws_session(self): 24 | if self.session: 25 | return self.session 26 | else: 27 | return utils.get_aws_session(session_name = f"{self.service}_{self.region}_FaultToleranceAnalyser") 28 | 29 | @utils.log_func 30 | def get_and_write_findings(self): 31 | 32 | with self.account_analyser.thread_limiter: 33 | start = datetime.datetime.now().astimezone() 34 | 35 | try: 36 | self.get_findings() 37 | self.write_findings() 38 | end = datetime.datetime.now().astimezone() 39 | logging.info(f"Completed processing {self.service}+{self.region} in {round((end-start).total_seconds(), 2)} seconds.") 40 | self.account_analyser.run_report.append( 41 | { 42 | 'account_id' : self.account_analyser.account_id, 43 | 'region' : self.region, 44 | 'service' : self.service, 45 | 'result' :'Success', 46 | 'error_message' :'', 47 | 'start_time' : start.strftime("%Y_%m_%d_%H_%M_%S%z"), 48 | 'end_time' : end.strftime("%Y_%m_%d_%H_%M_%S%z"), 49 | 'runtime_in_seconds' : round((end-start).total_seconds(), 2) 50 | } 51 | ) 52 | except botocore.exceptions.BotoCoreError as error: 53 | end = datetime.datetime.now().astimezone() 54 | self.account_analyser.run_report.append( 55 | { 56 | 'account_id' : self.account_analyser.account_id, 57 | 'region' : self.region, 58 | 'service' : self.service, 59 | 'result' :'Failure', 60 | 'error_message' : str(error), 61 | 'start_time' : start.strftime("%Y_%m_%d_%H_%M_%S%z"), 62 | 'end_time' : end.strftime("%Y_%m_%d_%H_%M_%S%z"), 63 | 'runtime_in_seconds' : round((end-start).total_seconds(), 2) 64 | } 65 | ) 66 | raise error 67 | 68 | 69 | @abstractmethod 70 | def get_findings(self, region): 71 | pass 72 | 73 | def get_finding_rec_with_common_fields(self): 74 | finding_rec = {} 75 | finding_rec["account_id"] = self.account_analyser.account_id 76 | finding_rec["account_name"] = self.account_analyser.account_name 77 | finding_rec["payer_account_id"] = self.account_analyser.payer_account_id 78 | finding_rec["payer_account_name"] = self.account_analyser.payer_account_name 79 | finding_rec['service'] = self.service 80 | finding_rec['region'] = self.region 81 | 82 | curr_time = datetime.datetime.now().astimezone() 83 | finding_rec['timestamp'] = curr_time.strftime("%Y_%m_%d_%H_%M_%S%z") 84 | 85 | return finding_rec 86 | 87 | def write_findings(self): 88 | self.write_findings_to_file() 89 | #If an event bus is provided publish any issues to event bridge 90 | if (utils.config_info.event_bus_arn): 91 | self.publish_findings_to_event_bridge() 92 | 93 | #This function will be called by the threads to write to the output file. So it must use a lock before opening and writing to the file. 
94 | def write_findings_to_file(self): 95 | 96 | #Log findings 97 | for finding_rec in self.findings: 98 | if finding_rec['potential_issue']: 99 | logging.error(finding_rec['message']) 100 | else: 101 | logging.info(finding_rec['message']) 102 | 103 | #Write findings to output file 104 | if len(self.findings) > 0: 105 | keys = self.findings[0].keys() 106 | if self.account_analyser.lock.acquire(): 107 | with open(self.account_analyser.output_file_full_path, 'a', newline='') as output_file: 108 | dict_writer = csv.DictWriter(output_file, self.account_analyser.keys) 109 | if utils.config_info.report_only_issues: #If the "report-only-issues" flag is set, go through each finding and write out only those that are identified as a potential issue 110 | for finding_rec in self.findings: 111 | if finding_rec['potential_issue']: 112 | dict_writer.writerow(finding_rec) 113 | else: #If the "report-only-issues" flag is not set, then write all findings 114 | dict_writer.writerows(self.findings) 115 | self.account_analyser.lock.release() 116 | 117 | def publish_findings_to_event_bridge(self): 118 | session = self.get_aws_session() 119 | 120 | #Get the event bus region name from the event bus ARN. That region has to be used as cross region API calls are not permitted. 121 | event_bus_region = (utils.parse_arn(utils.config_info.event_bus_arn))['region'] 122 | 123 | events = session.client("events", region_name = event_bus_region) 124 | 125 | entries = [] 126 | 127 | total_entries_count = 0 128 | 129 | for finding_rec in self.findings: 130 | if (not utils.config_info.report_only_issues) or (utils.config_info.report_only_issues and finding_rec['potential_issue']): 131 | entries.append( 132 | { 133 | 'Time': datetime.datetime.now().astimezone(), 134 | 'Source': 'FaultToleranceAnalyser', 135 | 'DetailType': 'FaultToleranceIssue', 136 | 'Detail': json.dumps(finding_rec), 137 | 'EventBusName' : utils.config_info.event_bus_arn 138 | } 139 | ) 140 | total_entries_count = total_entries_count+1 141 | if len(entries) == 10: #Call put-events in batches of 10 each because the API does not accept more than that many events in 1 call. 142 | response = events.put_events(Entries = entries) 143 | entries.clear() #Reset the batch once it has been published 144 | 145 | if len(entries) > 0: 146 | response = events.put_events(Entries = entries) 147 | 148 | logging.info(f"Published {total_entries_count} finding(s) for {self.service} in {self.region} to Eventbridge") 149 | -------------------------------------------------------------------------------- /src/service_specific_analysers/cloudhsm_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class CloudHSMAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'cloudhsm') 13 | 14 | def get_findings(self): 15 | session = self.get_aws_session() 16 | cloudhsm = session.client("cloudhsmv2", region_name=self.region) 17 | 18 | for cluster in utils.invoke_aws_api_full_list(cloudhsm.describe_clusters, "Clusters"): 19 | 20 | finding_rec = self.get_finding_rec_from_response(cluster) 21 | 22 | if len(cluster["Hsms"]) == 0: 23 | finding_rec['potential_issue'] = False 24 | finding_rec['message'] = f"CloudHSM: Cloud HSM cluster {cluster['ClusterId']} has no hsms."
25 | elif len(cluster["Hsms"]) == 1: 26 | finding_rec['potential_issue'] = True 27 | finding_rec['message'] = f"CloudHSM: Cloud HSM cluster {cluster['ClusterId']} has only 1 hsm in a single AZ {cluster['Hsms'][0]['AvailabilityZone']}." 28 | elif len(cluster["Hsms"]) > 1: 29 | azs = set() 30 | for hsm in cluster['Hsms']: 31 | azs.add(hsm['AvailabilityZone']) 32 | if len(azs) == 1: 33 | finding_rec['potential_issue'] = True 34 | finding_rec['message'] = f"CloudHSM: Cloud HSM cluster {cluster['ClusterId']} has {len(cluster['Hsms'])} hsms but they are all in the AZ {azs.pop()}" 35 | else: #len(azs) > 1 36 | finding_rec['potential_issue'] = False 37 | finding_rec['message'] = f"CloudHSM: Cloud HSM cluster {cluster['ClusterId']} has {len(cluster['Hsms'])} hsms and they are spread across multiple AZs: {list(azs)}" 38 | self.findings.append(finding_rec) 39 | 40 | #Contains the logic to extract relevant fields from the API response to the output csv file. 41 | def get_finding_rec_from_response(self, cluster): 42 | 43 | finding_rec = self.get_finding_rec_with_common_fields() 44 | finding_rec['resource_id'] = cluster['ClusterId'] 45 | finding_rec['resource_name'] = cluster['ClusterId'] 46 | finding_rec['resource_arn'] = f"arn:aws:cloudhsm:{self.region}:{self.account_analyser.account_id}:cluster/{cluster['ClusterId']}" 47 | return finding_rec 48 | -------------------------------------------------------------------------------- /src/service_specific_analysers/dax_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class DAXAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'dax') 13 | 14 | def get_findings(self): 15 | session = self.get_aws_session() 16 | dax = session.client("dax", region_name=self.region) 17 | 18 | for cluster in utils.invoke_aws_api_full_list(dax.describe_clusters, "Clusters"): 19 | finding_rec = self.get_finding_rec_from_response(cluster) 20 | azs = {node['AvailabilityZone'] for node in cluster["Nodes"]} 21 | 22 | if len(azs) > 1: 23 | finding_rec['potential_issue'] = False 24 | finding_rec['message'] = f"Nodes in DAX Cluster {cluster['ClusterName']} are spread across more than 1 AZ {azs}" 25 | else: 26 | finding_rec['potential_issue'] = True 27 | finding_rec['message'] = f"All nodes in the DAX cluster {cluster['ClusterName']} are in a single AZ {azs}" 28 | self.findings.append(finding_rec) 29 | 30 | #Contains the logic to extract relevant fields from the API response to the output csv file. 31 | def get_finding_rec_from_response(self, cluster): 32 | 33 | finding_rec = self.get_finding_rec_with_common_fields() 34 | finding_rec['resource_id'] = '' 35 | finding_rec['resource_name'] = cluster['ClusterName'] 36 | finding_rec['resource_arn'] = cluster['ClusterArn'] 37 | return finding_rec 38 | -------------------------------------------------------------------------------- /src/service_specific_analysers/dms_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0
3 | 
4 | import boto3
5 | import logging
6 | import utils
7 | from service_analyser import ServiceAnalyser
8 | 
9 | class DMSAnalyser(ServiceAnalyser):
10 | 
11 |     def __init__(self, account_analyser, region):
12 |         self.dms_instances = {}
13 |         super().__init__(account_analyser, region, 'dms')
14 | 
15 |     def get_findings(self):
16 | 
17 |         session = self.get_aws_session()
18 | 
19 |         dms = session.client("dms", region_name=self.region)
20 | 
21 |         #Go through the instances, and gather findings.
22 |         for repl_inst in utils.invoke_aws_api_full_list(dms.describe_replication_instances, "ReplicationInstances"):
23 |             self.dms_instances[repl_inst["ReplicationInstanceArn"]] = {
24 |                 "MultiAZ":repl_inst["MultiAZ"],
25 |                 "ReplicationInstanceIdentifier":repl_inst["ReplicationInstanceIdentifier"],
26 |                 "AZs": [
27 |                     repl_inst["AvailabilityZone"],
28 |                     repl_inst["SecondaryAvailabilityZone"] if "SecondaryAvailabilityZone" in repl_inst else None
29 |                 ]
30 |             }
31 |             finding_rec = self.get_finding_rec_from_inst_response(repl_inst)
32 | 
33 |             if repl_inst["MultiAZ"]:
34 |                 finding_rec['potential_issue'] = False
35 |                 finding_rec['message'] = f"DMS Replication Instance: {repl_inst['ReplicationInstanceIdentifier']} with ARN {repl_inst['ReplicationInstanceArn']} is on an instance with multiple AZs"
36 |             else:
37 |                 finding_rec['potential_issue'] = True
38 |                 finding_rec['message'] = f"DMS Replication Instance: {repl_inst['ReplicationInstanceIdentifier']} with ARN {repl_inst['ReplicationInstanceArn']} is on an instance in a single AZ"
39 |             self.findings.append(finding_rec)
40 | 
41 |         #Go through the tasks and gather findings.
42 |         for repl_task in utils.invoke_aws_api_full_list(dms.describe_replication_tasks, "ReplicationTasks"):
43 | 
44 |             finding_rec = self.get_finding_rec_from_task_response(repl_task)
45 | 
46 |             dms_instance_arn = repl_task["ReplicationInstanceArn"]
47 |             if self.dms_instances[dms_instance_arn]["MultiAZ"]:
48 |                 finding_rec['potential_issue'] = False
49 |                 finding_rec['message'] = f"DMS Replication Task: {repl_task['ReplicationTaskIdentifier']} with ARN {repl_task['ReplicationTaskArn']} is on the replication instance {self.dms_instances[dms_instance_arn]['ReplicationInstanceIdentifier']} which is configured with multiple AZs: {self.dms_instances[dms_instance_arn]['AZs']}"
50 |             else:
51 |                 finding_rec['potential_issue'] = True
52 |                 finding_rec['message'] = f"DMS Replication Task: {repl_task['ReplicationTaskIdentifier']} with ARN {repl_task['ReplicationTaskArn']} is on the replication instance {self.dms_instances[dms_instance_arn]['ReplicationInstanceIdentifier']} which is configured only in a single AZ {self.dms_instances[dms_instance_arn]['AZs'][0]}."
53 | 54 | self.findings.append(finding_rec) 55 | 56 | def get_finding_rec_from_inst_response(self, repl_inst): 57 | finding_rec = self.get_finding_rec_with_common_fields() 58 | finding_rec['resource_id'] = repl_inst['ReplicationInstanceIdentifier'] 59 | finding_rec['resource_name'] = '' 60 | finding_rec['resource_arn'] = repl_inst['ReplicationInstanceArn'] 61 | 62 | return finding_rec 63 | 64 | def get_finding_rec_from_task_response(self, repl_task): 65 | finding_rec = self.get_finding_rec_with_common_fields() 66 | finding_rec['resource_id'] = repl_task['ReplicationTaskIdentifier'] 67 | finding_rec['resource_name'] = '' 68 | finding_rec['resource_arn'] = repl_task['ReplicationTaskArn'] 69 | 70 | return finding_rec 71 | -------------------------------------------------------------------------------- /src/service_specific_analysers/docdb_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class DocDBAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'docdb') 13 | 14 | def get_findings(self): 15 | session = self.get_aws_session() 16 | docdb = session.client("docdb", region_name=self.region) 17 | 18 | for db_cluster in utils.invoke_aws_api_full_list(docdb.describe_db_clusters, "DBClusters"): 19 | if db_cluster["Engine"] == "docdb": #Neptune clusters could also be listed. Hence we need to look only for docdb 20 | finding_rec = self.get_finding_rec_from_response(db_cluster) 21 | if db_cluster["MultiAZ"]: 22 | finding_rec['potential_issue'] = False 23 | finding_rec['message'] = f"DocDB Cluster: {db_cluster['DBClusterIdentifier']} is in multiple AZs" 24 | else: 25 | finding_rec['potential_issue'] = True 26 | finding_rec['message'] = f"DocDB Cluster: {db_cluster['DBClusterIdentifier']} is in a single AZ" 27 | self.findings.append(finding_rec) 28 | 29 | #Contains the logic to extract relevant fields from the API response to the output csv file. 30 | def get_finding_rec_from_response(self, db_cluster): 31 | 32 | finding_rec = self.get_finding_rec_with_common_fields() 33 | finding_rec['resource_id'] = db_cluster['DbClusterResourceId'] 34 | finding_rec['resource_name'] = db_cluster['DBClusterIdentifier'] 35 | finding_rec['resource_arn'] = db_cluster['DBClusterArn'] 36 | return finding_rec 37 | -------------------------------------------------------------------------------- /src/service_specific_analysers/dx_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | #Checks the following three. 10 | #1. Direct Connect Connection Redundancy - https://docs.aws.amazon.com/awssupport/latest/user/fault-tolerance-checks.html#aws-direct-connect-connection-redundancy 11 | #2. Direct Connect Location Redundancy - https://docs.aws.amazon.com/awssupport/latest/user/fault-tolerance-checks.html#aws-direct-connect-location-redundancy 12 | #3. 
Direct Connect Virtual Interface Redundancy - https://docs.aws.amazon.com/awssupport/latest/user/fault-tolerance-checks.html#aws-direct-connect-virtual-interface-redundancy
13 | 
14 | class DXAnalyser(ServiceAnalyser):
15 | 
16 |     def __init__(self, account_analyser, region):
17 |         super().__init__(account_analyser, region, 'directconnect')
18 | 
19 |     def get_findings(self):
20 |         self.session = self.get_aws_session()
21 |         self.dx = self.session.client("directconnect", region_name=self.region)
22 |         self.get_conn_location_findings()
23 |         self.get_vif_findings()
24 | 
25 |     def get_conn_location_findings(self):
26 |         no_of_connections = 0
27 |         locations = set()
28 | 
29 |         finding_rec = self.get_dx_output()
30 | 
31 |         for conn in utils.invoke_aws_api_full_list(self.dx.describe_connections, "connections"):
32 |             no_of_connections = no_of_connections + 1
33 |             locations.add(conn["location"])
34 | 
35 |         if no_of_connections == 0: #No DX connection. Hence no issue
36 |             finding_rec['potential_issue'] = False
37 |             finding_rec['message'] = f"Direct Connect: No connections in region {self.region}. Hence nothing to check"
38 |         elif no_of_connections == 1:
39 |             finding_rec['potential_issue'] = True
40 |             finding_rec['message'] = f"Direct Connect: There is only one DX connection in region {self.region}"
41 |         else: #no_of_connections > 1. Connection Redundancy is met.
42 |             logging.info(f"Direct Connect: More than one DX connection found in region {self.region}. Now on to checking locations")
43 |             if len(locations) == 1: #All connections use the same location
44 |                 finding_rec['potential_issue'] = True
45 |                 finding_rec['message'] = f"Direct Connect: There is only one location {next(iter(locations))} used by all the DX connections in region {self.region}"
46 |             else: #Connection Redundancy and Location Redundancy is also met
47 |                 finding_rec['potential_issue'] = False
48 |                 finding_rec['message'] = f"Direct Connect: There is more than one DX connection, using more than one location, in region {self.region}"
49 | 
50 |         self.findings.append(finding_rec)
51 | 
52 |     #check VIF redundancy - https://docs.aws.amazon.com/awssupport/latest/user/fault-tolerance-checks.html#aws-direct-connect-virtual-interface-redundancy
53 |     def get_vif_findings(self):
54 | 
55 |         vifs = {}
56 |         vgws = {}
57 | 
58 |         #collect all the VIFs
59 |         for vif in utils.invoke_aws_api_full_list(self.dx.describe_virtual_interfaces, "virtualInterfaces"):
60 |             vifs[vif['virtualInterfaceId']] = {'virtualGatewayId': vif['virtualGatewayId'], 'connectionId' : vif['connectionId']}
61 |             if vif['virtualGatewayId'] in vgws:
62 |                 vgws[vif['virtualGatewayId']]['vifs'].append(vif['virtualInterfaceId'])
63 |                 vgws[vif['virtualGatewayId']]['connections'].append(vif['connectionId'])
64 |             else:
65 |                 vgws[vif['virtualGatewayId']] = {'vifs' : [vif['virtualInterfaceId']], 'connections' : [vif['connectionId']]}
66 | 
67 |         for vgw_id in vgws:
68 |             finding_rec = self.get_vgw_output(vgw_id)
69 |             if len(vgws[vgw_id]['vifs']) < 2:
70 |                 finding_rec['potential_issue'] = True
71 |                 finding_rec['message'] = f"Direct Connect: There is only one VIF {vgws[vgw_id]['vifs']} for the virtual gateway {vgw_id}."
72 |             elif len(set(vgws[vgw_id]['connections'])) < 2: #Use a set so that the same connection used by multiple VIFs is counted once
73 |                 finding_rec['potential_issue'] = True
74 |                 finding_rec['message'] = f"Direct Connect: Though there is more than one VIF for the virtual gateway {vgw_id}, all the VIFs are on the same DX Connection {vgws[vgw_id]['connections']}."
75 |             else:
76 |                 finding_rec['potential_issue'] = False
77 |                 finding_rec['message'] = f"Direct Connect: There is more than one VIF for the virtual gateway {vgw_id}, and the VIFs are on more than one DX connection."
78 |             self.findings.append(finding_rec)
79 |     #Contains the logic to extract relevant fields from the API response to the output csv file.
80 |     def get_dx_output(self):
81 | 
82 |         finding_rec = self.get_finding_rec_with_common_fields()
83 |         finding_rec['resource_id'] = 'N/A'
84 |         finding_rec['resource_name'] = 'N/A'
85 |         finding_rec['resource_arn'] = 'N/A'
86 |         return finding_rec
87 | 
88 |     def get_vgw_output(self, vgw_id):
89 | 
90 |         finding_rec = self.get_finding_rec_with_common_fields()
91 |         finding_rec['resource_id'] = vgw_id
92 |         finding_rec['resource_name'] = 'N/A'
93 |         finding_rec['resource_arn'] = 'N/A'
94 |         return finding_rec
95 | 
--------------------------------------------------------------------------------
/src/service_specific_analysers/efs_analyser.py:
--------------------------------------------------------------------------------
1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 | # SPDX-License-Identifier: MIT-0
3 | 
4 | import boto3
5 | import logging
6 | import utils
7 | from service_analyser import ServiceAnalyser
8 | 
9 | class EFSAnalyser(ServiceAnalyser):
10 | 
11 |     def __init__(self, account_analyser, region):
12 |         super().__init__(account_analyser, region, 'efs')
13 | 
14 |     def get_findings(self):
15 |         session = self.get_aws_session()
16 |         efs = session.client("efs", region_name=self.region)
17 | 
18 |         for fs in utils.invoke_aws_api_full_list(efs.describe_file_systems, "FileSystems"):
19 |             finding_rec = self.get_finding_rec_from_response(fs)
20 |             if "AvailabilityZoneId" in fs: #Single AZ File system
21 |                 finding_rec['potential_issue'] = True
22 |                 finding_rec['message'] = f"EFS: File system {fs['FileSystemId']} with ARN {fs['FileSystemArn']} is a single AZ file system."
23 |             elif fs["NumberOfMountTargets"] <= 1: #Multi AZ file system but mount target only in a single AZ
24 |                 finding_rec['potential_issue'] = True
25 |                 finding_rec['message'] = f"EFS: File system {fs['FileSystemId']} with ARN {fs['FileSystemArn']} is a multi AZ enabled file system but with only one mount target."
26 |             else:
27 |                 finding_rec['potential_issue'] = False
28 |                 finding_rec['message'] = f"EFS: File system {fs['FileSystemId']} with ARN {fs['FileSystemArn']} is a multi AZ enabled file system with more than one mount target"
29 |             self.findings.append(finding_rec)
30 | 
31 |     #Contains the logic to extract relevant fields from the API response to the output csv file.
32 |     def get_finding_rec_from_response(self, fs):
33 |         finding_rec = self.get_finding_rec_with_common_fields()
34 |         finding_rec['resource_id'] = fs['FileSystemId']
35 |         finding_rec['resource_name'] = ''
36 |         for tag in fs['Tags']:
37 |             if tag['Key'] == 'Name':
38 |                 finding_rec['resource_name'] = tag['Value']
39 |         finding_rec['resource_arn'] = fs['FileSystemArn']
40 | 
41 |         return finding_rec
42 | 
43 | 
--------------------------------------------------------------------------------
/src/service_specific_analysers/elasticache_analyser.py:
--------------------------------------------------------------------------------
1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class ElasticacheAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'elasticache') 13 | 14 | def get_findings(self): 15 | self.session = self.get_aws_session() 16 | self.elasticache = self.session.client("elasticache", region_name=self.region) 17 | self.get_memcache_single_node_redis_findings() 18 | self.get_redis_replication_group_findings() 19 | 20 | def get_memcache_single_node_redis_findings(self): 21 | 22 | #Get memcached and single node Redis clusters 23 | for cluster in utils.invoke_aws_api_full_list(self.elasticache.describe_cache_clusters, "CacheClusters", ShowCacheClustersNotInReplicationGroups = True): 24 | finding_rec = self.get_output_from_memcache_single_node_redis_response(cluster) 25 | finding_rec['potential_issue'] = True 26 | if cluster['Engine'] == 'redis': #Single node redis cluster 27 | finding_rec['message'] = f"Elasticache-Redis cluster: {cluster['CacheClusterId']} is a single Node Elasticache-Redis cluster" 28 | else: #Memcached cluster 29 | finding_rec['message'] = f"Elasticache-Memcached cluster: {cluster['CacheClusterId']} is a single AZ issue even if there are multiple nodes in multiple AZs as the data is not replicated between nodes." 30 | self.findings.append(finding_rec) 31 | 32 | def get_output_from_memcache_single_node_redis_response(self, cluster): 33 | 34 | finding_rec = self.get_finding_rec_with_common_fields() 35 | finding_rec['resource_id'] = cluster['CacheClusterId'] 36 | finding_rec['resource_name'] = cluster['CacheClusterId'] 37 | finding_rec['resource_arn'] = cluster['ARN'] 38 | finding_rec['engine'] = cluster['Engine'] 39 | return finding_rec 40 | 41 | def get_redis_replication_group_findings(self): 42 | #Get Redis replication group clusters 43 | 44 | for repl_group in utils.invoke_aws_api_full_list(self.elasticache.describe_replication_groups, "ReplicationGroups"): 45 | finding_rec = self.get_output_from_redis_replication_group_response(repl_group) 46 | if len(repl_group["NodeGroups"]) == 0 : #Cluster Mode disabled. And no node groups or shards. So the data is not replicated across nodes and so this is not single AZ failure resilient 47 | finding_rec['potential_issue'] = True 48 | finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode disabled and no node groups configured" 49 | elif len(repl_group["NodeGroups"]) == 1 : #Cluster Mode disabled. 
One node group/shard
50 |                 if repl_group["AutomaticFailover"] == "disabled":
51 |                     finding_rec['potential_issue'] = True
52 |                     finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode disabled, 1 Node group configured but Auto Failover is disabled"
53 |                 elif repl_group["MultiAZ"] == "disabled": #Auto failover enabled, but multi AZ disabled
54 |                     node_group = repl_group["NodeGroups"][0]
55 |                     azs = set()
56 |                     for node in node_group["NodeGroupMembers"]:
57 |                         azs.add(node["PreferredAvailabilityZone"])
58 |                     if len(azs) == 1: #All nodes belong to the same AZ
59 |                         finding_rec['potential_issue'] = True
60 |                         finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode disabled and Auto Failover is enabled, but all nodes are in the same AZ {azs}"
61 |                     else:
62 |                         finding_rec['potential_issue'] = False
63 |                         finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode disabled, but Auto Failover is enabled and the nodes are spread across multiple AZs {azs}"
64 |                 else: # Auto failover enabled and multi AZ enabled. So this is ok.
65 |                     finding_rec['potential_issue'] = False
66 |                     finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode disabled, but Auto Failover and Multi AZ enabled"
67 |             # At this point len(repl_group["NodeGroups"]) > 1 which implies cluster mode is enabled.
68 |             # This means that Automatic failover is enabled by force.
69 |             # The customer does not have an option to disable it. So that need not be checked.
70 |             # Just make sure all nodes of a given shard are not in the same AZ and that each shard has a replication node.
71 |             elif repl_group["MultiAZ"] == "disabled":
72 |                 #Check to see if any replicas are missing in any node groups, or if any node groups have all the nodes in the same AZ.
73 |                 node_groups = repl_group["NodeGroups"]
74 |                 issue_found = False
75 |                 for node_group in node_groups:
76 |                     if len(node_group["NodeGroupMembers"]) == 1:
77 |                         finding_rec['potential_issue'] = True
78 |                         finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode enabled, but no replicas in shard {node_group['NodeGroupId']}"
79 |                         issue_found = True
80 |                         break
81 |                     else:
82 |                         azs = set()
83 |                         for node in node_group["NodeGroupMembers"]:
84 |                             azs.add(node["PreferredAvailabilityZone"])
85 |                         if len(azs) == 1: #All nodes belong to the same AZ
86 |                             finding_rec['potential_issue'] = True
87 |                             finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode enabled, but all nodes in shard {node_group['NodeGroupId']} are in the same AZ {azs}"
88 |                             issue_found = True
89 |                             break
90 |                 if not issue_found: #All Node groups have been ok
91 |                     finding_rec['potential_issue'] = False
92 |                     finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode enabled, all nodegroups have replicas and none of those node groups have all the nodes in the same AZ."
93 |             else:
94 |                 finding_rec['potential_issue'] = False
95 |                 finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode enabled, and Multi AZ is enabled."
96 | self.findings.append(finding_rec) 97 | 98 | def get_output_from_redis_replication_group_response(self, repl_group): 99 | 100 | finding_rec = self.get_finding_rec_with_common_fields() 101 | finding_rec['resource_id'] = repl_group['ReplicationGroupId'] 102 | finding_rec['resource_name'] = repl_group['ReplicationGroupId'] 103 | finding_rec['resource_arn'] = repl_group['ARN'] 104 | finding_rec['engine'] = 'Redis' #This is the only possibility for replicationg groups. 105 | return finding_rec 106 | -------------------------------------------------------------------------------- /src/service_specific_analysers/fsx_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class FSXAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'fsx') 13 | 14 | def get_findings(self): 15 | 16 | session = self.get_aws_session() 17 | fsx = session.client("fsx", region_name=self.region) 18 | 19 | for fs in utils.invoke_aws_api_full_list(fsx.describe_file_systems, "FileSystems"): 20 | if fs['FileSystemType'] == "WINDOWS": #We look only at Windows File systems 21 | finding_rec = self.get_finding_rec_from_response(fs) 22 | if len(fs["SubnetIds"]) == 1: 23 | finding_rec['potential_issue'] = True 24 | finding_rec['message'] = f"FSX: Windows File system {fs['FileSystemId']} with ARN {fs['ResourceARN'] } is a single AZ file system. Please check." 25 | else: 26 | finding_rec['potential_issue'] = False 27 | finding_rec['message'] = f"FSX: Windows File system {fs['FileSystemId']} with ARN {fs['ResourceARN'] } is a multi AZ file system" 28 | self.findings.append(finding_rec) 29 | 30 | #Contains the logic to extract relevant fields from the API response to the output csv file. 31 | def get_finding_rec_from_response(self, fs): 32 | finding_rec = self.get_finding_rec_with_common_fields() 33 | finding_rec['resource_id'] = fs['FileSystemId'] 34 | finding_rec['resource_name'] = '' 35 | for tag in fs['Tags']: 36 | if tag['Key'] == 'Name': 37 | finding_rec['resource_name'] = tag['Value'] 38 | finding_rec['resource_arn'] = fs['ResourceARN'] 39 | 40 | return finding_rec 41 | -------------------------------------------------------------------------------- /src/service_specific_analysers/globalaccelerator_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class GlobalAcceleratorAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'globalaccelerator') 13 | 14 | def get_findings(self): 15 | 16 | if self.region == "us-west-2": 17 | self.session = self.get_aws_session() 18 | self.aga = self.session.client("globalaccelerator", region_name=self.region) 19 | self.get_standard_accelerator_findings() 20 | else: 21 | logging.info(f"The service Global Accelerator operates only in us-west-2. 
Hence doing nothing for {self.region}")
22 |             return #Nothing to do since Global Accelerator operates only in us-west-2
23 | 
24 |     def get_standard_accelerator_findings(self):
25 |         for accelerator in utils.invoke_aws_api_full_list(self.aga.list_accelerators, "Accelerators"):
26 |             self.validate_standard_accelerator(accelerator)
27 | 
28 |     def validate_standard_accelerator(self, accelerator):
29 |         finding_rec = self.get_finding_rec_from_response(accelerator)
30 | 
31 |         ec2_instance_ids = []
32 |         target_regions = set()
33 |         for listener in utils.invoke_aws_api_full_list(self.aga.list_listeners,
34 |                                                         "Listeners",
35 |                                                         AcceleratorArn = accelerator["AcceleratorArn"]):
36 |             for endpoint_group in utils.invoke_aws_api_full_list(self.aga.list_endpoint_groups,
37 |                                                                  "EndpointGroups",
38 |                                                                  ListenerArn = listener["ListenerArn"]):
39 |                 target_regions.add(endpoint_group["EndpointGroupRegion"])
40 |                 if len(target_regions) > 1:
41 |                     #If multiple regions are available then they are Multi-AZ. No need to proceed further
42 |                     finding_rec['potential_issue'] = False
43 |                     finding_rec['message'] = f"Global Accelerator: {accelerator['Name']} has target endpoints in multiple regions"
44 |                     self.findings.append(finding_rec); return
45 |                 for endpoint in endpoint_group["EndpointDescriptions"]:
46 |                     if not endpoint["EndpointId"].startswith("i-"): #Not EC2 instance
47 |                         logging.info(f"Global Accelerator {accelerator['Name']} has endpoints that are not EC2 instances. Hence ignored.")
48 |                         return
49 |                     else:
50 |                         ec2_instance_ids.append(endpoint["EndpointId"])
51 | 
52 |         #We have now collected all EC2 instances from all listeners and endpoint groups. Check the Availability zone of these EC2 instances now.
53 |         #So get all AZs to which these EC2 instances belong
54 |         azs = self.get_azs_of_ec2_instances(ec2_instance_ids, next(iter(target_regions))) #We can use next(iter(target_regions)) as we are sure there will be only one region. If there are more than one, we would not have come this far.
55 | 
56 |         if (len(azs) > 1):
57 |             finding_rec['potential_issue'] = False
58 |             finding_rec['message'] = f"Global Accelerator: All target endpoints for the accelerator {accelerator['Name']} are EC2 instances and they are spread across more than one AZ {azs}"
59 |         else:
60 |             finding_rec['potential_issue'] = True
61 |             finding_rec['message'] = f"Global Accelerator: All target endpoints for the accelerator {accelerator['Name']} are EC2 instances and they are all in a single AZ {azs}"
62 | 
63 |         self.findings.append(finding_rec)
64 | 
65 |     def get_azs_of_ec2_instances(self, ec2_instance_ids, region):
66 |         #First break up the EC2 instances in batches
67 |         ec2_instance_id_batches = [] #List of batches
68 |         batch_size = 10
69 |         ec2_instance_counter = 0
70 | 
71 |         #Get the list of EC2 instance ids and batch them in batch_size
72 |         for ec2_instance_id in ec2_instance_ids:
73 |             if ((ec2_instance_counter % batch_size) == 0):
74 |                 ec2_instance_id_batches.append([])
75 |             ec2_instance_counter = ec2_instance_counter + 1
76 |             ec2_instance_id_batches[len(ec2_instance_id_batches)-1].append(ec2_instance_id)
77 | 
78 |         azs = set()
79 |         ec2 = self.session.client("ec2", region_name = region)
80 |         #For each batch, invoke ec2 describe-instances and get the availability zones
81 |         for ec2_instance_id_batch in ec2_instance_id_batches:
82 | 
83 | 
84 | 
85 |             for ec2_instance in utils.invoke_aws_api_full_list(ec2.describe_instances,
86 |                                                                "Reservations",
87 |                                                                InstanceIds = ec2_instance_id_batch):
88 |                 azs.update(instance["Placement"]["AvailabilityZone"] for instance in ec2_instance["Instances"])
89 | 
90 |         return(azs)
91 | 
92 |     #Contains the logic to extract relevant fields from the API response to the output csv file.
93 |     def get_finding_rec_from_response(self, accelerator):
94 | 
95 |         finding_rec = self.get_finding_rec_with_common_fields()
96 |         finding_rec['resource_id'] = accelerator['DnsName']
97 |         finding_rec['resource_name'] = accelerator['Name']
98 |         finding_rec['resource_arn'] = accelerator['AcceleratorArn']
99 |         return finding_rec
100 | 
--------------------------------------------------------------------------------
/src/service_specific_analysers/lambda_analyser.py:
--------------------------------------------------------------------------------
1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 | # SPDX-License-Identifier: MIT-0
3 | 
4 | import boto3
5 | import logging
6 | import utils
7 | from service_analyser import ServiceAnalyser
8 | 
9 | class LambdaAnalyser(ServiceAnalyser):
10 | 
11 |     def __init__(self, account_analyser, region):
12 |         super().__init__(account_analyser, region, 'lambda')
13 | 
14 |     def get_findings(self):
15 |         session = self.get_aws_session()
16 |         aws_lambda = session.client("lambda", region_name=self.region)
17 | 
18 |         for lambda_func in utils.invoke_aws_api_full_list(aws_lambda.list_functions, "Functions"):
19 | 
20 |             if "VpcConfig" not in lambda_func.keys(): #Ignore if there is no VpcConfig in the function
21 |                 continue
22 | 
23 |             if lambda_func["VpcConfig"]["VpcId"]: #If it is populated only then is it VPC Enabled. If not, this check can be ignored.
24 |                 finding_rec = self.get_finding_rec_from_response(lambda_func)
25 |                 if len(lambda_func["VpcConfig"]["SubnetIds"]) == 1:
26 |                     finding_rec['potential_issue'] = True
27 |                     finding_rec['message'] = f"Lambda: VPC Enabled Lambda function {lambda_func['FunctionName']} is configured to run in only one subnet."
28 | else: 29 | finding_rec['potential_issue'] = False 30 | finding_rec['message'] = f"Lambda: VPC Enabled Lambda Function {lambda_func['FunctionName']} is configured to run in more than one subnet" 31 | self.findings.append(finding_rec) 32 | 33 | #Contains the logic to extract relevant fields from the API response to the output csv file. 34 | def get_finding_rec_from_response(self, lambda_func): 35 | 36 | finding_rec = self.get_finding_rec_with_common_fields() 37 | finding_rec['resource_id'] = '' 38 | finding_rec['resource_name'] = lambda_func['FunctionName'] 39 | finding_rec['resource_arn'] = lambda_func['FunctionArn'] 40 | return finding_rec 41 | -------------------------------------------------------------------------------- /src/service_specific_analysers/memorydb_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class MemoryDBAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'memorydb') 13 | 14 | def get_findings(self): 15 | self.session = self.get_aws_session() 16 | self.memorydb = self.session.client("memorydb", region_name=self.region) 17 | self.get_memorydb_findings() 18 | 19 | def get_memorydb_findings(self): 20 | 21 | for cluster in utils.invoke_aws_api_full_list(self.memorydb.describe_clusters, "Clusters", ShowShardDetails = True): 22 | finding_rec = self.get_finding_rec_from_response(cluster) 23 | issue_found = False 24 | for shard in cluster["Shards"]: 25 | if len(shard["Nodes"]) == 1: 26 | finding_rec['potential_issue'] = True 27 | finding_rec['message'] = f"Memory DB Cluster: Shard {shard['Name']} in cluster {cluster['Name']} does not have any replicas" 28 | issue_found = True 29 | break 30 | 31 | if not issue_found: 32 | finding_rec['potential_issue'] = False 33 | finding_rec['message'] = f"Memory DB Cluster: All shards in cluster {cluster['Name']} have replicas" 34 | self.findings.append(finding_rec) 35 | 36 | def get_finding_rec_from_response(self, cluster): 37 | 38 | finding_rec = self.get_finding_rec_with_common_fields() 39 | finding_rec['resource_id'] = '' 40 | finding_rec['resource_name'] = cluster['Name'] 41 | finding_rec['resource_arn'] = cluster['ARN'] 42 | return finding_rec 43 | -------------------------------------------------------------------------------- /src/service_specific_analysers/opensearch_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | import utils 8 | from service_analyser import ServiceAnalyser 9 | 10 | class OpensearchAnalyser(ServiceAnalyser): 11 | 12 | def __init__(self, account_analyser, region): 13 | super().__init__(account_analyser, region, 'opensearch') 14 | 15 | def get_findings(self): 16 | 17 | session = self.get_aws_session() 18 | opensearch = session.client("opensearch", region_name=self.region) 19 | domain_name_batches = [] #List of batches 20 | batch_size = 5 21 | batch_counter = 0 22 | domain_counter = 0 23 | 24 | #Get the list of domain names and batch them in batch_size 25 | for domain_name in utils.invoke_aws_api_full_list(opensearch.list_domain_names, "DomainNames"): 26 | if ((domain_counter % batch_size) == 0): 27 | domain_name_batches.append([]) 28 | domain_counter = domain_counter + 1 29 | domain_name_batches[len(domain_name_batches)-1].append(domain_name['DomainName']) 30 | 31 | #Validate the domain names in batches as the validate_opensearch_domains API can get information about multiple domains in one API call. 32 | for domain_name_batch in domain_name_batches: 33 | self.validate_opensearch_domains(opensearch, domain_name_batch) 34 | 35 | def validate_opensearch_domains(self, opensearch, domain_names): 36 | 37 | for domain in utils.invoke_aws_api_full_list(opensearch.describe_domains, "DomainStatusList", DomainNames = domain_names): 38 | finding_rec = self.get_finding_rec_from_response(domain) 39 | if len(domain["VPCOptions"]["AvailabilityZones"]) > 1: 40 | finding_rec['potential_issue'] = False 41 | finding_rec['message'] = f"Opensearch domain: Domain {domain['DomainName']} with ARN {domain['ARN'] } is multi AZ enabled." 42 | else: 43 | finding_rec['potential_issue'] = True 44 | finding_rec['message'] = f"Opensearch domain: Domain {domain['DomainName']} with ARN {domain['ARN'] } is only in a single AZ." 45 | self.findings.append(finding_rec) 46 | 47 | #Contains the logic to extract relevant fields from the API response to the output csv file. 48 | def get_finding_rec_from_response(self, domain): 49 | finding_rec = self.get_finding_rec_with_common_fields() 50 | finding_rec['service'] = 'opensearch' 51 | finding_rec['region'] = self.region 52 | finding_rec['resource_id'] = domain['DomainId'] 53 | finding_rec['resource_name'] = domain['DomainName'] 54 | finding_rec['resource_arn'] = domain['ARN'] 55 | return finding_rec 56 | -------------------------------------------------------------------------------- /src/service_specific_analysers/rds_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class RDSAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'rds') 13 | 14 | def get_findings(self): 15 | self.session = self.get_aws_session() 16 | self.rds = self.session.client("rds", region_name=self.region) 17 | self.get_db_instance_findings() 18 | self.get_db_cluster_findings() 19 | 20 | def get_db_instance_findings(self): 21 | for db_instance in utils.invoke_aws_api_full_list(self.rds.describe_db_instances, "DBInstances"): 22 | if db_instance["Engine"] == "docdb": #Ignore any Document DB instances as they are covered separately. 
23 | continue 24 | 25 | if "DBClusterIdentifier" in db_instance: #This DB instance is part of a cluster. So it will be handled as part of cluster analyser 26 | continue 27 | 28 | finding_rec = self.get_finding_rec_from_response_instance(db_instance) 29 | 30 | if db_instance["MultiAZ"]: 31 | finding_rec['potential_issue'] = False 32 | finding_rec['message'] = f"RDS Instance: {db_instance['DBInstanceIdentifier']} has MultiAZ enabled" 33 | else: 34 | finding_rec['potential_issue'] = True 35 | finding_rec['message'] = f"RDS Instance: {db_instance['DBInstanceIdentifier']} has MultiAZ disabled" 36 | self.findings.append(finding_rec) 37 | 38 | def get_db_cluster_findings(self): 39 | for db_cluster in utils.invoke_aws_api_full_list(self.rds.describe_db_clusters, "DBClusters"): 40 | if db_cluster["Engine"] in ["docdb","neptune"]: #Ignore any Document DB, Neptune clusters. 41 | continue 42 | 43 | finding_rec = self.get_finding_rec_from_response_cluster(db_cluster) 44 | 45 | if db_cluster["MultiAZ"]: 46 | finding_rec['potential_issue'] = False 47 | finding_rec['message'] = f"RDS Cluster: {db_cluster['DBClusterIdentifier']} has MultiAZ enabled" 48 | else: 49 | finding_rec['potential_issue'] = True 50 | finding_rec['message'] = f"RDS Cluster {db_cluster['DBClusterIdentifier']} has MultiAZ disabled" 51 | self.findings.append(finding_rec) 52 | 53 | #Contains the logic to extract relevant fields from the API response to the output csv file. 54 | def get_finding_rec_from_response_instance(self, db_instance): 55 | 56 | finding_rec = self.get_finding_rec_with_common_fields() 57 | 58 | finding_rec['resource_id'] = '' 59 | finding_rec['resource_name'] = db_instance['DBInstanceIdentifier'] 60 | finding_rec['resource_arn'] = db_instance['DBInstanceArn'] 61 | finding_rec['engine'] = db_instance["Engine"] 62 | return finding_rec 63 | 64 | def get_finding_rec_from_response_cluster(self, db_cluster): 65 | 66 | finding_rec = self.get_finding_rec_with_common_fields() 67 | 68 | finding_rec['resource_id'] = '' 69 | finding_rec['resource_name'] = db_cluster['DBClusterIdentifier'] 70 | finding_rec['resource_arn'] = db_cluster['DBClusterArn'] 71 | finding_rec['engine'] = db_cluster["Engine"] 72 | return finding_rec 73 | -------------------------------------------------------------------------------- /src/service_specific_analysers/redshift_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0
3 | 
4 | import utils
5 | from service_analyser import ServiceAnalyser
6 | 
7 | class RedshiftAnalyser(ServiceAnalyser):
8 | 
9 |     def __init__(self, account_analyser, region):
10 |         super().__init__(account_analyser, region, 'redshift')
11 | 
12 |     def get_findings(self):
13 |         session = self.get_aws_session()
14 |         redshift = session.client("redshift", region_name=self.region)
15 | 
16 |         for cluster in utils.invoke_aws_api_full_list(redshift.describe_clusters, "Clusters"):
17 |             finding_rec = self.get_finding_rec_from_response(cluster)
18 |             if cluster["MultiAZ"] == "Enabled":
19 |                 finding_rec['potential_issue'] = False
20 |                 finding_rec['message'] = f"Redshift Cluster: {cluster['ClusterIdentifier']} is in multiple AZs"
21 |             else:
22 |                 finding_rec['potential_issue'] = True
23 |                 finding_rec['message'] = f"Redshift Cluster: {cluster['ClusterIdentifier']} is in a single AZ"
24 |             self.findings.append(finding_rec)
25 | 
26 |     #Contains the logic to extract relevant fields from the API response to the output csv file.
27 |     def get_finding_rec_from_response(self, cluster):
28 | 
29 |         finding_rec = self.get_finding_rec_with_common_fields()
30 |         finding_rec['resource_id'] = cluster['ClusterIdentifier']
31 |         finding_rec['resource_name'] = cluster['ClusterIdentifier']
32 |         finding_rec['resource_arn'] = f"arn:aws:redshift:{self.region}:{self.account_analyser.account_id}:cluster-name/{cluster['ClusterIdentifier']}"
33 |         return finding_rec
34 | 
--------------------------------------------------------------------------------
/src/service_specific_analysers/sgw_analyser.py:
--------------------------------------------------------------------------------
1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 | # SPDX-License-Identifier: MIT-0
3 | 
4 | import boto3
5 | import logging
6 | import utils
7 | from service_analyser import ServiceAnalyser
8 | 
9 | class SGWAnalyser(ServiceAnalyser):
10 | 
11 |     def __init__(self, account_analyser, region):
12 |         super().__init__(account_analyser, region, 'sgw')
13 | 
14 |     def get_findings(self):
15 | 
16 |         session = self.get_aws_session()
17 |         sgw = session.client("storagegateway", region_name=self.region)
18 | 
19 |         for gateway in utils.invoke_aws_api_full_list(sgw.list_gateways, "Gateways"):
20 |             finding_rec = self.get_finding_rec_from_response(gateway)
21 |             if (("Ec2InstanceRegion" in gateway.keys()) and (len(gateway["Ec2InstanceRegion"]))):
22 |                 finding_rec['potential_issue'] = True
23 |                 finding_rec['message'] = f"Storage Gateway: Gateway {gateway['GatewayName']} with ARN {gateway['GatewayARN']} is hosted on AWS. Please ensure this gateway is not used for critical workloads"
24 |             else:
25 |                 finding_rec['potential_issue'] = False
26 |                 finding_rec['message'] = f"Storage Gateway: Gateway {gateway['GatewayName']} is not hosted on AWS"
27 | 
28 |             self.findings.append(finding_rec)
29 | 
30 |     #Contains the logic to extract relevant fields from the API response to the output csv file.
31 | def get_finding_rec_from_response(self, gateway): 32 | 33 | finding_rec = self.get_finding_rec_with_common_fields() 34 | 35 | finding_rec['resource_id'] = gateway['GatewayId'] 36 | finding_rec['resource_name'] = gateway['GatewayName'] 37 | finding_rec['resource_arn'] = gateway['GatewayARN'] 38 | 39 | return finding_rec 40 | -------------------------------------------------------------------------------- /src/service_specific_analysers/vpce_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class VPCEAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'vpce') 13 | 14 | def get_findings(self): 15 | session = self.get_aws_session() 16 | ec2 = session.client("ec2", region_name=self.region) 17 | 18 | for vpce in utils.invoke_aws_api_full_list(ec2.describe_vpc_endpoints, "VpcEndpoints", Filters = [ {'Name':'vpc-endpoint-type', 'Values' : ['Interface']} ]): 19 | subnet_ids = vpce["SubnetIds"] 20 | 21 | finding_rec = self.get_finding_rec_from_response(vpce) 22 | 23 | if len(subnet_ids) > 1: 24 | finding_rec['potential_issue'] = False 25 | finding_rec['message'] = f"VPCE: {vpce['VpcEndpointId']} has multiple subnets: {subnet_ids}" 26 | else: 27 | finding_rec['potential_issue'] = True 28 | finding_rec['message'] = f"VPCE: {vpce['VpcEndpointId']} has a single subnet: {subnet_ids}" 29 | 30 | self.findings.append(finding_rec) 31 | 32 | def get_finding_rec_from_response(self, vpce): 33 | 34 | finding_rec = self.get_finding_rec_with_common_fields() 35 | 36 | finding_rec['resource_id'] = vpce['VpcEndpointId'] 37 | finding_rec['resource_name'] = '' 38 | for tag in vpce['Tags']: 39 | if tag['Key'] == 'Name': 40 | finding_rec['resource_name'] = tag['Value'] 41 | finding_rec['resource_arn'] = vpce['ServiceName'] 42 | return finding_rec 43 | -------------------------------------------------------------------------------- /src/utils.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import argparse 5 | import logging 6 | import re 7 | import time 8 | import threading 9 | import boto3 10 | import botocore 11 | from datetime import datetime, date 12 | from dataclasses import dataclass 13 | 14 | 15 | @dataclass 16 | class ConfigInfo: 17 | regions: list 18 | services: list 19 | max_concurrent_threads: int 20 | output_folder_name: str 21 | event_bus_arn: str 22 | log_level: str 23 | aws_profile_name: str 24 | aws_assume_role_name: str 25 | single_threaded: bool 26 | run_report_file_name: str 27 | bucket_name: str 28 | account_id: str 29 | truncate_output: bool 30 | filename_with_accountid: bool 31 | report_only_issues: bool 32 | 33 | all_services = ['vpce', 34 | 'dms', 35 | 'docdb', 36 | 'sgw', 37 | 'efs', 38 | 'opensearch', 39 | 'fsx', 40 | 'lambda', 41 | 'elasticache', 42 | 'dax', 43 | 'globalaccelerator', 44 | 'rds', 45 | 'memorydb', 46 | 'dx', 47 | 'cloudhsm', 48 | 'redshift'] 49 | 50 | #Use the below function,if needed, as print(json.dumps(db_instance, default = json_serialise, indent = 4)) 51 | def json_serialise(obj): 52 | if isinstance(obj, datetime): 53 | return obj.strftime("%Y-%m-%d, %H:%M:%S %Z") 54 | elif isinstance(obj, date): 55 | return obj.strftime("%Y-%m-%d %Z") 56 | else: 57 | raise TypeError (f"Type {type(obj)} not serializable") 58 | 59 | #Reference: https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_iam-quotas.html 60 | def regex_validator_generator(regex, desc_param_name, custom_message = ""): 61 | pattern = re.compile(regex) 62 | def regex_validator(arg_value): 63 | if arg_value is None: 64 | return arg_value 65 | elif not pattern.match(arg_value): 66 | raise argparse.ArgumentTypeError(f"Invalid {desc_param_name}. {custom_message}") 67 | return arg_value 68 | return regex_validator 69 | 70 | def maxlen_validator_generator(max_len, desc_param_name): 71 | def maxlen_validator(arg_value): 72 | if arg_value is None: 73 | return arg_value 74 | elif len(arg_value) > max_len: 75 | raise argparse.ArgumentTypeError(f"{desc_param_name} too long. It should not exceed {max_len} characters.") 76 | return arg_value 77 | return maxlen_validator 78 | 79 | def log_func(func): 80 | def inner(*args, **kwargs): 81 | logging.debug(f"In thread {threading.current_thread().name}: Starting {func.__name__} with args: {args} and key word args: {kwargs}") 82 | start = time.time() 83 | result = func(*args, **kwargs) 84 | end = time.time() 85 | logging.info(f"Completed {func.__name__} in thread {threading.current_thread().name} with args: {args} and key word args: {kwargs} in {end-start} seconds.") 86 | return result 87 | return inner 88 | 89 | def get_aws_session(session_name = None): 90 | session = boto3.session.Session(profile_name = config_info.aws_profile_name) 91 | 92 | if config_info.aws_assume_role_name: #Need to assume a role before creating an org client 93 | 94 | logging.info(f"aws-assume-role option is used. 
About to assume the role {config_info.aws_assume_role_name}") 95 | 96 | sts_client = session.client('sts') 97 | account_id = sts_client.get_caller_identity()["Account"] 98 | 99 | if not session_name: 100 | session_name = "AssumeRoleForFaultToleranceAnalyser" 101 | 102 | assumed_role_object=sts_client.assume_role( 103 | RoleArn=f"arn:aws:iam::{account_id}:role/{config_info.aws_assume_role_name}", 104 | RoleSessionName=session_name 105 | ) 106 | credentials=assumed_role_object['Credentials'] 107 | 108 | assumed_role_session = boto3.session.Session( 109 | aws_access_key_id=credentials['AccessKeyId'], 110 | aws_secret_access_key=credentials['SecretAccessKey'], 111 | aws_session_token=credentials['SessionToken'], 112 | ) 113 | logging.info(f"Assumed the role {config_info.aws_assume_role_name} with session name {session_name}") 114 | return assumed_role_session 115 | else: 116 | return session 117 | 118 | def check_aws_credentials(): 119 | try: 120 | session = boto3.session.Session(profile_name = config_info.aws_profile_name) 121 | sts = session.client("sts") 122 | resp = sts.get_caller_identity() 123 | account_id = resp["Account"] 124 | return account_id 125 | except botocore.exceptions.ClientError as error: 126 | raise error 127 | 128 | def get_approved_regions(): 129 | session = get_aws_session(session_name = 'ValidateRegions') 130 | ec2 = session.client("ec2", region_name='us-east-1') 131 | response = ec2.describe_regions() 132 | approved_regions = [region["RegionName"] for region in response["Regions"]] 133 | return approved_regions 134 | 135 | def regions_validator(input_regions): 136 | 137 | approved_regions = get_approved_regions() 138 | 139 | if 'ALL' in input_regions: 140 | if len(input_regions) == 1: #'ALL' is the only input 141 | return approved_regions 142 | else: 143 | raise argparse.ArgumentTypeError(f"When providing 'ALL' as an input region, please do not provide any other regions. 'ALL' implies all approved regions.") 144 | else: 145 | for input_region in input_regions: 146 | if input_region not in approved_regions: 147 | raise argparse.ArgumentTypeError(f"{input_region} is not in the list of approved regions for this account. Please provide only approved regions, or specify ALL for all regions that are approved") 148 | return input_regions 149 | 150 | def services_validator(input_services): 151 | if 'ALL' in input_services: 152 | if len(input_services) == 1: #'ALL' is the only input 153 | return all_services 154 | else: 155 | raise argparse.ArgumentTypeError(f"When providing 'ALL' as an input service, please do not provide any other services. 'ALL' implies the following services: {all_services}") 156 | else: 157 | return input_services 158 | 159 | def bus_arn_validator(event_bus_arn): 160 | 161 | if event_bus_arn is None: 162 | return event_bus_arn 163 | 164 | arn_parts = parse_arn(event_bus_arn) 165 | 166 | #ARN is validated. Now check if the region is correct. 167 | if arn_parts['region'] == 'ALL': 168 | raise argparse.ArgumentTypeError(f"Invalid region in event bus arn") 169 | else: 170 | approved_regions = get_approved_regions() 171 | if arn_parts['region'] not in approved_regions: 172 | raise argparse.ArgumentTypeError(f"{arn_parts['region']} is not in the list of approved regions for this account. 
Please provide an event bus in an approved region")
173 | 
174 |     #Check if the resource is in the right format
175 |     bus_name_regex = r"^[A-Za-z0-9._-]{1,256}$"
176 |     bus_name_pattern = re.compile(bus_name_regex)
177 | 
178 |     if arn_parts['resource_type'] != "event-bus":
179 |         raise argparse.ArgumentTypeError(f"Resource type '{arn_parts['resource_type']}' in the ARN is not valid for an event bus ARN. It should be 'event-bus'")
180 |     elif not bus_name_pattern.match(arn_parts['resource_id']):
181 |         raise argparse.ArgumentTypeError(f"{arn_parts['resource_id']} is not a valid bus name. Maximum of 256 characters consisting of numbers, lower/upper case letters, .,-,_.")
182 | 
183 |     return event_bus_arn
184 | 
185 | def arn_validator(arn):
186 |     regex = r"^arn:(aws|aws-us-gov|aws-cn):[^:]+:[^:]*:[^:]*:.+$"
187 |     pattern = re.compile(regex)
188 |     if not pattern.match(arn):
189 |         raise argparse.ArgumentTypeError(f"The provided ARN is invalid. Please provide a valid ARN. Ref: https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html")
190 |     return arn
191 | 
192 | def bucket_name_validator(bucket_name):
193 | 
194 |     regex = r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$"
195 |     pattern = re.compile(regex)
196 |     if not (3 <= len(bucket_name) <= 63):
197 |         raise argparse.ArgumentTypeError(f"Invalid bucket name. It must be between 3 and 63 characters in length")
198 |     if not pattern.match(bucket_name):
199 |         raise argparse.ArgumentTypeError(f"Invalid bucket name. Bucket names must be between 3 (min) and 63 (max) characters long. Bucket names can consist only of lowercase letters, numbers, dots (.), and hyphens (-). Bucket names must begin and end with a letter or number.")
200 |     if ".." in bucket_name:
201 |         raise argparse.ArgumentTypeError(f"Invalid bucket name. Bucket names should not have consecutive periods '..' ")
202 |     if bucket_name.startswith("xn--") or bucket_name.endswith('-s3alias'):
203 |         raise argparse.ArgumentTypeError(f"Invalid bucket name. Bucket names should not start with 'xn--' or end with '-s3alias'")
204 | 
205 |     return bucket_name
206 | 
207 | 
208 | def get_config_info():
209 | 
210 |     #Define the arguments
211 |     parser = argparse.ArgumentParser(description='Generate fault tolerance findings for different services', add_help=False)
212 | 
213 |     required_params_group = parser.add_argument_group('Required arguments')
214 |     required_params_group.add_argument('-s', '--services', nargs='+', choices = all_services + ['ALL'],
215 |                         help=f"Indicate which service(s) you want to fetch fault tolerance findings for. Options are {all_services}. Use 'ALL' for all services",
216 |                         required = True
217 |                         )
218 |     required_params_group.add_argument('-r', '--regions', nargs='+',
219 |                         help='Indicate which region(s) you want to fetch fault tolerance findings for. Use "ALL" for all approved regions',
220 |                         required = True
221 |                         )
222 | 
223 |     optional_params_group = parser.add_argument_group('Optional arguments')
224 |     optional_params_group.add_argument('-h', '--help', action="help", help = "show this message and exit")
225 | 
226 |     optional_params_group.add_argument('-m', '--max-concurrent-threads', dest='max_concurrent_threads',
227 |                         default = 20,
228 |                         type=int,
229 |                         help='Maximum number of threads that will be running at any given time. 
Default is 20')
230 |     optional_params_group.add_argument('-o', '--output', dest='output_folder_name',
231 |                         default='output/',
232 |                         type=regex_validator_generator(regex = r".+/$", desc_param_name = "Output folder name",
233 |                             custom_message = "Provide an output folder name where the findings csv and the run report csv will be placed"),
234 |                         help='''Name of the folder where findings output csv file and the run report csv file will be written.
235 |                         If it does not exist, it will be created. If a bucket name is also provided, then the folder will be looked for under the bucket, and if not present, will be created.
236 |                         If a bucket name is not provided, then this folder will be expected under the directory in which the script is run. In case a bucket is provided, the files will be generated in this folder first and then pushed to the bucket.
237 |                         Please ensure there is a forward slash '/' at the end of the folder path.
238 |                         Output file name will be of the format Fault_Tolerance_Findings_<account_id>_<account_name>_<date>.csv. Example: Fault_Tolerance_Findings_123456789101_TestAccount_2022_11_01.csv
239 |                         If you do not use the --filename-with-accountid option, the output file name will be of the format:
240 |                         Fault_Tolerance_Findings_<date>.csv. Example: Fault_Tolerance_Findings_2022_11_01.csv''')
241 |     optional_params_group.add_argument('-b', '--bucket', dest='bucket_name',
242 |                         default = None,
243 |                         type=bucket_name_validator,
244 |                         help='Name of the bucket where findings output csv file and the run report csv file will be uploaded to')
245 |     optional_params_group.add_argument('--event-bus-arn', dest='event_bus_arn',
246 |                         default=None,
247 |                         type=regex_validator_generator(regex = r"arn:(aws|aws-us-gov|aws-cn):events:.*:.*:event-bus/[A-Za-z0-9._-]{1,256}$", desc_param_name = "Event Bus ARN",
248 |                             custom_message = "Provide the ARN of an event bus in AWS Eventbridge to which findings will be published"),
249 |                         help='''ARN of the event bus in AWS Eventbridge to which findings will be published.''')
250 |     optional_params_group.add_argument('--aws-profile', dest='aws_profile_name',
251 |                         default=None,
252 |                         type=maxlen_validator_generator(max_len = 250, desc_param_name = "AWS Profile name"),
253 |                         help="Use this option if you want to pass in an AWS profile already configured for the CLI")
254 |     optional_params_group.add_argument('--aws-assume-role', dest='aws_assume_role_name',
255 |                         default=None,
256 |                         type=regex_validator_generator(regex = r"^[a-zA-Z0-9+=,.@_-]+$", desc_param_name = "IAM Role name"),
257 |                         #type=iam_entity,
258 |                         help="Use this option if you want the aws profile to assume a role before querying Org related information")
259 |     optional_params_group.add_argument('--log-level', dest='log_level',
260 |                         default='ERROR', choices = ['DEBUG','INFO','WARNING','ERROR','CRITICAL'],
261 |                         help="Log level. Needs to be one of the following: 'DEBUG','INFO','WARNING','ERROR','CRITICAL'")
262 |     optional_params_group.add_argument('--single-threaded', action='store_true', dest='single_threaded',
263 |                         default=False,
264 |                         help="Use this option to specify that the service+region level information gathering threads should not run in parallel. Default is False, which means the script uses multi-threading by default. Same effect as setting max-concurrent-threads to 1")
265 |     optional_params_group.add_argument('--truncate-output', action='store_true', dest='truncate_output',
266 |                         default=False,
267 |                         help="Use this flag to make sure that if the output file already exists, the file is truncated. Default is False. 
269 |     optional_params_group.add_argument('--filename-with-accountid', action='store_true', dest='filename_with_accountid',
270 |         default=False,
271 |         help='''Use this flag to include the account id in the output file name.
272 |         By default this is False, meaning the account id will not be in the file name.
273 |         The default mode is useful if you are running the script for more than one account,
274 |         and want all the accounts' findings to be in the same output file.''')
275 |     optional_params_group.add_argument('--report-only-issues', action='store_true', dest='report_only_issues',
276 |         default=False,
277 |         help="Use this flag to report only findings that are potential issues. Resources that have no identified issues will not appear in the final csv file. Default is to report all findings.")
278 |     args = parser.parse_args()
279 | 
280 |     #Set up logging
281 |     logging.basicConfig(
282 |         format='%(asctime)s %(levelname)-8s %(message)s',
283 |         level=args.log_level,
284 |         datefmt='%Y-%m-%d %H:%M:%S')
285 | 
286 |     global config_info
287 | 
288 |     config_info = ConfigInfo(
289 |         regions = [],
290 |         services = [],
291 |         max_concurrent_threads = args.max_concurrent_threads,
292 |         output_folder_name = args.output_folder_name,
293 |         event_bus_arn = args.event_bus_arn,
294 |         log_level = args.log_level,
295 |         aws_profile_name = args.aws_profile_name,
296 |         aws_assume_role_name = args.aws_assume_role_name,
297 |         single_threaded = args.single_threaded,
298 |         run_report_file_name = "run_report.csv",
299 |         bucket_name = args.bucket_name,
300 |         account_id = '',
301 |         truncate_output = args.truncate_output,
302 |         filename_with_accountid = args.filename_with_accountid,
303 |         report_only_issues = args.report_only_issues
304 |     )
305 | 
306 | 
307 |     #First check credentials
308 |     account_id = check_aws_credentials()
309 | 
310 |     #Validate the regions, services and event bus ARN, and store the validated values
311 |     config_info.account_id = account_id
312 |     config_info.regions = regions_validator(args.regions)
313 |     config_info.services = services_validator(args.services)
314 |     config_info.event_bus_arn = bus_arn_validator(args.event_bus_arn)
315 | 
316 | def invoke_aws_api_full_list (api_method, top_level_member, **kwargs):
317 |     #Generator that invokes the given boto3 API method and yields every item under 'top_level_member', following NextToken pagination
318 |     logging.info(f"Invoking {api_method.__self__.__class__.__name__}.{api_method.__name__} for {top_level_member} with the parameters {kwargs}")
319 |     response = api_method(**kwargs)
320 | 
321 |     for response_item in response[top_level_member]:
322 |         yield(response_item)
323 | 
324 |     while ('NextToken' in response):
325 |         response = api_method(NextToken = response['NextToken'], **kwargs)
326 |         for response_item in response[top_level_member]:
327 |             yield(response_item)
328 | 
329 | def parse_arn(arn):
330 |     parts = arn.split(":")
331 |     if len(parts) == 7: #Follows the format "arn:partition:service:region:account-id:resource-type:resource-id"
332 |         result = {
333 |             'arn': parts[0],
334 |             'partition': parts[1],
335 |             'service': parts[2],
336 |             'region': parts[3],
337 |             'account_id': parts[4],
338 |             'resource_type': parts[5],
339 |             'resource_id': parts[6]
340 |         }
341 |     elif len(parts) == 6:
342 |         if "/" in parts[5]: #Follows the format "arn:partition:service:region:account-id:resource-type/resource-id"
343 |             resource_info = parts[5]
344 |             resource_parts = resource_info.split("/")
345 |             result = {
346 |                 'arn': parts[0],
347 |                 'partition': parts[1],
348 |                 'service': parts[2],
349 |                 'region': parts[3],
350 |                 'account_id': parts[4],
351 |                 'resource_type': resource_parts[0],
352 |                 'resource_id': resource_parts[1],
353 |             }
354 |         else: #Follows the format "arn:partition:service:region:account-id:resource-id"
355 |             result = {
356 |                 'arn': parts[0],
357 |                 'partition': parts[1],
358 |                 'service': parts[2],
359 |                 'region': parts[3],
360 |                 'account_id': parts[4],
361 |                 'resource_type': None,
362 |                 'resource_id': parts[5],
363 |             }
364 |     else:
365 |         raise argparse.ArgumentTypeError(f"Invalid ARN. Does not follow the pattern defined here: https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html")
366 | 
367 |     return result
--------------------------------------------------------------------------------