├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── Dockerfile ├── LICENSE ├── README.md ├── minimal_IAM_policy.json ├── requirements.txt └── src ├── account_analyser.py ├── requirements.txt ├── service_analyser.py ├── service_specific_analysers ├── cloudhsm_analyser.py ├── dax_analyser.py ├── dms_analyser.py ├── docdb_analyser.py ├── dx_analyser.py ├── efs_analyser.py ├── elasticache_analyser.py ├── fsx_analyser.py ├── globalaccelerator_analyser.py ├── lambda_analyser.py ├── memorydb_analyser.py ├── opensearch_analyser.py ├── rds_analyser.py ├── redshift_analyser.py ├── sgw_analyser.py └── vpce_analyser.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | src/__pycache__ 2 | src/service_specific_analysers/__pycache__ 3 | src/output/ 4 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. 
Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM python:3.9 2 | 3 | RUN apt-get update -y 4 | #RUN apt-get install -y python-pip python-dev build-essential 5 | 6 | COPY ./src/requirements.txt /src/ 7 | 8 | WORKDIR /src 9 | RUN pip install --no-cache-dir -r requirements.txt 10 | 11 | COPY ./src/*.py /src/ 12 | COPY ./src/service_specific_analysers/ /src/service_specific_analysers/ 13 | 14 | ENTRYPOINT ["python3", "./account_analyser.py"] 15 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software is furnished to do so. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 15 | 16 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Fault Tolerance Analyser 2 | 3 | ## __Table of Contents__ 4 | 1. [Description](#1-description) 5 | 2. [Motivation](#2-motivation) 6 | 3. 
[Permissions needed to run the tool](#3-permissions-needed-to-run-the-tool) 7 | 4. [Installation](#4-installation) 8 | 5. [Running the tool using Python directly](#5-running-the-tool-using-python-directly) 9 | 6. [Running the tool as a Docker container](#6-running-the-tool-as-a-docker-container) 10 | 7. [Functional Design](#7-functional-design) 11 | 7.1 [VPC Endpoints](#71-vpc-endpoints) 12 | 7.2 [Database Migration Service](#72-database-migration-service) 13 | 7.3 [DocumentDB Clusters](#73-documentdb) 14 | 7.4 [Storage Gateway](#74-storage-gateway) 15 | 7.5 [Elastic File System](#75-elastic-file-system) 16 | 7.6 [Opensearch](#76-opensearch) 17 | 7.7 [FSX](#77-fsx) 18 | 7.8 [Lambda](#78-lambda) 19 | 7.9 [Elasticache](#79-elasticache) 20 | 7.10 [Memory DB](#710-memory-db) 21 | 7.11 [DynamoDB Accelerator](#711-dynamodb-accelerator) 22 | 7.12 [Global Accelerator](#712-global-accelerator) 23 | 7.13 [Relational Database Service](#713-relational-database-service) 24 | 7.14 [Direct Connect](#714-direct-connect) 25 | 7.15 [Cloud HSM](#715-cloud-hsm) 26 | 7.16 [Redshift](#716-redshift) 27 | 8. [Non-Functional Design](#8-non-functional-design) 28 | 9. [Security](#9-security) 29 | 10. [Contributing](#10-contributing) 30 | 11. [Frequently Asked Questions (FAQ)](#11-frequently-asked-questions-faq) 31 | 12. [License](#12-license) 32 | 33 | ## __1. Description__ 34 | A tool to generate a list of potential fault tolerance issues across different services. Please note that these are only *potential* issues. 35 | 36 | There are a number of circumstances in which the finding may not pose a problem including, development workloads, cost, or not viewing this workload as business critical in the event of an AZ impacting event. 37 | 38 | The output is a csv file created locally and also uploaded to an S3 bucket (if provided). 39 | 40 | ## __2. Motivation__ 41 | The intent is to help customers check their workloads for any components with potential fault tolerance issues. 42 | 43 | ## __3. Permissions needed to run the tool__ 44 | 45 | You can run the script on an EC2 with an instance role, or on your own machine with the credentials exported using the usual AWS env variables (as below) or with a profile configured using `aws configure` CLI command 46 | 47 | ``` 48 | export AWS_ACCESS_KEY_ID=abc 49 | export AWS_SECRET_ACCESS_KEY=def 50 | export AWS_SESSION_TOKEN=ghi 51 | ``` 52 | 53 | These credentials are needed as the code will invoke AWS APIs to get information about different AWS services. Except when pushing the output file to the S3 bucket, assume_role API call if a role is passed in, all other operations are "read-only". Here are the list of APIs invoked: 54 | 55 | ``` 56 | #APIs invoked for common functionality like getting account information, list of regions, etc. 
57 | STS.get_caller_identity 58 | STS.assume_role 59 | EC2.describe_regions 60 | Organizations.describe_account 61 | S3.put_object 62 | 63 | #APIs invoked for service specific fault tolerance analysis 64 | Lambda.list_functions 65 | StorageGateway.list_gateways 66 | OpenSearchService.list_domain_names 67 | OpenSearchService.describe_domains 68 | OpenSearchService.describe_domain 69 | ElastiCache.describe_cache_clusters 70 | ElastiCache.describe_replication_groups 71 | EFS.describe_file_systems 72 | DirectConnect.describe_connections 73 | DirectConnect.describe_virtual_interfaces 74 | FSx.describe_file_systems 75 | MemoryDB.describe_clusters 76 | DAX.describe_clusters 77 | DatabaseMigrationService.describe_replication_instances 78 | RDS.describe_db_instances 79 | RDS.describe_db_clusters 80 | EC2.describe_vpc_endpoints 81 | EC2.describe_instances 82 | ``` 83 | 84 | You can also provide an IAM role that the profile provided above can assume. 85 | 86 | If you want the least privileged policy to run this, the minimal permissions needed can be seen in minimal_IAM_policy.json. While most of the policy uses "*" for the resource (because the tool needs to look at all resources of a specific type), it is a good practice to specify the account id and a specific bucket name. So please replace all occurrences of `account_id`, `bucket_name` and `output_folder_name` with the appropriate values. If you are passing an event bus ARN to the tool to post events to the bus, then make sure you use the last section in minimal_IAM_policy.json after modifying `account_id`, `event-bus-region` and `event_bus_name`. If the event bus is in a different account from the one the tool runs in, then make sure the resource policy on the event bus allows events to be posted from the account the tool runs in. Refer to the [Example policy to send custom events in a different bus](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-event-bus-perms.html#eb-event-bus-example-policy-cross-account-custom-bus-source). 87 | 88 | The minimal IAM policy is written in a way that you can remove sections for the resource types that you do not want the tool to look at in your account. If, say, you do not want to run this tool for Direct Connect, you can remove the section with the sid `DirectConnect`. 89 | 90 | In the IAM policy provided, some SIDs have the suffix "ThatSupportAllResources", which means that the API calls included in that section work on all resources by default and cannot be scoped to specific resources. So a "*" there does not go against the best practice that wildcards should not be used in IAM policies. 91 | 92 | Sections with SIDs that have the suffix "ThatRequireWildcardResources" (used for DynamoDB Accelerator and Direct Connect) cover API calls where using a wildcard is unavoidable. 93 | 94 | In all other cases, the region and the resource name/id are wildcards, as the tool needs to work across multiple regions and needs to look at all resources. 95 | 96 | ## __4. Installation__ 97 | 1. You will need Python 3.10+ for this tool. If you do not have Python installed, please install it from here: 98 | https://www.python.org/ 99 | 100 | 2. Clone the repo and install the dependencies with the following command: 101 | ``` 102 | pip install -r requirements.txt 103 | ``` 104 | 105 | 3. Once this is set up, you can run the tool as described in the next section. 106 | 107 | ## __5. 
Running the tool using Python directly__ 108 | Here is a simple example commmand on how you can run the script 109 | 110 | ``` 111 | cd src 112 | python3 account_analyser.py \ 113 | --regions us-east-1 \ 114 | --services lambda opensearch docdb rds \ 115 | --truncate-output 116 | ``` 117 | 118 | In the command above, the script is run for the us-east-1 region, and looks at the services Lambda, Opensearch, Document DB and RDS. It generates the csv file and writes it to the output sub folder in the folder it is run. The truncate-output option ensures that if there is any existing file it is truncated before the findings are added. 119 | 120 | Once the script finishes, check the subfolder output/ and you will see 2 files like below. 121 | 122 | ``` 123 | ls output/ 124 | 125 | Fault_Tolerance_Findings_2022_11_21_17_09_19.csv 126 | Fault_Tolerance_Findings_2022_11_21_17_09_19_run_report.csv 127 | ``` 128 | 129 | The output will look like this. This shows all the findings. 130 | 131 | ``` 132 | service,region,account_id,account_name,payer_account_id,payer_account_name,resource_arn,resource_name,resource_id,potential_issue,engine,message,timestamp 133 | lambda,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:lambda:us-east-1:123456789101:function:test1z,test1z,,True,,VPC Enabled lambda in only one subnet,2022_11_29_16_20_43_+0000 134 | lambda,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:lambda:us-east-1:123456789101:function:test2az,test2az,,False,,VPC Enabled lambda in more than one subnet,2022_11_29_16_20_43_+0000 135 | docdb,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:cluster:docdb-2022-07-08-13-05-30,docdb-2022-07-08-13-05-30,cluster-JKL,True,,Single AZ Doc DB Cluster,2022_11_29_16_20_43_+0000 136 | docdb,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:cluster:docdb-2022-07-19-09-35-14,docdb-2022-07-19-09-35-14,cluster-GHI,True,,Single AZ Doc DB Cluster,2022_11_29_16_20_43_+0000 137 | docdb,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:cluster:docdb-2022-11-10-12-43-07,docdb-2022-11-10-12-43-07,cluster-DEF,True,,Single AZ Doc DB Cluster,2022_11_29_16_20_43_+0000 138 | docdb,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:cluster:docdb-2022-11-10-12-44-23,docdb-2022-11-10-12-44-23,cluster-ABC,True,,Single AZ Doc DB Cluster,2022_11_29_16_20_43_+0000 139 | opensearch,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:es:us-east-1:123456789101:domain/test4,test4,123456789101/test4,True,,Single AZ domain,2022_11_29_16_20_44_+0000 140 | opensearch,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:es:us-east-1:123456789101:domain/test5,test5,123456789101/test5,True,,Single AZ domain,2022_11_29_16_20_44_+0000 141 | opensearch,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:es:us-east-1:123456789101:domain/test2,test2,123456789101/test2,False,,Multi AZ domain,2022_11_29_16_20_44_+0000 142 | opensearch,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:es:us-east-1:123456789101:domain/test3,test3,123456789101/test3,True,,Single AZ domain,2022_11_29_16_20_44_+0000 143 | opensearch,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:es:us-east-1:123456789101:domain/test6,test6,123456789101/test6,True,,Single AZ 
domain,2022_11_29_16_20_44_+0000 144 | opensearch,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:es:us-east-1:123456789101:domain/test1,test1,123456789101/test1,True,,Single AZ domain,2022_11_29_16_20_44_+0000 145 | rds,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:db:database-3,database-3,,True,sqlserver-ex,RDS Instance has MultiAZ disabled,2022_11_29_16_20_44_+0000 146 | rds,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:cluster:auroraclustersingleaz,auroraclustersingleaz,,True,aurora-mysql,DB Cluster has MultiAZ disabled,2022_11_29_16_20_44_+0000 147 | rds,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:cluster:aurora-mysql-multiaz,aurora-mysql-multiaz,,False,aurora-mysql,DB Cluster has MultiAZ enabled,2022_11_29_16_20_44_+0000 148 | rds,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:cluster:database-4,database-4,,False,postgres,DB Cluster has MultiAZ enabled,2022_11_29_16_20_44_+0000 149 | rds,us-east-1,123456789101,TestAccount,999456789101,TestParentAccount,arn:aws:rds:us-east-1:123456789101:cluster:mysql-cluster,mysql-cluster,,False,mysql,DB Cluster has MultiAZ enabled,2022_11_29_16_20_44_+0000 150 | ``` 151 | 152 | The run report will look like this. This gives an idea of how long each service+region combination took. 153 | ``` 154 | account_id,region,service,result,error_message,start_time,end_time,runtime_in_seconds 155 | 625787456381,us-east-1,opensearch,Success,,2022_11_29_16_20_42_+0000,2022_11_29_16_20_43_+0000,1.05 156 | 625787456381,us-east-1,lambda,Success,,2022_11_29_16_20_42_+0000,2022_11_29_16_20_43_+0000,1.12 157 | 625787456381,us-east-1,docdb,Success,,2022_11_29_16_20_42_+0000,2022_11_29_16_20_44_+0000,1.74 158 | 625787456381,us-east-1,rds,Success,,2022_11_29_16_20_42_+0000,2022_11_29_16_20_45_+0000,2.62 159 | 625787456381,Overall,Overall,N/A,N/A,2022_11_29_16_20_42_+0000,2022_11_29_16_20_45_+0000,2.68 160 | ``` 161 | 162 | The same files will also be pushed to an S3 bucket if you provide a bucket name as a command line argument. When you provide a bucket, please make sure the bucket is properly secured as the output from this tool will be written to that bucket, and it could contain sensitive information (like names of RDS instances or other configuration detail) that you might not want to share widely. 163 | 164 | 165 | Use the option --help to look at all the options. Here are the options. 166 | 167 | ``` 168 | python3 account_analyser.py --help 169 | usage: account_analyser.py -s {vpce,dms,docdb,sgw,efs,opensearch,fsx,lambda,elasticache,dax,globalaccelerator,rds,memorydb,dx,ALL} 170 | [{vpce,dms,docdb,sgw,efs,opensearch,fsx,lambda,elasticache,dax,globalaccelerator,rds,memorydb,dx,ALL} ...] -r REGIONS [REGIONS ...] 
[-h] 171 | [-m MAX_CONCURRENT_THREADS] [-o OUTPUT_FOLDER_NAME] [-b BUCKET_NAME] [--event-bus-arn EVENT_BUS_ARN] [--aws-profile AWS_PROFILE_NAME] 172 | [--aws-assume-role AWS_ASSUME_ROLE_NAME] [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--single-threaded] [--truncate-output] [--filename-with-accountid] 173 | [--report-only-issues] 174 | 175 | Generate fault tolerance findings for different services 176 | 177 | Required arguments: 178 | -s {vpce,dms,docdb,sgw,efs,opensearch,fsx,lambda,elasticache,dax,globalaccelerator,rds,memorydb,dx,ALL} [{vpce,dms,docdb,sgw,efs,opensearch,fsx,lambda,elasticache,dax,globalaccelerator,rds,memorydb,dx,cloudhsm,ALL} ...], --services {vpce,dms,docdb,sgw,efs,opensearch,fsx,lambda,elasticache,dax,globalaccelerator,rds,memorydb,dx,cloudhsm,ALL} [{vpce,dms,docdb,sgw,efs,opensearch,fsx,lambda,elasticache,dax,globalaccelerator,rds,memorydb,dx,cloudhsm,ALL} ...] 179 | Indicate which service(s) you want to fetch fault tolerance findings for. Options are ['vpce', 'dms', 'docdb', 'sgw', 'efs', 'opensearch', 'fsx', 'lambda', 'elasticache', 'dax', 180 | 'globalaccelerator', 'rds', 'memorydb', 'dx', 'cloudhsm']. Use 'ALL' for all services 181 | -r REGIONS [REGIONS ...], --regions REGIONS [REGIONS ...] 182 | Indicate which region(s) you want to fetch fault tolerance findings for. Use "ALL" for all approved regions 183 | 184 | Optional arguments: 185 | -h, --help show this message and exit 186 | -m MAX_CONCURRENT_THREADS, --max-concurrent-threads MAX_CONCURRENT_THREADS 187 | Maximum number of threads that will be running at any given time. Default is 20 188 | -o OUTPUT_FOLDER_NAME, --output OUTPUT_FOLDER_NAME 189 | Name of the folder where findings output csv file and the run report csv file will be written. If it does not exist, it will be created. If a bucket name is also provided, then 190 | the folder will be looked for under the bucket, and if not present, will be created If a bucket name is not provided, then this folder will be expected under the directory in 191 | which the script is ran. In case a bucket is provided, the files will be generated in this folder first and then pushed to the bucket Please ensure there is a forward slash '/' 192 | at the end of the folder path Output file name will be of the format Fault_Tolerance_Findings___.csv. Example: 193 | Fault_Tolerance_Findings_123456789101_TestAccount_2022_11_01.csv If you do not use the --filename-with-accountid option, the output file name will be of the format: 194 | Fault_Tolerance_Findings_.csv. Example: Fault_Tolerance_Findings_2022_11_01.csv 195 | -b BUCKET_NAME, --bucket BUCKET_NAME 196 | Name of the bucket where findings output csv file and the run report csv file will be uploaded to 197 | --event-bus-arn EVENT_BUS_ARN 198 | ARN of the event bus in AWS Eventbridge to which findings will be published. 199 | --aws-profile AWS_PROFILE_NAME 200 | Use this option if you want to pass in an AWS profile already congigured for the CLI 201 | --aws-assume-role AWS_ASSUME_ROLE_NAME 202 | Use this option if you want the aws profile to assume a role before querying Org related information 203 | --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL} 204 | Log level. Needs to be one of the following: 'DEBUG','INFO','WARNING','ERROR','CRITICAL' 205 | --single-threaded Use this option to specify that the service+region level information gathering threads should not run in parallel. Default is False, which means the script uses multi-threading 206 | by default. 
Same effect as setting max-running-threads to 1 207 | --truncate-output Use this flag to make sure that if the output file already exists, the file is truncated. Default is False. Useful if you are invoking this script to refresh findings within 208 | the same day (on a different day, the output file will have a different file name) 209 | --filename-with-accountid 210 | Use this flag to include account id in the output file name. By default this is False, meaning, account id will not be in the file name. The default mode is useful if you are 211 | running the script for more than one account, and want all the accounts' findings to be in the same output file. 212 | --report-only-issues Use this flag to report only findings that are potential issues. Resources that have no identified issues will not appear in the final csv file. Default is to report all 213 | findings. 214 | 215 | 216 | ``` 217 | 218 | ## __6. Running the tool as a Docker container__ 219 | 220 | Instead of installing Python and the dependencies, you can just use the Docker file and run the tool as a container. Here is how to do it. 221 | 222 | 1. Clone the repo and bulid the image by running the command `docker build . -t fault_tolerance_analyser` 223 | 224 | 2. If you are using an AWS profile use the following command (note how the credentials file is mapped into the container so that the container will have access to the credentials). Also note that the second volume being mapped is the folder into which the output file to be written. 225 | 226 | ``` 227 | docker run \ 228 | -v $HOME/.aws/credentials:/root/.aws/credentials:ro \ 229 | -v $PWD/src/output/:/src/output/:rw \ 230 | fault_tolerance_analyser \ 231 | --regions us-east-1 \ 232 | --services lambda opensearch docdb rds \ 233 | --truncate-output 234 | ``` 235 | 236 | 3. If you are using AWS credentials exported as env variables you can run it as below. You can remove AWS_SESSION_TOKEN if you are using long term credentials 237 | 238 | ``` 239 | docker run \ 240 | -v $PWD/src/output/:/src/output/:rw \ 241 | -e AWS_ACCESS_KEY_ID \ 242 | -e AWS_SECRET_ACCESS_KEY \ 243 | -e AWS_SESSION_TOKEN \ 244 | fault_tolerance_analyser \ 245 | --regions us-east-1 \ 246 | --services lambda opensearch docdb rds \ 247 | --truncate-output 248 | ``` 249 | 250 | 4. If you are running on an EC2 machine with an IAM role associated with the machine, then you can just run it without env variables or credential files as below. 251 | 252 | ``` 253 | docker run \ 254 | -v $PWD/src/output/:/src/output/:rw \ 255 | fault_tolerance_analyser \ 256 | --regions us-east-1 \ 257 | --services lambda opensearch docdb rds \ 258 | --truncate-output 259 | ``` 260 | 261 | ## __7. Functional Design__ 262 | 263 | ### 7.1 VPC Endpoints 264 | It is a best practice to make sure that VPC Interface Endpoints have ENIs in more than one subnet. If a VPC endpoint has an ENI in only a single subnet, this tool will flag that as a potential issue. You cannot create VPC Endpoints in 2 different subnets in the same AZ. So, for the purpose of VPC endpoints, having multiple subnets implies multiple AZs. 265 | 266 | ### 7.2 Database Migration Service 267 | If the DMS Replication Instance is not configured with at least 2 instances in different availability zones, then it will be flagged as a potential issue. 
268 | 269 | Reference: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReplicationInstance.html 270 | 271 | ### 7.3 DocumentDB 272 | If a DocumentDB cluster does not have a replica in a different AZ, it will be flagged as a potential issue. 273 | 274 | Reference: https://docs.aws.amazon.com/documentdb/latest/developerguide/failover.html 275 | 276 | ### 7.4 Storage Gateway 277 | Storage Gateway, when deployed on AWS, runs on a single Amazon EC2 instance. It is therefore a single point of availability failure for any application expecting highly available access to application storage. Such storage gateways are flagged by this check as a potential issue. 278 | 279 | Customers running Storage Gateway as a mechanism for providing file-based application storage that requires high availability should consider migrating those workloads to Amazon EFS, FSx, or other storage services that can provide higher-availability architectures than Storage Gateway. 280 | 281 | ### 7.5 Elastic File System 282 | This check flags both of the following scenarios as potential issues: 283 | 1. Running an "EFS One Zone" deployment. 284 | 2. Running an "EFS Standard" class deployment with a mount target in only one AZ. 285 | 286 | Customers that have chosen to deploy a One Zone class of storage should ensure these are not mission-critical workloads that require high availability, and that the choice was made deliberately. 287 | 288 | Customers running a Standard class EFS deployment, where multi-AZ replication is provided by the service, may still have only a single mount target to access their file systems. If an availability issue were to occur in that Availability Zone, they would lose access to the EFS deployment, even though other AZs/subnets were unaffected. 289 | 290 | ### 7.6 Opensearch 291 | Any single-node domain, as well as any OpenSearch domain whose nodes are all deployed within the same Availability Zone, is flagged as a potential issue by this tool. 292 | 293 | ### 7.7 FSx 294 | Any FSx for Windows File Server file system deployed as Single-AZ is flagged as a potential issue by this tool. 295 | 296 | Customers have the option to choose a Multi-AZ or Single-AZ deployment when creating their file server deployment. 297 | 298 | ### 7.8 Lambda 299 | Any Lambda function that is configured to execute in only a single Availability Zone is flagged as a potential issue. 300 | Reference: https://docs.aws.amazon.com/lambda/latest/dg/security-resilience.html 301 | 302 | ### 7.9 Elasticache 303 | The following clusters are flagged as potential Single AZ issues: 304 | 305 | 1. All Memcached clusters - Data is not replicated between Memcached cluster nodes. Even if a customer has deployed nodes across multiple Availability Zones, the data on any node that loses availability (because of the host or its AZ) becomes unavailable as well. 306 | 307 | 2. Redis clusters - The following clusters are flagged as an issue (see the sketch after this list for a rough illustration): 308 | 2.1 Any single-node clusters. 309 | 2.2 Any "Cluster Mode" disabled clusters. 310 | 2.3 Any "Cluster Mode" enabled clusters with at least one node group having all of its nodes in the same AZ. 311 | 2.4 "Cluster Mode" enabled clusters with Auto Failover disabled. 312 | 2.5 "Cluster Mode" enabled clusters having shards with no replicas. 
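The repository's `elasticache_analyser.py` (not shown in this excerpt) contains the actual detection logic. As a rough illustration of how the Redis replication-group conditions above (roughly 2.3, 2.4 and 2.5) could be detected with boto3, here is a minimal sketch; the function name and finding messages are illustrative only, and standalone cache clusters and the Memcached check (which the tool covers via `describe_cache_clusters`) are not included.

```python
import boto3

def redis_replication_group_findings(region):
    #Illustrative sketch only - not the tool's actual implementation.
    elasticache = boto3.client("elasticache", region_name=region)
    findings = []
    paginator = elasticache.get_paginator("describe_replication_groups")
    for page in paginator.paginate():
        for group in page["ReplicationGroups"]:
            group_id = group["ReplicationGroupId"]
            #Auto Failover disabled (condition 2.4)
            if group.get("AutomaticFailover") != "enabled":
                findings.append(f"{group_id}: automatic failover is not enabled")
            for node_group in group.get("NodeGroups", []):
                members = node_group.get("NodeGroupMembers", [])
                azs = {m.get("PreferredAvailabilityZone") for m in members}
                #A shard with a single node has no replicas (condition 2.5)
                if len(members) < 2:
                    findings.append(f"{group_id}/{node_group.get('NodeGroupId')}: shard has no replicas")
                #All nodes of a shard placed in one AZ (condition 2.3)
                elif len(azs) == 1:
                    findings.append(f"{group_id}/{node_group.get('NodeGroupId')}: all nodes in a single AZ {azs.pop()}")
    return findings
```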
313 | 314 | Reference: https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/Replication.Redis-RedisCluster.html 315 | 316 | ### 7.10 Memory DB 317 | Any Memory DB cluster that has a single node in a shard is flagged as a potential issue by this tool. 318 | 319 | ### 7.11 DynamoDB Accelerator 320 | Any single-node clusters, as well as DAX clusters with multiple nodes all deployed within the same Availability Zone would be flagged as being a potential issue by this tool. 321 | 322 | ### 7.12 Global Accelerator 323 | Any "Standard" Global accelerators that are configured to target endpoints consisting only of EC2 instances in a single Availability Zone are flagged by this tool. "Custom Routing" Global Accelerators are not covered. 324 | 325 | ### 7.13 Relational Database Service 326 | Any single AZ RDS Instance or Cluster is flagged as a potential issue by this tool. 327 | 328 | ### 7.14 Direct Connect 329 | The following scenarios are flagged as potential issue by this tool: 330 | 1. Any region with a single Direct Connect connection. 331 | 2. Any region where there is more than one direct connection, but all of them use the same location. 332 | 3. Any Virtual Gateway with only one VIF 333 | 4. Any Virtual Gateway with more than one VIF but all of the VIFs on the same direct connect Connection. 334 | 335 | ### 7.15 Cloud HSM 336 | The following scenarios are flagged as potential issue by this tool: 337 | 1. Any cluster with a single HSM. 338 | 2. Any cluster with multiple HSMs all of which are in a single AZ. 339 | 340 | ### 7.16 Redshift 341 | Any Redshift cluster with all its nodes in a single AZ will be flagged as a potential issue by this tool. 342 | 343 | ## __8. Non-Functional Design__ 344 | 345 | There are two main classes: 346 | 347 | ### ServiceAnalyser 348 | The ServiceAnalyser is an abstract class from which all the service specific analysers are inherited. The service specific analysers contain the logic to identify potential issues for a given region. 349 | 350 | ### AccountAnalyser 351 | An object of this class is initiated as part of the "main" functionality. This loops through all the services and regions and instantiates the service specific analyser for each region+service combination and triggers the method to gather the findings in that service specific analyser. Once the findings are received, it writes it to a file. 352 | 353 | The AccountAnalyser logic can run either in multi-threaded or single-threaded mode. In multi-threaded mode, the analyser for each service+region combination runs in a separate thread. This is the default mode. This saves a lot of time as there are 14 analysers running making API calls and that too across multiple regions. 354 | 355 | In multi-threaded mode, care is taken to ensure that when writing the findings to an output file, multiple threads do not try to do it at the same time (with the help of a lock). 356 | 357 | When all the analysers are run, the output file is uploaded to an S3 bucket, if provided. 358 | 359 | ## __9. Security__ 360 | 361 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. 362 | 363 | ## __10. Contributing__ 364 | 365 | See [CONTRIBUTING](CONTRIBUTING.md#contributing-via-pull-requests) for more information. 366 | 367 | ## __11. Frequently Asked Questions (FAQ)__ 368 | 369 | ### What is the purpose of the Fault Tolerance tool? 
370 | * The Fault Tolerance tool is designed to identify potential single points of failure across different AWS services (see the list of supported services in [Functional Design](#7-functional-design)) in your account. By detecting resources that could cause disruption to the system if they were to fail, the tool helps customers build more fault tolerant workloads in AWS. This is in line with the guidance provided by the [Well-Architected Framework - Reliability Pillar - Use fault isolation to protect your workload](https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/use-fault-isolation-to-protect-your-workload.html), which emphasizes the use of fault isolation to protect workloads. 371 | 372 | ### How does the Fault Tolerance tool work? 373 | * The Fault Tolerance tool analyzes your AWS account and generates a report that highlights potential fault tolerance issues across various AWS services. This report allows customers to identify resources where a single failure could disrupt the system and take steps to add redundancy to prevent downtime. 374 | 375 | ### What permissions are needed to run the tool? 376 | * For the permissions the tool requires, please see the detailed instructions in the [Permissions needed to run the tool](#3-permissions-needed-to-run-the-tool) section. 377 | 378 | ### How can I install and run the Fault Tolerance Analyser tool? 379 | * To install the tool, please follow the detailed instructions in the [Installation](#4-installation) section. 380 | 381 | ### What are the potential fault tolerance issues identified by the tool? 382 | * The Fault Tolerance Analyser tool checks various AWS services for potential issues. Some examples of issues identified include VPC endpoints with ENIs in a single subnet, DMS replication instances without multi-AZ configuration, DocumentDB clusters without replicas in different AZs, Lambda functions executing in a single AZ, and more. The [Functional Design](#7-functional-design) section provides a detailed explanation of the risks for each supported service analyzed. 383 | 384 | ### Can the tool be run as a Docker container? 385 | * Yes, the Fault Tolerance Analyser tool can be run as a Docker container. The [Running the tool as a Docker container](#6-running-the-tool-as-a-docker-container) section provides instructions on how to build the Docker image using the provided Dockerfile. You can then run the tool as a container, either with an AWS profile, AWS credentials exported as environment variables, or on an EC2 machine with an associated IAM role. The necessary volume mappings and command examples are provided in the [Running the tool as a Docker container](#6-running-the-tool-as-a-docker-container) section. 386 | 387 | ### Does the tool support uploading the generated report to an S3 bucket? 388 | * Yes, the tool supports uploading the generated CSV report to an S3 bucket. Customers can provide the name of the bucket as a command-line argument when running the tool. Once the analysis is completed, the tool will upload the report file to the specified S3 bucket, allowing customers to centralize the findings and access them from a secure and controlled location. Use the `-b BUCKET_NAME` or `--bucket BUCKET_NAME` flag when running the tool to specify the S3 bucket the reports should be uploaded to. 389 | 390 | ### Can I integrate the tool with my internal ticketing systems to track findings? 391 | * Yes, the tool supports sending findings to an Amazon Eventbridge event bus. 
This allows for integrating with any other system easily. You can use the `--event-bus-arn` option to provide the ARN of the event bus. 392 | 393 | ### How is this tool different from Trusted Advisor and Resilience Hub? 394 | * The fault tolerance tool described here is a fully open-source tool, released under the MIT license, designed to generate a list of potential fault tolerance issues specific to different AWS services. It focuses on identifying potential issues related to fault tolerance and provides a detailed report that helps customers assess the fault tolerance of their workloads. Customers have the ability to customize and deploy the tool as per their requirements, making it a flexible and adaptable solution. 395 | * [AWS Trusted Advisor](https://aws.amazon.com/premiumsupport/technology/trusted-advisor/) is a service that draws upon best practices learned from serving hundreds of thousands of AWS customers. It inspects your AWS environment and provides recommendations when opportunities exist to save money, improve system availability and performance, or close security gaps. It specifically operates across five unique areas: Security, Performance, Cost Optimization, Fault Tolerance, and AWS Service Quotas. 396 | * [AWS Resilience Hub](https://aws.amazon.com/resilience-hub/) offers a centralized location to define, validate, and track the resiliency of your AWS applications. It helps protect your applications from disruptions, reduces recovery costs, and optimizes business continuity. You can describe your applications using AWS CloudFormation, Terraform state files, AWS Resource Groups, or choose from applications already defined in AWS Service Catalog AppRegistry. 397 | 398 | 399 | ### What is the open-source nature of this tool, and can customers contribute to its development? 400 | * This tool is built using some open-source technologies and follows an open-source development approach. The code for this tool is hosted on a public repository (aws-samples), allowing customers to access and contribute to its development. Customers can submit bug reports, feature requests, and even contribute enhancements or additional service-specific analyzers. Open-source collaboration promotes transparency and encourages community involvement in improving the tool's functionality. Check the [Contributing](#10-contributing) section to learn more. 401 | 402 | ## __12. License__ 403 | 404 | This library is licensed under the MIT-0 License. See the LICENSE file. 
405 | -------------------------------------------------------------------------------- /minimal_IAM_policy.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": "2012-10-17", 3 | "Statement": [ 4 | { 5 | "Sid": "VPCEThatSupportAllResources", 6 | "Effect": "Allow", 7 | "Action": [ 8 | "ec2:DescribeVpcEndpoints" 9 | ], 10 | "Resource": "*" 11 | }, 12 | { 13 | "Sid": "LambdaThatSupportAllResources", 14 | "Effect": "Allow", 15 | "Action": [ 16 | "lambda:ListFunctions" 17 | ], 18 | "Resource": "*" 19 | }, 20 | { 21 | "Sid": "FSXThatSupportAllResources", 22 | "Effect": "Allow", 23 | "Action": [ 24 | "fsx:DescribeFileSystems" 25 | ], 26 | "Resource": "*" 27 | }, 28 | { 29 | "Sid": "DMSThatSupportAllResources", 30 | "Effect": "Allow", 31 | "Action": [ 32 | "dms:DescribeReplicationInstances", 33 | "dms:DescribeReplicationTasks" 34 | ], 35 | "Resource": "*" 36 | }, 37 | { 38 | "Sid": "SGWThatSupportAllResources", 39 | "Effect": "Allow", 40 | "Action": [ 41 | "storagegateway:ListGateways" 42 | ], 43 | "Resource": "*" 44 | }, 45 | { 46 | "Sid": "CommonAPIsThatSupportAllResources", 47 | "Effect": "Allow", 48 | "Action": [ 49 | "sts:GetCallerIdentity", 50 | "ec2:DescribeRegions", 51 | "organizations:DescribeOrganization" 52 | ], 53 | "Resource": "*" 54 | }, 55 | { 56 | "Sid": "CommonAPIs", 57 | "Effect": "Allow", 58 | "Action": [ 59 | "organizations:DescribeAccount" 60 | ], 61 | "Resource": [ 62 | "arn:aws:organizations:::account/o-*/*" 63 | ] 64 | }, 65 | { 66 | "Sid": "DAXThatRequireWildcardResources", 67 | "Effect": "Allow", 68 | "Action": [ 69 | "dax:DescribeClusters" 70 | ], 71 | "Resource": [ 72 | "*" 73 | ] 74 | }, 75 | { 76 | "Sid": "DXThatRequireWildcardResources", 77 | "Effect": "Allow", 78 | "Action": [ 79 | "directconnect:DescribeConnections", 80 | "directconnect:DescribeVirtualInterfaces" 81 | ], 82 | "Resource": [ 83 | "*" 84 | ] 85 | }, 86 | { 87 | "Sid": "Elasticache", 88 | "Effect": "Allow", 89 | "Action": [ 90 | "elasticache:DescribeReplicationGroups", 91 | "elasticache:DescribeCacheClusters" 92 | ], 93 | "Resource": [ 94 | "arn:aws:elasticache:*::replicationgroup:*", 95 | "arn:aws:elasticache:*::cluster:*" 96 | ] 97 | }, 98 | { 99 | "Sid": "MemoryDB", 100 | "Effect": "Allow", 101 | "Action": [ 102 | "memorydb:DescribeClusters" 103 | ], 104 | "Resource": [ 105 | "arn:aws:memorydb:*::cluster/*" 106 | ] 107 | }, 108 | { 109 | "Sid": "RDSAndDocumentDB", 110 | "Effect": "Allow", 111 | "Action": [ 112 | "rds:DescribeDBInstances", 113 | "rds:DescribeDBClusters" 114 | ], 115 | "Resource": [ 116 | "arn:aws:rds:*::db:*", 117 | "arn:aws:rds:*::cluster:*" 118 | ] 119 | }, 120 | { 121 | "Sid": "Opensearch", 122 | "Effect": "Allow", 123 | "Action": [ 124 | "es:DescribeDomain", 125 | "es:DescribeDomains" 126 | ], 127 | "Resource": [ 128 | "arn:aws:es:*::domain/*" 129 | ] 130 | }, 131 | { 132 | "Sid": "OpensearchThatSupportAllResources", 133 | "Effect": "Allow", 134 | "Action": [ 135 | "es:ListDomainNames" 136 | ], 137 | "Resource": [ 138 | "*" 139 | ] 140 | }, 141 | { 142 | "Sid": "AGA", 143 | "Effect": "Allow", 144 | "Action": [ 145 | "globalaccelerator:ListEndpointGroups", 146 | "globalaccelerator:ListListeners" 147 | ], 148 | "Resource": [ 149 | "arn:aws:globalaccelerator:::accelerator/*", 150 | "arn:aws:globalaccelerator:::accelerator/*/listener/*" 151 | ] 152 | }, 153 | { 154 | "Sid": "AGAThatSupportAllResources", 155 | "Effect": "Allow", 156 | "Action": [ 157 | "ec2:DescribeInstances", 158 | "globalaccelerator:ListAccelerators" 159 | ], 160 
| "Resource": [ 161 | "*" 162 | ] 163 | }, 164 | { 165 | "Sid": "EFS", 166 | "Effect": "Allow", 167 | "Action": [ 168 | "elasticfilesystem:DescribeFileSystems" 169 | ], 170 | "Resource": [ 171 | "arn:aws:elasticfilesystem:*::file-system/*" 172 | ] 173 | }, 174 | { 175 | "Sid": "CloudHSMThatSupportAllResources", 176 | "Effect": "Allow", 177 | "Action": [ 178 | "cloudhsm:DescribeClusters" 179 | ], 180 | "Resource": [ 181 | "arn:aws:elasticfilesystem:*::file-system/*" 182 | ] 183 | }, 184 | { 185 | "Sid": "S3", 186 | "Effect": "Allow", 187 | "Action": [ 188 | "s3:PutObject" 189 | ], 190 | "Resource": [ 191 | "arn:aws:s3::://*" 192 | ] 193 | }, 194 | { 195 | "Sid": "EventBusPermissions", 196 | "Effect": "Allow", 197 | "Action": [ 198 | "events:PutEvents" 199 | ], 200 | "Resource": [ 201 | "arn:aws:events:::event-bus/" 202 | ] 203 | } 204 | ] 205 | } 206 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | boto3==1.26.124 2 | botocore==1.29.124 3 | -------------------------------------------------------------------------------- /src/account_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import threading 5 | import csv 6 | import time 7 | import logging 8 | import utils 9 | import boto3 10 | import botocore 11 | import datetime 12 | import utils 13 | import os 14 | 15 | from service_specific_analysers.vpce_analyser import VPCEAnalyser 16 | from service_specific_analysers.docdb_analyser import DocDBAnalyser 17 | from service_specific_analysers.dms_analyser import DMSAnalyser 18 | from service_specific_analysers.sgw_analyser import SGWAnalyser 19 | from service_specific_analysers.efs_analyser import EFSAnalyser 20 | from service_specific_analysers.opensearch_analyser import OpensearchAnalyser 21 | from service_specific_analysers.fsx_analyser import FSXAnalyser 22 | from service_specific_analysers.lambda_analyser import LambdaAnalyser 23 | from service_specific_analysers.elasticache_analyser import ElasticacheAnalyser 24 | from service_specific_analysers.dax_analyser import DAXAnalyser 25 | from service_specific_analysers.globalaccelerator_analyser import GlobalAcceleratorAnalyser 26 | from service_specific_analysers.rds_analyser import RDSAnalyser 27 | from service_specific_analysers.memorydb_analyser import MemoryDBAnalyser 28 | from service_specific_analysers.dx_analyser import DXAnalyser 29 | from service_specific_analysers.cloudhsm_analyser import CloudHSMAnalyser 30 | from service_specific_analysers.redshift_analyser import RedshiftAnalyser 31 | 32 | from collections import namedtuple 33 | 34 | class AccountAnalyser(): 35 | 36 | analyser_classes = {} 37 | analyser_classes['vpce'] = VPCEAnalyser 38 | analyser_classes['docdb'] = DocDBAnalyser 39 | analyser_classes['dms'] = DMSAnalyser 40 | analyser_classes['sgw'] = SGWAnalyser 41 | analyser_classes['efs'] = EFSAnalyser 42 | analyser_classes['opensearch'] = OpensearchAnalyser 43 | analyser_classes['fsx'] = FSXAnalyser 44 | analyser_classes['lambda'] = LambdaAnalyser 45 | analyser_classes['elasticache'] = ElasticacheAnalyser 46 | analyser_classes['dax'] = DAXAnalyser 47 | analyser_classes['globalaccelerator'] = GlobalAcceleratorAnalyser 48 | analyser_classes['rds'] = RDSAnalyser 49 | analyser_classes['memorydb'] = MemoryDBAnalyser 50 | 
analyser_classes['dx'] = DXAnalyser 51 | analyser_classes['cloudhsm'] = CloudHSMAnalyser 52 | analyser_classes['redshift'] = RedshiftAnalyser 53 | 54 | def __init__ (self): 55 | #self.services = services 56 | #self.regions = regions 57 | self.lock = threading.Lock() 58 | self.threads = [] 59 | self.account_name = '' 60 | self.payer_account_id = '' 61 | self.payer_account_name = '' 62 | self.run_report = [] 63 | 64 | utils.get_config_info() 65 | 66 | self.account_id = utils.config_info.account_id 67 | self.thread_limiter = threading.BoundedSemaphore(utils.config_info.max_concurrent_threads) 68 | 69 | #Write out an empty csv file with the headers 70 | self.keys = [ 71 | 'service', 72 | 'region', 73 | 'account_id', 74 | 'account_name', 75 | 'payer_account_id', 76 | 'payer_account_name', 77 | 'resource_arn', 78 | 'resource_name', 79 | 'resource_id', 80 | 'potential_issue', 81 | 'engine', #Used for Elasticache, Memory DB and RDS 82 | 'message', 83 | 'timestamp' 84 | ] 85 | 86 | self.get_account_level_information() 87 | 88 | curr_time = datetime.datetime.now() 89 | tm = curr_time.strftime("%Y_%m_%d") 90 | 91 | #Build output file names, either with or without the account id based on the config information 92 | if utils.config_info.filename_with_accountid: 93 | self.output_file_name = f"Fault_Tolerance_Findings_{self.account_id}_{self.account_name}_{tm}.csv" 94 | self.run_report_file_name = f"Fault_Tolerance_Findings_{self.account_id}_{self.account_name}_{tm}_run_report.csv" 95 | else: 96 | self.output_file_name = f"Fault_Tolerance_Findings_{tm}.csv" 97 | self.run_report_file_name = f"Fault_Tolerance_Findings_{tm}_run_report.csv" 98 | 99 | self.output_file_full_path = f"{utils.config_info.output_folder_name}{self.output_file_name}" 100 | self.run_report_file_full_path = f"{utils.config_info.output_folder_name}{self.run_report_file_name}" 101 | 102 | self.create_or_truncate_file = False 103 | 104 | if utils.config_info.truncate_output: 105 | self.create_or_truncate_file = True #If truncate mode is set to True, then create/truncate file 106 | else: 107 | if not os.path.isfile(self.output_file_full_path): 108 | self.create_or_truncate_file = True #If truncate mode is set to False but file does not already exist, then create the file 109 | 110 | #If the folder does not exist, create it. 
111 | os.makedirs(os.path.dirname(utils.config_info.output_folder_name), exist_ok=True) 112 | 113 | if self.create_or_truncate_file: #If create or truncate file is true then open the file in 'w' mode and write the header 114 | with open(self.output_file_full_path, 'w', newline='') as output_file: 115 | dict_writer = csv.DictWriter(output_file, self.keys) 116 | dict_writer.writeheader() 117 | 118 | def get_findings(self): 119 | start = datetime.datetime.now().astimezone() 120 | 121 | for region in utils.config_info.regions: 122 | for service in utils.config_info.services: 123 | analyser = self.analyser_classes[service](account_analyser = self, region = region) 124 | if utils.config_info.single_threaded: 125 | analyser.get_and_write_findings() 126 | else: 127 | t = threading.Thread(target = analyser.get_and_write_findings, name = f"{service}+{region}") 128 | self.threads.append(t) 129 | t.start() 130 | 131 | #If running in multi threaded mode wait for all threads to finish 132 | if not utils.config_info.single_threaded: 133 | for t in self.threads: 134 | t.join() 135 | 136 | end = datetime.datetime.now().astimezone() 137 | 138 | self.run_report.append( 139 | { 140 | 'account_id' : self.account_id, 141 | 'region' : 'Overall', 142 | 'service' : 'Overall', 143 | 'result' : 'N/A', 144 | 'error_message' : 'N/A', 145 | 'start_time' : start.strftime("%Y_%m_%d_%H_%M_%S%z"), 146 | 'end_time' : end.strftime("%Y_%m_%d_%H_%M_%S%z"), 147 | 'runtime_in_seconds' : round((end-start).total_seconds(), 2) 148 | } 149 | ) 150 | 151 | logging.info(f"Total time taken for the account {self.account_id} is {end-start} seconds") 152 | self.write_run_report() 153 | 154 | if utils.config_info.bucket_name: 155 | self.push_files_to_s3() 156 | 157 | def write_run_report(self): 158 | run_report_keys = self.run_report[0].keys() 159 | if self.create_or_truncate_file: #Same behaviour as the findings output file. If a new findings file is created or it is truncated, then create or truncate the run_report too. 
160 | file_open_mode = 'w' 161 | else: 162 | file_open_mode = 'a+' 163 | with open(self.run_report_file_full_path, file_open_mode, newline='') as output_file: 164 | dict_writer = csv.DictWriter(output_file, run_report_keys) 165 | if self.create_or_truncate_file: 166 | dict_writer.writeheader() 167 | dict_writer.writerows(self.run_report) 168 | 169 | def push_files_to_s3(self): 170 | session = utils.get_aws_session(session_name = 'UploadFilesToS3') 171 | s3 = session.client("s3") 172 | try: 173 | response = s3.upload_file(self.output_file_full_path, utils.config_info.bucket_name, utils.config_info.output_folder_name+self.output_file_name) 174 | logging.info(f"Uploaded output file {utils.config_info.output_folder_name+self.output_file_name} to bucket {utils.config_info.bucket_name}") 175 | 176 | response = s3.upload_file(self.run_report_file_full_path, utils.config_info.bucket_name, utils.config_info.output_folder_name+self.run_report_file_name) 177 | logging.info(f"Uploaded run report file {utils.config_info.output_folder_name+self.output_file_name} to bucket {utils.config_info.bucket_name}") 178 | 179 | except botocore.exceptions.ClientError as error: 180 | logging.error(error) 181 | 182 | def get_account_level_information(self): 183 | session = utils.get_aws_session(session_name = 'InitialAccountInfoGathering') 184 | org = session.client("organizations") 185 | try: 186 | acct_info = org.describe_account(AccountId = self.account_id) 187 | self.account_name = acct_info["Account"]["Name"] 188 | except botocore.exceptions.ClientError as error: 189 | if error.response['Error']['Code'] == 'AWSOrganizationsNotInUseException': 190 | logging.info(f"Account {self.account_id} is not part of an AWS Organization") 191 | self.account_name = '' 192 | self.payer_account_id = 'N/A' 193 | self.payer_account_name = 'N/A' 194 | return 195 | else: 196 | raise error 197 | 198 | org_info = org.describe_organization() 199 | self.payer_account_id = org_info["Organization"]["MasterAccountId"] 200 | 201 | payer_account_info = org.describe_account(AccountId = self.payer_account_id) 202 | self.payer_account_name = payer_account_info["Account"]["Name"] 203 | 204 | if __name__ == "__main__": 205 | #Create an instance of the Account level analyser and trigger the get_findings function. 206 | ara = AccountAnalyser() 207 | ara.get_findings() 208 | -------------------------------------------------------------------------------- /src/requirements.txt: -------------------------------------------------------------------------------- 1 | boto3==1.26.13 2 | botocore==1.29.13 3 | -------------------------------------------------------------------------------- /src/service_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0 3 | 4 | from abc import ABCMeta, abstractmethod 5 | import utils 6 | from collections import namedtuple 7 | import botocore 8 | import time 9 | import logging 10 | import datetime 11 | import csv 12 | import json 13 | 14 | class ServiceAnalyser(metaclass = ABCMeta): 15 | 16 | def __init__ (self, account_analyser, region, service): 17 | self.service = service 18 | self.region = region 19 | self.account_analyser = account_analyser 20 | self.findings = [] 21 | self.session = None 22 | 23 | def get_aws_session(self): 24 | if self.session: 25 | return self.session 26 | else: 27 | return utils.get_aws_session(session_name = f"{self.service}_{self.region}_FaultToleranceAnalyser") 28 | 29 | @utils.log_func 30 | def get_and_write_findings(self): 31 | 32 | with self.account_analyser.thread_limiter: 33 | start = datetime.datetime.now().astimezone() 34 | 35 | try: 36 | self.get_findings() 37 | self.write_findings() 38 | end = datetime.datetime.now().astimezone() 39 | logging.info(f"Completed processing {self.service}+{self.region} in {round((end-start).total_seconds(), 2)} seconds.") 40 | self.account_analyser.run_report.append( 41 | { 42 | 'account_id' : self.account_analyser.account_id, 43 | 'region' : self.region, 44 | 'service' : self.service, 45 | 'result' :'Success', 46 | 'error_message' :'', 47 | 'start_time' : start.strftime("%Y_%m_%d_%H_%M_%S%z"), 48 | 'end_time' : end.strftime("%Y_%m_%d_%H_%M_%S%z"), 49 | 'runtime_in_seconds' : round((end-start).total_seconds(), 2) 50 | } 51 | ) 52 | except botocore.exceptions.BotoCoreError as error: 53 | end = datetime.datetime.now().astimezone() 54 | self.account_analyser.run_report.append( 55 | { 56 | 'account_id' : self.account_analyser.account_id, 57 | 'region' : self.region, 58 | 'service' : self.service, 59 | 'result' :'Failure', 60 | 'error_message' : str(error), 61 | 'start_time' : start.strftime("%Y_%m_%d_%H_%M_%S%z"), 62 | 'end_time' : end.strftime("%Y_%m_%d_%H_%M_%S%z"), 63 | 'runtime_in_seconds' : round((end-start).total_seconds(), 2) 64 | } 65 | ) 66 | raise error 67 | 68 | 69 | @abstractmethod 70 | def get_findings(self, region): 71 | pass 72 | 73 | def get_finding_rec_with_common_fields(self): 74 | finding_rec = {} 75 | finding_rec["account_id"] = self.account_analyser.account_id 76 | finding_rec["account_name"] = self.account_analyser.account_name 77 | finding_rec["payer_account_id"] = self.account_analyser.payer_account_id 78 | finding_rec["payer_account_name"] = self.account_analyser.payer_account_name 79 | finding_rec['service'] = self.service 80 | finding_rec['region'] = self.region 81 | 82 | curr_time = datetime.datetime.now().astimezone() 83 | finding_rec['timestamp'] = curr_time.strftime("%Y_%m_%d_%H_%M_%S%z") 84 | 85 | return finding_rec 86 | 87 | def write_findings(self): 88 | self.write_findings_to_file() 89 | #If an event bus is provided publish any issues to event bridge 90 | if (utils.config_info.event_bus_arn): 91 | self.publish_findings_to_event_bridge() 92 | 93 | #This function will be called by the threads to write to the output file. So it must use a lock before opening and writing to the file. 
94 | def write_findings_to_file(self): 95 | 96 | #Log findings 97 | for finding_rec in self.findings: 98 | if finding_rec['potential_issue']: 99 | logging.error(finding_rec['message']) 100 | else: 101 | logging.info(finding_rec['message']) 102 | 103 | #Write findings to output file 104 | if len(self.findings) > 0: 105 | keys = self.findings[0].keys() 106 | if self.account_analyser.lock.acquire(): 107 | with open(self.account_analyser.output_file_full_path, 'a', newline='') as output_file: 108 | dict_writer = csv.DictWriter(output_file, self.account_analyser.keys) 109 | if utils.config_info.report_only_issues: #If the "report-only-issues" flag is set, go through each finding and write out only those that are identified as a potential issue 110 | for finding_rec in self.findings: 111 | if finding_rec['potential_issue']: 112 | dict_writer.writerow(finding_rec) 113 | else: #If the "report-only-issues" flag is not set, then write all findings 114 | dict_writer.writerows(self.findings) 115 | self.account_analyser.lock.release() 116 | 117 | def publish_findings_to_event_bridge(self): 118 | session = self.get_aws_session() 119 | 120 | #Get the event bus region name from the event bus ARN. That region has to be used as cross region API calls are not permitted. 121 | event_bus_region = (utils.parse_arn(utils.config_info.event_bus_arn))['region'] 122 | 123 | events = session.client("events", region_name = event_bus_region) 124 | 125 | entries = [] 126 | 127 | total_entries_count = 0 128 | 129 | for finding_rec in self.findings: 130 | if (not utils.config_info.report_only_issues) or (utils.config_info.report_only_issues and finding_rec['potential_issue']): 131 | entries.append( 132 | { 133 | 'Time': datetime.datetime.now().astimezone(), 134 | 'Source': 'FaultToleranceAnalyser', 135 | 'DetailType': 'FaultToleranceIssue', 136 | 'Detail': json.dumps(finding_rec), 137 | 'EventBusName' : utils.config_info.event_bus_arn 138 | } 139 | ) 140 | total_entries_count = total_entries_count+1 141 | if len(entries) == 10: #Call put-events in batches of 10 each because the API does not accept more than that many events in 1 call. 142 | response = events.put_events(Entries = entries) 143 | entries.clear() #Reset the batch once it has been published 144 | 145 | if len(entries) > 0: 146 | response = events.put_events(Entries = entries) 147 | 148 | logging.info(f"Published {total_entries_count} finding(s) for {self.service} in {self.region} to Eventbridge") 149 | -------------------------------------------------------------------------------- /src/service_specific_analysers/cloudhsm_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class CloudHSMAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'cloudhsm') 13 | 14 | def get_findings(self): 15 | session = self.get_aws_session() 16 | cloudhsm = session.client("cloudhsmv2", region_name=self.region) 17 | 18 | for cluster in utils.invoke_aws_api_full_list(cloudhsm.describe_clusters, "Clusters"): 19 | 20 | finding_rec = self.get_finding_rec_from_response(cluster) 21 | 22 | if len(cluster["Hsms"]) == 0: 23 | finding_rec['potential_issue'] = False 24 | finding_rec['message'] = f"CloudHSM: Cloud HSM cluster {cluster['ClusterId']} has no hsms."
25 | elif len(cluster["Hsms"]) == 1: 26 | finding_rec['potential_issue'] = True 27 | finding_rec['message'] = f"CloudHSM: Cloud HSM cluster {cluster['ClusterId']} has only 1 hsm in a single AZ {cluster['Hsms'][0]['AvailabilityZone']}." 28 | elif len(cluster["Hsms"]) > 1: 29 | azs = set() 30 | for hsm in cluster['Hsms']: 31 | azs.add(hsm['AvailabilityZone']) 32 | if len(azs) == 1: 33 | finding_rec['potential_issue'] = True 34 | finding_rec['message'] = f"CloudHSM: Cloud HSM cluster {cluster['ClusterId']} has {len(cluster['Hsms'])} hsms but they are all in the AZ {azs.pop()}" 35 | else: #len(azs) > 1 36 | finding_rec['potential_issue'] = False 37 | finding_rec['message'] = f"CloudHSM: Cloud HSM cluster {cluster['ClusterId']} has {len(cluster['Hsms'])} hsms and they are spread across multiple AZs: {list(azs)}" 38 | self.findings.append(finding_rec) 39 | 40 | #Contains the logic to extract relevant fields from the API response to the output csv file. 41 | def get_finding_rec_from_response(self, cluster): 42 | 43 | finding_rec = self.get_finding_rec_with_common_fields() 44 | finding_rec['resource_id'] = cluster['ClusterId'] 45 | finding_rec['resource_name'] = cluster['ClusterId'] 46 | finding_rec['resource_arn'] = f"arn:aws:cloudhsm:{self.region}:{self.account_analyser.account_id}:cluster/{cluster['ClusterId']}" 47 | return finding_rec 48 | -------------------------------------------------------------------------------- /src/service_specific_analysers/dax_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class DAXAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'dax') 13 | 14 | def get_findings(self): 15 | session = self.get_aws_session() 16 | dax = session.client("dax", region_name=self.region) 17 | 18 | for cluster in utils.invoke_aws_api_full_list(dax.describe_clusters, "Clusters"): 19 | finding_rec = self.get_finding_rec_from_response(cluster) 20 | azs = {node['AvailabilityZone'] for node in cluster["Nodes"]} 21 | 22 | if len(azs) > 1: 23 | finding_rec['potential_issue'] = False 24 | finding_rec['message'] = f"Nodes in DAX Cluster {cluster['ClusterName']} are spread across more than 1 AZ {azs}" 25 | else: 26 | finding_rec['potential_issue'] = True 27 | finding_rec['message'] = f"All nodes in the DAX cluster {cluster['ClusterName']} are in a single AZ {azs}" 28 | self.findings.append(finding_rec) 29 | 30 | #Contains the logic to extract relevant fields from the API response to the output csv file. 31 | def get_finding_rec_from_response(self, cluster): 32 | 33 | finding_rec = self.get_finding_rec_with_common_fields() 34 | finding_rec['resource_id'] = '' 35 | finding_rec['resource_name'] = cluster['ClusterName'] 36 | finding_rec['resource_arn'] = cluster['ClusterArn'] 37 | return finding_rec 38 | -------------------------------------------------------------------------------- /src/service_specific_analysers/dms_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0
3 | 
4 | import boto3
5 | import logging
6 | import utils
7 | from service_analyser import ServiceAnalyser
8 | 
9 | class DMSAnalyser(ServiceAnalyser):
10 | 
11 |     def __init__(self, account_analyser, region):
12 |         self.dms_instances = {}
13 |         super().__init__(account_analyser, region, 'dms')
14 | 
15 |     def get_findings(self):
16 | 
17 |         session = self.get_aws_session()
18 | 
19 |         dms = session.client("dms", region_name=self.region)
20 | 
21 |         #Go through the instances, and gather findings.
22 |         for repl_inst in utils.invoke_aws_api_full_list(dms.describe_replication_instances, "ReplicationInstances"):
23 |             self.dms_instances[repl_inst["ReplicationInstanceArn"]] = {
24 |                 "MultiAZ":repl_inst["MultiAZ"],
25 |                 "ReplicationInstanceIdentifier":repl_inst["ReplicationInstanceIdentifier"],
26 |                 "AZs": [
27 |                     repl_inst["AvailabilityZone"],
28 |                     repl_inst["SecondaryAvailabilityZone"] if "SecondaryAvailabilityZone" in repl_inst else None
29 |                 ]
30 |             }
31 |             finding_rec = self.get_finding_rec_from_inst_response(repl_inst)
32 | 
33 |             if repl_inst["MultiAZ"]:
34 |                 finding_rec['potential_issue'] = False
35 |                 finding_rec['message'] = f"DMS Replication Instance: {repl_inst['ReplicationInstanceIdentifier']} with ARN {repl_inst['ReplicationInstanceArn']} is on an instance with multiple AZs"
36 |             else:
37 |                 finding_rec['potential_issue'] = True
38 |                 finding_rec['message'] = f"DMS Replication Instance: {repl_inst['ReplicationInstanceIdentifier']} with ARN {repl_inst['ReplicationInstanceArn']} is on an instance in a single AZ"
39 |             self.findings.append(finding_rec)
40 | 
41 |         #Go through the tasks and gather findings.
42 |         for repl_task in utils.invoke_aws_api_full_list(dms.describe_replication_tasks, "ReplicationTasks"):
43 | 
44 |             finding_rec = self.get_finding_rec_from_task_response(repl_task)
45 | 
46 |             dms_instance_arn = repl_task["ReplicationInstanceArn"]
47 |             if self.dms_instances[dms_instance_arn]["MultiAZ"]:
48 |                 finding_rec['potential_issue'] = False
49 |                 finding_rec['message'] = f"DMS Replication Task: {repl_task['ReplicationTaskIdentifier']} with ARN {repl_task['ReplicationTaskArn']} is on the replication instance {self.dms_instances[dms_instance_arn]['ReplicationInstanceIdentifier']} which is configured with multiple AZs: {self.dms_instances[dms_instance_arn]['AZs']}"
50 |             else:
51 |                 finding_rec['potential_issue'] = True
52 |                 finding_rec['message'] = f"DMS Replication Task: {repl_task['ReplicationTaskIdentifier']} with ARN {repl_task['ReplicationTaskArn']} is on the replication instance {self.dms_instances[dms_instance_arn]['ReplicationInstanceIdentifier']} which is configured only in a single AZ {self.dms_instances[dms_instance_arn]['AZs'][0]}."
53 | 54 | self.findings.append(finding_rec) 55 | 56 | def get_finding_rec_from_inst_response(self, repl_inst): 57 | finding_rec = self.get_finding_rec_with_common_fields() 58 | finding_rec['resource_id'] = repl_inst['ReplicationInstanceIdentifier'] 59 | finding_rec['resource_name'] = '' 60 | finding_rec['resource_arn'] = repl_inst['ReplicationInstanceArn'] 61 | 62 | return finding_rec 63 | 64 | def get_finding_rec_from_task_response(self, repl_task): 65 | finding_rec = self.get_finding_rec_with_common_fields() 66 | finding_rec['resource_id'] = repl_task['ReplicationTaskIdentifier'] 67 | finding_rec['resource_name'] = '' 68 | finding_rec['resource_arn'] = repl_task['ReplicationTaskArn'] 69 | 70 | return finding_rec 71 | -------------------------------------------------------------------------------- /src/service_specific_analysers/docdb_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class DocDBAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'docdb') 13 | 14 | def get_findings(self): 15 | session = self.get_aws_session() 16 | docdb = session.client("docdb", region_name=self.region) 17 | 18 | for db_cluster in utils.invoke_aws_api_full_list(docdb.describe_db_clusters, "DBClusters"): 19 | if db_cluster["Engine"] == "docdb": #Neptune clusters could also be listed. Hence we need to look only for docdb 20 | finding_rec = self.get_finding_rec_from_response(db_cluster) 21 | if db_cluster["MultiAZ"]: 22 | finding_rec['potential_issue'] = False 23 | finding_rec['message'] = f"DocDB Cluster: {db_cluster['DBClusterIdentifier']} is in multiple AZs" 24 | else: 25 | finding_rec['potential_issue'] = True 26 | finding_rec['message'] = f"DocDB Cluster: {db_cluster['DBClusterIdentifier']} is in a single AZ" 27 | self.findings.append(finding_rec) 28 | 29 | #Contains the logic to extract relevant fields from the API response to the output csv file. 30 | def get_finding_rec_from_response(self, db_cluster): 31 | 32 | finding_rec = self.get_finding_rec_with_common_fields() 33 | finding_rec['resource_id'] = db_cluster['DbClusterResourceId'] 34 | finding_rec['resource_name'] = db_cluster['DBClusterIdentifier'] 35 | finding_rec['resource_arn'] = db_cluster['DBClusterArn'] 36 | return finding_rec 37 | -------------------------------------------------------------------------------- /src/service_specific_analysers/dx_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | #Checks the following three. 10 | #1. Direct Connect Connection Redundancy - https://docs.aws.amazon.com/awssupport/latest/user/fault-tolerance-checks.html#aws-direct-connect-connection-redundancy 11 | #2. Direct Connect Location Redundancy - https://docs.aws.amazon.com/awssupport/latest/user/fault-tolerance-checks.html#aws-direct-connect-location-redundancy 12 | #3. 
Direct Connect Virtual Interface Redundancy - https://docs.aws.amazon.com/awssupport/latest/user/fault-tolerance-checks.html#aws-direct-connect-virtual-interface-redundancy
13 | 
14 | class DXAnalyser(ServiceAnalyser):
15 | 
16 |     def __init__(self, account_analyser, region):
17 |         super().__init__(account_analyser, region, 'directconnect')
18 | 
19 |     def get_findings(self):
20 |         self.session = self.get_aws_session()
21 |         self.dx = self.session.client("directconnect", region_name=self.region)
22 |         self.get_conn_location_findings()
23 |         self.get_vif_findings()
24 | 
25 |     def get_conn_location_findings(self):
26 |         no_of_connections = 0
27 |         locations = set()
28 | 
29 |         finding_rec = self.get_dx_output()
30 | 
31 |         for conn in utils.invoke_aws_api_full_list(self.dx.describe_connections, "connections"):
32 |             no_of_connections = no_of_connections + 1
33 |             locations.add(conn["location"])
34 | 
35 |         if no_of_connections == 0: #No DX connection. Hence no issue
36 |             finding_rec['potential_issue'] = False
37 |             finding_rec['message'] = f"Direct Connect: No connections in region {self.region}. Hence nothing to check"
38 |         elif no_of_connections == 1:
39 |             finding_rec['potential_issue'] = True
40 |             finding_rec['message'] = f"Direct Connect: There is only one DX connection in region {self.region}"
41 |         else: #no_of_connections > 1. Connection Redundancy is met.
42 |             logging.info(f"Direct Connect: More than one DX connection found in region {self.region}. Now on to checking locations")
43 |             if len(locations) == 1: #All connections use the same location
44 |                 finding_rec['potential_issue'] = True
45 |                 finding_rec['message'] = f"Direct Connect: There is only one location {next(iter(locations))} used by all the DX connections in region {self.region}"
46 |             else: #Connection Redundancy and Location Redundancy is also met
47 |                 finding_rec['potential_issue'] = False
48 |                 finding_rec['message'] = f"Direct Connect: There is more than one DX connection, using more than one location, in region {self.region}"
49 | 
50 |         self.findings.append(finding_rec)
51 | 
52 |     #check VIF redundancy - https://docs.aws.amazon.com/awssupport/latest/user/fault-tolerance-checks.html#aws-direct-connect-virtual-interface-redundancy
53 |     def get_vif_findings(self):
54 | 
55 |         vifs = {}
56 |         vgws = {}
57 | 
58 |         #collect all the VIFs
59 |         for vif in utils.invoke_aws_api_full_list(self.dx.describe_virtual_interfaces, "virtualInterfaces"):
60 |             vifs[vif['virtualInterfaceId']] = {'virtualGatewayId': vif['virtualGatewayId'], 'connectionId' : vif['connectionId']}
61 |             if vif['virtualGatewayId'] in vgws:
62 |                 vgws[vif['virtualGatewayId']]['vifs'].append(vif['virtualInterfaceId'])
63 |                 vgws[vif['virtualGatewayId']]['connections'].append(vif['connectionId'])
64 |             else:
65 |                 vgws[vif['virtualGatewayId']] = {'vifs' : [vif['virtualInterfaceId']], 'connections' : [vif['connectionId']]}
66 | 
67 |         for vgw_id in vgws:
68 |             finding_rec = self.get_vgw_output(vgw_id)
69 |             if len(vgws[vgw_id]['vifs']) < 2:
70 |                 finding_rec['potential_issue'] = True
71 |                 finding_rec['message'] = f"Direct Connect: There is only one VIF {vgws[vgw_id]['vifs']} for the virtual gateway {vgw_id}."
72 |             elif len(set(vgws[vgw_id]['connections'])) < 2: #Use a set so that the same connection used by multiple VIFs is counted once
73 |                 finding_rec['potential_issue'] = True
74 |                 finding_rec['message'] = f"Direct Connect: Though there is more than one VIF for the virtual gateway {vgw_id}, all the VIFs are on the same DX Connection {vgws[vgw_id]['connections']}."
75 |             else:
76 |                 finding_rec['potential_issue'] = False
77 |                 finding_rec['message'] = f"Direct Connect: There is more than one VIF for the virtual gateway {vgw_id}, and the VIFs are on more than one DX connection."
78 |             self.findings.append(finding_rec)
79 |     #Contains the logic to extract relevant fields from the API response to the output csv file.
80 |     def get_dx_output(self):
81 | 
82 |         finding_rec = self.get_finding_rec_with_common_fields()
83 |         finding_rec['resource_id'] = 'N/A'
84 |         finding_rec['resource_name'] = 'N/A'
85 |         finding_rec['resource_arn'] = 'N/A'
86 |         return finding_rec
87 | 
88 |     def get_vgw_output(self, vgw_id):
89 | 
90 |         finding_rec = self.get_finding_rec_with_common_fields()
91 |         finding_rec['resource_id'] = vgw_id
92 |         finding_rec['resource_name'] = 'N/A'
93 |         finding_rec['resource_arn'] = 'N/A'
94 |         return finding_rec
95 | 
--------------------------------------------------------------------------------
/src/service_specific_analysers/efs_analyser.py:
--------------------------------------------------------------------------------
1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 | # SPDX-License-Identifier: MIT-0
3 | 
4 | import boto3
5 | import logging
6 | import utils
7 | from service_analyser import ServiceAnalyser
8 | 
9 | class EFSAnalyser(ServiceAnalyser):
10 | 
11 |     def __init__(self, account_analyser, region):
12 |         super().__init__(account_analyser, region, 'efs')
13 | 
14 |     def get_findings(self):
15 |         session = self.get_aws_session()
16 |         efs = session.client("efs", region_name=self.region)
17 | 
18 |         for fs in utils.invoke_aws_api_full_list(efs.describe_file_systems, "FileSystems"):
19 |             finding_rec = self.get_finding_rec_from_response(fs)
20 |             if "AvailabilityZoneId" in fs: #Single AZ File system
21 |                 finding_rec['potential_issue'] = True
22 |                 finding_rec['message'] = f"EFS: File system {fs['FileSystemId']} with ARN {fs['FileSystemArn']} is a single AZ file system."
23 |             elif fs["NumberOfMountTargets"] <= 1: #Multi AZ file system but mount target only in a single AZ
24 |                 finding_rec['potential_issue'] = True
25 |                 finding_rec['message'] = f"EFS: File system {fs['FileSystemId']} with ARN {fs['FileSystemArn']} is a multi AZ enabled file system but with only one mount target."
26 |             else:
27 |                 finding_rec['potential_issue'] = False
28 |                 finding_rec['message'] = f"EFS: File system {fs['FileSystemId']} with ARN {fs['FileSystemArn']} is a multi AZ enabled file system with more than one mount target"
29 |             self.findings.append(finding_rec)
30 | 
31 |     #Contains the logic to extract relevant fields from the API response to the output csv file.
32 |     def get_finding_rec_from_response(self, fs):
33 |         finding_rec = self.get_finding_rec_with_common_fields()
34 |         finding_rec['resource_id'] = fs['FileSystemId']
35 |         finding_rec['resource_name'] = ''
36 |         for tag in fs['Tags']:
37 |             if tag['Key'] == 'Name':
38 |                 finding_rec['resource_name'] = tag['Value']
39 |         finding_rec['resource_arn'] = fs['FileSystemArn']
40 | 
41 |         return finding_rec
42 | 
43 | 
--------------------------------------------------------------------------------
/src/service_specific_analysers/elasticache_analyser.py:
--------------------------------------------------------------------------------
1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class ElasticacheAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'elasticache') 13 | 14 | def get_findings(self): 15 | self.session = self.get_aws_session() 16 | self.elasticache = self.session.client("elasticache", region_name=self.region) 17 | self.get_memcache_single_node_redis_findings() 18 | self.get_redis_replication_group_findings() 19 | 20 | def get_memcache_single_node_redis_findings(self): 21 | 22 | #Get memcached and single node Redis clusters 23 | for cluster in utils.invoke_aws_api_full_list(self.elasticache.describe_cache_clusters, "CacheClusters", ShowCacheClustersNotInReplicationGroups = True): 24 | finding_rec = self.get_output_from_memcache_single_node_redis_response(cluster) 25 | finding_rec['potential_issue'] = True 26 | if cluster['Engine'] == 'redis': #Single node redis cluster 27 | finding_rec['message'] = f"Elasticache-Redis cluster: {cluster['CacheClusterId']} is a single Node Elasticache-Redis cluster" 28 | else: #Memcached cluster 29 | finding_rec['message'] = f"Elasticache-Memcached cluster: {cluster['CacheClusterId']} is a single AZ issue even if there are multiple nodes in multiple AZs as the data is not replicated between nodes." 30 | self.findings.append(finding_rec) 31 | 32 | def get_output_from_memcache_single_node_redis_response(self, cluster): 33 | 34 | finding_rec = self.get_finding_rec_with_common_fields() 35 | finding_rec['resource_id'] = cluster['CacheClusterId'] 36 | finding_rec['resource_name'] = cluster['CacheClusterId'] 37 | finding_rec['resource_arn'] = cluster['ARN'] 38 | finding_rec['engine'] = cluster['Engine'] 39 | return finding_rec 40 | 41 | def get_redis_replication_group_findings(self): 42 | #Get Redis replication group clusters 43 | 44 | for repl_group in utils.invoke_aws_api_full_list(self.elasticache.describe_replication_groups, "ReplicationGroups"): 45 | finding_rec = self.get_output_from_redis_replication_group_response(repl_group) 46 | if len(repl_group["NodeGroups"]) == 0 : #Cluster Mode disabled. And no node groups or shards. So the data is not replicated across nodes and so this is not single AZ failure resilient 47 | finding_rec['potential_issue'] = True 48 | finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode disabled and no node groups configured" 49 | elif len(repl_group["NodeGroups"]) == 1 : #Cluster Mode disabled. 
One node group/shard
50 |                 if repl_group["AutomaticFailover"] == "disabled":
51 |                     finding_rec['potential_issue'] = True
52 |                     finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode disabled, 1 Node group configured but Auto Failover is disabled"
53 |                 elif repl_group["MultiAZ"] == "disabled": #Auto failover enabled, but multi AZ disabled
54 |                     node_group = repl_group["NodeGroups"][0]
55 |                     azs = set()
56 |                     for node in node_group["NodeGroupMembers"]:
57 |                         azs.add(node["PreferredAvailabilityZone"])
58 |                     if len(azs) == 1: #All nodes belong to the same AZ
59 |                         finding_rec['potential_issue'] = True
60 |                         finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode disabled and Auto Failover is enabled, but all nodes are in the same AZ {azs}"
61 |                     else:
62 |                         finding_rec['potential_issue'] = False
63 |                         finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode disabled, but Auto Failover is enabled and the nodes are spread across multiple AZs {azs}"
64 |                 else: # Auto failover enabled and multi AZ enabled. So this is ok.
65 |                     finding_rec['potential_issue'] = False
66 |                     finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode disabled, but Auto Failover and Multi AZ enabled"
67 |             # At this point len(repl_group["NodeGroups"]) > 1 which implies cluster mode is enabled.
68 |             # This means that Automatic failover is enabled by force.
69 |             # The customer does not have an option to disable it. So that need not be checked.
70 |             # Just make sure all nodes of a given shard are not in the same AZ and that each shard has a replication node.
71 |             elif repl_group["MultiAZ"] == "disabled":
72 |                 #Check to see if any replicas are missing in any node groups, or if any node groups have all the nodes in the same AZ.
73 |                 node_groups = repl_group["NodeGroups"]
74 |                 issue_found = False
75 |                 for node_group in node_groups:
76 |                     if len(node_group["NodeGroupMembers"]) == 1:
77 |                         finding_rec['potential_issue'] = True
78 |                         finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode enabled, but no replicas in shard {node_group['NodeGroupId']}"
79 |                         issue_found = True
80 |                         break
81 |                     else:
82 |                         azs = set()
83 |                         for node in node_group["NodeGroupMembers"]:
84 |                             azs.add(node["PreferredAvailabilityZone"])
85 |                         if len(azs) == 1: #All nodes belong to the same AZ
86 |                             finding_rec['potential_issue'] = True
87 |                             finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode enabled, but all nodes in shard {node_group['NodeGroupId']} are in the same AZ {azs}"
88 |                             issue_found = True
89 |                             break
90 |                 if not issue_found: #All Node groups have been ok
91 |                     finding_rec['potential_issue'] = False
92 |                     finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode enabled, all nodegroups have replicas and none of those node groups have all the nodes in the same AZ."
93 |             else:
94 |                 finding_rec['potential_issue'] = False
95 |                 finding_rec['message'] = f"Elasticache-Redis Replication Group: {repl_group['ReplicationGroupId']}: Cluster Mode enabled, and Multi AZ is enabled."
96 | self.findings.append(finding_rec) 97 | 98 | def get_output_from_redis_replication_group_response(self, repl_group): 99 | 100 | finding_rec = self.get_finding_rec_with_common_fields() 101 | finding_rec['resource_id'] = repl_group['ReplicationGroupId'] 102 | finding_rec['resource_name'] = repl_group['ReplicationGroupId'] 103 | finding_rec['resource_arn'] = repl_group['ARN'] 104 | finding_rec['engine'] = 'Redis' #This is the only possibility for replicationg groups. 105 | return finding_rec 106 | -------------------------------------------------------------------------------- /src/service_specific_analysers/fsx_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class FSXAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'fsx') 13 | 14 | def get_findings(self): 15 | 16 | session = self.get_aws_session() 17 | fsx = session.client("fsx", region_name=self.region) 18 | 19 | for fs in utils.invoke_aws_api_full_list(fsx.describe_file_systems, "FileSystems"): 20 | if fs['FileSystemType'] == "WINDOWS": #We look only at Windows File systems 21 | finding_rec = self.get_finding_rec_from_response(fs) 22 | if len(fs["SubnetIds"]) == 1: 23 | finding_rec['potential_issue'] = True 24 | finding_rec['message'] = f"FSX: Windows File system {fs['FileSystemId']} with ARN {fs['ResourceARN'] } is a single AZ file system. Please check." 25 | else: 26 | finding_rec['potential_issue'] = False 27 | finding_rec['message'] = f"FSX: Windows File system {fs['FileSystemId']} with ARN {fs['ResourceARN'] } is a multi AZ file system" 28 | self.findings.append(finding_rec) 29 | 30 | #Contains the logic to extract relevant fields from the API response to the output csv file. 31 | def get_finding_rec_from_response(self, fs): 32 | finding_rec = self.get_finding_rec_with_common_fields() 33 | finding_rec['resource_id'] = fs['FileSystemId'] 34 | finding_rec['resource_name'] = '' 35 | for tag in fs['Tags']: 36 | if tag['Key'] == 'Name': 37 | finding_rec['resource_name'] = tag['Value'] 38 | finding_rec['resource_arn'] = fs['ResourceARN'] 39 | 40 | return finding_rec 41 | -------------------------------------------------------------------------------- /src/service_specific_analysers/globalaccelerator_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class GlobalAcceleratorAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'globalaccelerator') 13 | 14 | def get_findings(self): 15 | 16 | if self.region == "us-west-2": 17 | self.session = self.get_aws_session() 18 | self.aga = self.session.client("globalaccelerator", region_name=self.region) 19 | self.get_standard_accelerator_findings() 20 | else: 21 | logging.info(f"The service Global Accelerator operates only in us-west-2. 
Hence doing nothing for {self.region}")
22 |             return #Nothing to do since Global Accelerator operates only in us-west-2
23 | 
24 |     def get_standard_accelerator_findings(self):
25 |         for accelerator in utils.invoke_aws_api_full_list(self.aga.list_accelerators, "Accelerators"):
26 |             self.validate_standard_accelerator(accelerator)
27 | 
28 |     def validate_standard_accelerator(self, accelerator):
29 |         finding_rec = self.get_finding_rec_from_response(accelerator)
30 | 
31 |         ec2_instance_ids = []
32 |         target_regions = set()
33 |         for listener in utils.invoke_aws_api_full_list(self.aga.list_listeners,
34 |                                                         "Listeners",
35 |                                                         AcceleratorArn = accelerator["AcceleratorArn"]):
36 |             for endpoint_group in utils.invoke_aws_api_full_list(self.aga.list_endpoint_groups,
37 |                                                                  "EndpointGroups",
38 |                                                                  ListenerArn = listener["ListenerArn"]):
39 |                 target_regions.add(endpoint_group["EndpointGroupRegion"])
40 |                 if len(target_regions) > 1:
41 |                     #If multiple regions are available then they are Multi-AZ. No need to proceed further
42 |                     finding_rec['potential_issue'] = False
43 |                     finding_rec['message'] = f"Global Accelerator: {accelerator['Name']} has target endpoints in multiple regions"
44 |                     self.findings.append(finding_rec); return
45 |                 for endpoint in endpoint_group["EndpointDescriptions"]:
46 |                     if not endpoint["EndpointId"].startswith("i-"): #Not EC2 instance
47 |                         logging.info(f"Global Accelerator {accelerator['Name']} has endpoints that are not EC2 instances. Hence ignored.")
48 |                         return
49 |                     else:
50 |                         ec2_instance_ids.append(endpoint["EndpointId"])
51 | 
52 |         #We have now collected all EC2 instances from all listeners and endpoint groups. Check the Availability zone of these EC2 instances now.
53 |         #So get all AZs to which these EC2 instances belong
54 |         azs = self.get_azs_of_ec2_instances(ec2_instance_ids, next(iter(target_regions))) #We can use next(iter(target_regions)) as we are sure there will be only one region. If there are more than one, we would not have come this far.
55 | 
56 |         if (len(azs) > 1):
57 |             finding_rec['potential_issue'] = False
58 |             finding_rec['message'] = f"Global Accelerator: All target endpoints for the accelerator {accelerator['Name']} are EC2 instances and they are spread across more than one AZ {azs}"
59 |         else:
60 |             finding_rec['potential_issue'] = True
61 |             finding_rec['message'] = f"Global Accelerator: All target endpoints for the accelerator {accelerator['Name']} are EC2 instances and they are all in a single AZ {azs}"
62 | 
63 |         self.findings.append(finding_rec)
64 | 
65 |     def get_azs_of_ec2_instances(self, ec2_instance_ids, region):
66 |         #First break up the EC2 instances in batches
67 |         ec2_instance_id_batches = [] #List of batches
68 |         batch_size = 10
69 |         ec2_instance_counter = 0
70 | 
71 |         #Get the list of EC2 instance ids and batch them in batch_size
72 |         for ec2_instance_id in ec2_instance_ids:
73 |             if ((ec2_instance_counter % batch_size) == 0):
74 |                 ec2_instance_id_batches.append([])
75 |             ec2_instance_counter = ec2_instance_counter + 1
76 |             ec2_instance_id_batches[len(ec2_instance_id_batches)-1].append(ec2_instance_id)
77 | 
78 |         azs = set()
79 |         ec2 = self.session.client("ec2", region_name = region)
80 |         #For each batch, invoke ec2 describe-instances and get the availability zones
81 |         for ec2_instance_id_batch in ec2_instance_id_batches:
82 | 
83 | 
84 | 
85 |             for ec2_instance in utils.invoke_aws_api_full_list(ec2.describe_instances,
86 |                                                                "Reservations",
87 |                                                                InstanceIds = ec2_instance_id_batch):
88 |                 azs.update(instance["Placement"]["AvailabilityZone"] for instance in ec2_instance["Instances"])
89 | 
90 |         return(azs)
91 | 
92 |     #Contains the logic to extract relevant fields from the API response to the output csv file.
93 |     def get_finding_rec_from_response(self, accelerator):
94 | 
95 |         finding_rec = self.get_finding_rec_with_common_fields()
96 |         finding_rec['resource_id'] = accelerator['DnsName']
97 |         finding_rec['resource_name'] = accelerator['Name']
98 |         finding_rec['resource_arn'] = accelerator['AcceleratorArn']
99 |         return finding_rec
100 | 
--------------------------------------------------------------------------------
/src/service_specific_analysers/lambda_analyser.py:
--------------------------------------------------------------------------------
1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 | # SPDX-License-Identifier: MIT-0
3 | 
4 | import boto3
5 | import logging
6 | import utils
7 | from service_analyser import ServiceAnalyser
8 | 
9 | class LambdaAnalyser(ServiceAnalyser):
10 | 
11 |     def __init__(self, account_analyser, region):
12 |         super().__init__(account_analyser, region, 'lambda')
13 | 
14 |     def get_findings(self):
15 |         session = self.get_aws_session()
16 |         aws_lambda = session.client("lambda", region_name=self.region)
17 | 
18 |         for lambda_func in utils.invoke_aws_api_full_list(aws_lambda.list_functions, "Functions"):
19 | 
20 |             if "VpcConfig" not in lambda_func.keys(): #Ignore if there is no VpcConfig in the function
21 |                 continue
22 | 
23 |             if lambda_func["VpcConfig"]["VpcId"]: #If it is populated only then is it VPC Enabled. If not, this check can be ignored.
24 |                 finding_rec = self.get_finding_rec_from_response(lambda_func)
25 |                 if len(lambda_func["VpcConfig"]["SubnetIds"]) == 1:
26 |                     finding_rec['potential_issue'] = True
27 |                     finding_rec['message'] = f"Lambda: VPC Enabled Lambda function {lambda_func['FunctionName']} is configured to run in only one subnet."
28 | else: 29 | finding_rec['potential_issue'] = False 30 | finding_rec['message'] = f"Lambda: VPC Enabled Lambda Function {lambda_func['FunctionName']} is configured to run in more than one subnet" 31 | self.findings.append(finding_rec) 32 | 33 | #Contains the logic to extract relevant fields from the API response to the output csv file. 34 | def get_finding_rec_from_response(self, lambda_func): 35 | 36 | finding_rec = self.get_finding_rec_with_common_fields() 37 | finding_rec['resource_id'] = '' 38 | finding_rec['resource_name'] = lambda_func['FunctionName'] 39 | finding_rec['resource_arn'] = lambda_func['FunctionArn'] 40 | return finding_rec 41 | -------------------------------------------------------------------------------- /src/service_specific_analysers/memorydb_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class MemoryDBAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'memorydb') 13 | 14 | def get_findings(self): 15 | self.session = self.get_aws_session() 16 | self.memorydb = self.session.client("memorydb", region_name=self.region) 17 | self.get_memorydb_findings() 18 | 19 | def get_memorydb_findings(self): 20 | 21 | for cluster in utils.invoke_aws_api_full_list(self.memorydb.describe_clusters, "Clusters", ShowShardDetails = True): 22 | finding_rec = self.get_finding_rec_from_response(cluster) 23 | issue_found = False 24 | for shard in cluster["Shards"]: 25 | if len(shard["Nodes"]) == 1: 26 | finding_rec['potential_issue'] = True 27 | finding_rec['message'] = f"Memory DB Cluster: Shard {shard['Name']} in cluster {cluster['Name']} does not have any replicas" 28 | issue_found = True 29 | break 30 | 31 | if not issue_found: 32 | finding_rec['potential_issue'] = False 33 | finding_rec['message'] = f"Memory DB Cluster: All shards in cluster {cluster['Name']} have replicas" 34 | self.findings.append(finding_rec) 35 | 36 | def get_finding_rec_from_response(self, cluster): 37 | 38 | finding_rec = self.get_finding_rec_with_common_fields() 39 | finding_rec['resource_id'] = '' 40 | finding_rec['resource_name'] = cluster['Name'] 41 | finding_rec['resource_arn'] = cluster['ARN'] 42 | return finding_rec 43 | -------------------------------------------------------------------------------- /src/service_specific_analysers/opensearch_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | import utils 8 | from service_analyser import ServiceAnalyser 9 | 10 | class OpensearchAnalyser(ServiceAnalyser): 11 | 12 | def __init__(self, account_analyser, region): 13 | super().__init__(account_analyser, region, 'opensearch') 14 | 15 | def get_findings(self): 16 | 17 | session = self.get_aws_session() 18 | opensearch = session.client("opensearch", region_name=self.region) 19 | domain_name_batches = [] #List of batches 20 | batch_size = 5 21 | batch_counter = 0 22 | domain_counter = 0 23 | 24 | #Get the list of domain names and batch them in batch_size 25 | for domain_name in utils.invoke_aws_api_full_list(opensearch.list_domain_names, "DomainNames"): 26 | if ((domain_counter % batch_size) == 0): 27 | domain_name_batches.append([]) 28 | domain_counter = domain_counter + 1 29 | domain_name_batches[len(domain_name_batches)-1].append(domain_name['DomainName']) 30 | 31 | #Validate the domain names in batches as the validate_opensearch_domains API can get information about multiple domains in one API call. 32 | for domain_name_batch in domain_name_batches: 33 | self.validate_opensearch_domains(opensearch, domain_name_batch) 34 | 35 | def validate_opensearch_domains(self, opensearch, domain_names): 36 | 37 | for domain in utils.invoke_aws_api_full_list(opensearch.describe_domains, "DomainStatusList", DomainNames = domain_names): 38 | finding_rec = self.get_finding_rec_from_response(domain) 39 | if len(domain["VPCOptions"]["AvailabilityZones"]) > 1: 40 | finding_rec['potential_issue'] = False 41 | finding_rec['message'] = f"Opensearch domain: Domain {domain['DomainName']} with ARN {domain['ARN'] } is multi AZ enabled." 42 | else: 43 | finding_rec['potential_issue'] = True 44 | finding_rec['message'] = f"Opensearch domain: Domain {domain['DomainName']} with ARN {domain['ARN'] } is only in a single AZ." 45 | self.findings.append(finding_rec) 46 | 47 | #Contains the logic to extract relevant fields from the API response to the output csv file. 48 | def get_finding_rec_from_response(self, domain): 49 | finding_rec = self.get_finding_rec_with_common_fields() 50 | finding_rec['service'] = 'opensearch' 51 | finding_rec['region'] = self.region 52 | finding_rec['resource_id'] = domain['DomainId'] 53 | finding_rec['resource_name'] = domain['DomainName'] 54 | finding_rec['resource_arn'] = domain['ARN'] 55 | return finding_rec 56 | -------------------------------------------------------------------------------- /src/service_specific_analysers/rds_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class RDSAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'rds') 13 | 14 | def get_findings(self): 15 | self.session = self.get_aws_session() 16 | self.rds = self.session.client("rds", region_name=self.region) 17 | self.get_db_instance_findings() 18 | self.get_db_cluster_findings() 19 | 20 | def get_db_instance_findings(self): 21 | for db_instance in utils.invoke_aws_api_full_list(self.rds.describe_db_instances, "DBInstances"): 22 | if db_instance["Engine"] == "docdb": #Ignore any Document DB instances as they are covered separately. 
23 | continue 24 | 25 | if "DBClusterIdentifier" in db_instance: #This DB instance is part of a cluster. So it will be handled as part of cluster analyser 26 | continue 27 | 28 | finding_rec = self.get_finding_rec_from_response_instance(db_instance) 29 | 30 | if db_instance["MultiAZ"]: 31 | finding_rec['potential_issue'] = False 32 | finding_rec['message'] = f"RDS Instance: {db_instance['DBInstanceIdentifier']} has MultiAZ enabled" 33 | else: 34 | finding_rec['potential_issue'] = True 35 | finding_rec['message'] = f"RDS Instance: {db_instance['DBInstanceIdentifier']} has MultiAZ disabled" 36 | self.findings.append(finding_rec) 37 | 38 | def get_db_cluster_findings(self): 39 | for db_cluster in utils.invoke_aws_api_full_list(self.rds.describe_db_clusters, "DBClusters"): 40 | if db_cluster["Engine"] in ["docdb","neptune"]: #Ignore any Document DB, Neptune clusters. 41 | continue 42 | 43 | finding_rec = self.get_finding_rec_from_response_cluster(db_cluster) 44 | 45 | if db_cluster["MultiAZ"]: 46 | finding_rec['potential_issue'] = False 47 | finding_rec['message'] = f"RDS Cluster: {db_cluster['DBClusterIdentifier']} has MultiAZ enabled" 48 | else: 49 | finding_rec['potential_issue'] = True 50 | finding_rec['message'] = f"RDS Cluster {db_cluster['DBClusterIdentifier']} has MultiAZ disabled" 51 | self.findings.append(finding_rec) 52 | 53 | #Contains the logic to extract relevant fields from the API response to the output csv file. 54 | def get_finding_rec_from_response_instance(self, db_instance): 55 | 56 | finding_rec = self.get_finding_rec_with_common_fields() 57 | 58 | finding_rec['resource_id'] = '' 59 | finding_rec['resource_name'] = db_instance['DBInstanceIdentifier'] 60 | finding_rec['resource_arn'] = db_instance['DBInstanceArn'] 61 | finding_rec['engine'] = db_instance["Engine"] 62 | return finding_rec 63 | 64 | def get_finding_rec_from_response_cluster(self, db_cluster): 65 | 66 | finding_rec = self.get_finding_rec_with_common_fields() 67 | 68 | finding_rec['resource_id'] = '' 69 | finding_rec['resource_name'] = db_cluster['DBClusterIdentifier'] 70 | finding_rec['resource_arn'] = db_cluster['DBClusterArn'] 71 | finding_rec['engine'] = db_cluster["Engine"] 72 | return finding_rec 73 | -------------------------------------------------------------------------------- /src/service_specific_analysers/redshift_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0
3 | 
4 | import utils
5 | from service_analyser import ServiceAnalyser
6 | 
7 | class RedshiftAnalyser(ServiceAnalyser):
8 | 
9 |     def __init__(self, account_analyser, region):
10 |         super().__init__(account_analyser, region, 'redshift')
11 | 
12 |     def get_findings(self):
13 |         session = self.get_aws_session()
14 |         redshift = session.client("redshift", region_name=self.region)
15 | 
16 |         for cluster in utils.invoke_aws_api_full_list(redshift.describe_clusters, "Clusters"):
17 |             finding_rec = self.get_finding_rec_from_response(cluster)
18 |             if cluster["MultiAZ"] == "Enabled":
19 |                 finding_rec['potential_issue'] = False
20 |                 finding_rec['message'] = f"Redshift Cluster: {cluster['ClusterIdentifier']} is in multiple AZs"
21 |             else:
22 |                 finding_rec['potential_issue'] = True
23 |                 finding_rec['message'] = f"Redshift Cluster: {cluster['ClusterIdentifier']} is in a single AZ"
24 |             self.findings.append(finding_rec)
25 | 
26 |     #Contains the logic to extract relevant fields from the API response to the output csv file.
27 |     def get_finding_rec_from_response(self, cluster):
28 | 
29 |         finding_rec = self.get_finding_rec_with_common_fields()
30 |         finding_rec['resource_id'] = cluster['ClusterIdentifier']
31 |         finding_rec['resource_name'] = cluster['ClusterIdentifier']
32 |         finding_rec['resource_arn'] = f"arn:aws:redshift:{self.region}:{self.account_analyser.account_id}:cluster-name/{cluster['ClusterIdentifier']}"
33 |         return finding_rec
34 | 
--------------------------------------------------------------------------------
/src/service_specific_analysers/sgw_analyser.py:
--------------------------------------------------------------------------------
1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 | # SPDX-License-Identifier: MIT-0
3 | 
4 | import boto3
5 | import logging
6 | import utils
7 | from service_analyser import ServiceAnalyser
8 | 
9 | class SGWAnalyser(ServiceAnalyser):
10 | 
11 |     def __init__(self, account_analyser, region):
12 |         super().__init__(account_analyser, region, 'sgw')
13 | 
14 |     def get_findings(self):
15 | 
16 |         session = self.get_aws_session()
17 |         sgw = session.client("storagegateway", region_name=self.region)
18 | 
19 |         for gateway in utils.invoke_aws_api_full_list(sgw.list_gateways, "Gateways"):
20 |             finding_rec = self.get_finding_rec_from_response(gateway)
21 |             if (("Ec2InstanceRegion" in gateway.keys()) and (len(gateway["Ec2InstanceRegion"]))):
22 |                 finding_rec['potential_issue'] = True
23 |                 finding_rec['message'] = f"Storage Gateway: Gateway {gateway['GatewayName']} with ARN {gateway['GatewayARN']} is hosted on AWS. Please ensure this gateway is not used for critical workloads"
24 |             else:
25 |                 finding_rec['potential_issue'] = False
26 |                 finding_rec['message'] = f"Storage Gateway: Gateway {gateway['GatewayName']} is not hosted on AWS"
27 | 
28 |             self.findings.append(finding_rec)
29 | 
30 |     #Contains the logic to extract relevant fields from the API response to the output csv file.
31 | def get_finding_rec_from_response(self, gateway): 32 | 33 | finding_rec = self.get_finding_rec_with_common_fields() 34 | 35 | finding_rec['resource_id'] = gateway['GatewayId'] 36 | finding_rec['resource_name'] = gateway['GatewayName'] 37 | finding_rec['resource_arn'] = gateway['GatewayARN'] 38 | 39 | return finding_rec 40 | -------------------------------------------------------------------------------- /src/service_specific_analysers/vpce_analyser.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import logging 6 | import utils 7 | from service_analyser import ServiceAnalyser 8 | 9 | class VPCEAnalyser(ServiceAnalyser): 10 | 11 | def __init__(self, account_analyser, region): 12 | super().__init__(account_analyser, region, 'vpce') 13 | 14 | def get_findings(self): 15 | session = self.get_aws_session() 16 | ec2 = session.client("ec2", region_name=self.region) 17 | 18 | for vpce in utils.invoke_aws_api_full_list(ec2.describe_vpc_endpoints, "VpcEndpoints", Filters = [ {'Name':'vpc-endpoint-type', 'Values' : ['Interface']} ]): 19 | subnet_ids = vpce["SubnetIds"] 20 | 21 | finding_rec = self.get_finding_rec_from_response(vpce) 22 | 23 | if len(subnet_ids) > 1: 24 | finding_rec['potential_issue'] = False 25 | finding_rec['message'] = f"VPCE: {vpce['VpcEndpointId']} has multiple subnets: {subnet_ids}" 26 | else: 27 | finding_rec['potential_issue'] = True 28 | finding_rec['message'] = f"VPCE: {vpce['VpcEndpointId']} has a single subnet: {subnet_ids}" 29 | 30 | self.findings.append(finding_rec) 31 | 32 | def get_finding_rec_from_response(self, vpce): 33 | 34 | finding_rec = self.get_finding_rec_with_common_fields() 35 | 36 | finding_rec['resource_id'] = vpce['VpcEndpointId'] 37 | finding_rec['resource_name'] = '' 38 | for tag in vpce['Tags']: 39 | if tag['Key'] == 'Name': 40 | finding_rec['resource_name'] = tag['Value'] 41 | finding_rec['resource_arn'] = vpce['ServiceName'] 42 | return finding_rec 43 | -------------------------------------------------------------------------------- /src/utils.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import argparse 5 | import logging 6 | import re 7 | import time 8 | import threading 9 | import boto3 10 | import botocore 11 | from datetime import datetime, date 12 | from dataclasses import dataclass 13 | 14 | 15 | @dataclass 16 | class ConfigInfo: 17 | regions: list 18 | services: list 19 | max_concurrent_threads: int 20 | output_folder_name: str 21 | event_bus_arn: str 22 | log_level: str 23 | aws_profile_name: str 24 | aws_assume_role_name: str 25 | single_threaded: bool 26 | run_report_file_name: str 27 | bucket_name: str 28 | account_id: str 29 | truncate_output: bool 30 | filename_with_accountid: bool 31 | report_only_issues: bool 32 | 33 | all_services = ['vpce', 34 | 'dms', 35 | 'docdb', 36 | 'sgw', 37 | 'efs', 38 | 'opensearch', 39 | 'fsx', 40 | 'lambda', 41 | 'elasticache', 42 | 'dax', 43 | 'globalaccelerator', 44 | 'rds', 45 | 'memorydb', 46 | 'dx', 47 | 'cloudhsm', 48 | 'redshift'] 49 | 50 | #Use the below function,if needed, as print(json.dumps(db_instance, default = json_serialise, indent = 4)) 51 | def json_serialise(obj): 52 | if isinstance(obj, datetime): 53 | return obj.strftime("%Y-%m-%d, %H:%M:%S %Z") 54 | elif isinstance(obj, date): 55 | return obj.strftime("%Y-%m-%d %Z") 56 | else: 57 | raise TypeError (f"Type {type(obj)} not serializable") 58 | 59 | #Reference: https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_iam-quotas.html 60 | def regex_validator_generator(regex, desc_param_name, custom_message = ""): 61 | pattern = re.compile(regex) 62 | def regex_validator(arg_value): 63 | if arg_value is None: 64 | return arg_value 65 | elif not pattern.match(arg_value): 66 | raise argparse.ArgumentTypeError(f"Invalid {desc_param_name}. {custom_message}") 67 | return arg_value 68 | return regex_validator 69 | 70 | def maxlen_validator_generator(max_len, desc_param_name): 71 | def maxlen_validator(arg_value): 72 | if arg_value is None: 73 | return arg_value 74 | elif len(arg_value) > max_len: 75 | raise argparse.ArgumentTypeError(f"{desc_param_name} too long. It should not exceed {max_len} characters.") 76 | return arg_value 77 | return maxlen_validator 78 | 79 | def log_func(func): 80 | def inner(*args, **kwargs): 81 | logging.debug(f"In thread {threading.current_thread().name}: Starting {func.__name__} with args: {args} and key word args: {kwargs}") 82 | start = time.time() 83 | result = func(*args, **kwargs) 84 | end = time.time() 85 | logging.info(f"Completed {func.__name__} in thread {threading.current_thread().name} with args: {args} and key word args: {kwargs} in {end-start} seconds.") 86 | return result 87 | return inner 88 | 89 | def get_aws_session(session_name = None): 90 | session = boto3.session.Session(profile_name = config_info.aws_profile_name) 91 | 92 | if config_info.aws_assume_role_name: #Need to assume a role before creating an org client 93 | 94 | logging.info(f"aws-assume-role option is used. 
About to assume the role {config_info.aws_assume_role_name}") 95 | 96 | sts_client = session.client('sts') 97 | account_id = sts_client.get_caller_identity()["Account"] 98 | 99 | if not session_name: 100 | session_name = "AssumeRoleForFaultToleranceAnalyser" 101 | 102 | assumed_role_object=sts_client.assume_role( 103 | RoleArn=f"arn:aws:iam::{account_id}:role/{config_info.aws_assume_role_name}", 104 | RoleSessionName=session_name 105 | ) 106 | credentials=assumed_role_object['Credentials'] 107 | 108 | assumed_role_session = boto3.session.Session( 109 | aws_access_key_id=credentials['AccessKeyId'], 110 | aws_secret_access_key=credentials['SecretAccessKey'], 111 | aws_session_token=credentials['SessionToken'], 112 | ) 113 | logging.info(f"Assumed the role {config_info.aws_assume_role_name} with session name {session_name}") 114 | return assumed_role_session 115 | else: 116 | return session 117 | 118 | def check_aws_credentials(): 119 | try: 120 | session = boto3.session.Session(profile_name = config_info.aws_profile_name) 121 | sts = session.client("sts") 122 | resp = sts.get_caller_identity() 123 | account_id = resp["Account"] 124 | return account_id 125 | except botocore.exceptions.ClientError as error: 126 | raise error 127 | 128 | def get_approved_regions(): 129 | session = get_aws_session(session_name = 'ValidateRegions') 130 | ec2 = session.client("ec2", region_name='us-east-1') 131 | response = ec2.describe_regions() 132 | approved_regions = [region["RegionName"] for region in response["Regions"]] 133 | return approved_regions 134 | 135 | def regions_validator(input_regions): 136 | 137 | approved_regions = get_approved_regions() 138 | 139 | if 'ALL' in input_regions: 140 | if len(input_regions) == 1: #'ALL' is the only input 141 | return approved_regions 142 | else: 143 | raise argparse.ArgumentTypeError(f"When providing 'ALL' as an input region, please do not provide any other regions. 'ALL' implies all approved regions.") 144 | else: 145 | for input_region in input_regions: 146 | if input_region not in approved_regions: 147 | raise argparse.ArgumentTypeError(f"{input_region} is not in the list of approved regions for this account. Please provide only approved regions, or specify ALL for all regions that are approved") 148 | return input_regions 149 | 150 | def services_validator(input_services): 151 | if 'ALL' in input_services: 152 | if len(input_services) == 1: #'ALL' is the only input 153 | return all_services 154 | else: 155 | raise argparse.ArgumentTypeError(f"When providing 'ALL' as an input service, please do not provide any other services. 'ALL' implies the following services: {all_services}") 156 | else: 157 | return input_services 158 | 159 | def bus_arn_validator(event_bus_arn): 160 | 161 | if event_bus_arn is None: 162 | return event_bus_arn 163 | 164 | arn_parts = parse_arn(event_bus_arn) 165 | 166 | #ARN is validated. Now check if the region is correct. 167 | if arn_parts['region'] == 'ALL': 168 | raise argparse.ArgumentTypeError(f"Invalid region in event bus arn") 169 | else: 170 | approved_regions = get_approved_regions() 171 | if arn_parts['region'] not in approved_regions: 172 | raise argparse.ArgumentTypeError(f"{arn_parts['region']} is not in the list of approved regions for this account. 
Please provide an event bus in an approved region")
173 | 
174 |     #Check if the resource is in the right format
175 |     bus_name_regex = r"^[A-Za-z0-9._-]{1,256}$"
176 |     bus_name_pattern = re.compile(bus_name_regex)
177 | 
178 |     if arn_parts['resource_type'] != "event-bus":
179 |         raise argparse.ArgumentTypeError(f"Resource type '{arn_parts['resource_type']}' in the ARN is not valid for an event bus ARN. It should be 'event-bus'")
180 |     elif not bus_name_pattern.match(arn_parts['resource_id']):
181 |         raise argparse.ArgumentTypeError(f"{arn_parts['resource_id']} is not a valid bus name. Maximum of 256 characters consisting of numbers, lower/upper case letters, .,-,_.")
182 | 
183 |     return event_bus_arn
184 | 
185 | def arn_validator(arn):
186 |     regex = r"^arn:(aws|aws-us-gov|aws-cn):[^:]+:[^:]*:[^:]*:.+$"
187 |     pattern = re.compile(regex)
188 |     if not pattern.match(arn):
189 |         raise argparse.ArgumentTypeError(f"The provided ARN is invalid. Please provide a valid ARN. Ref: https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html")
190 |     return arn
191 | 
192 | def bucket_name_validator(bucket_name):
193 | 
194 |     regex = r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$"
195 |     pattern = re.compile(regex)
196 |     if not (3 <= len(bucket_name) <= 63):
197 |         raise argparse.ArgumentTypeError(f"Invalid bucket name. It must be between 3 and 63 characters in length")
198 |     if not pattern.match(bucket_name):
199 |         raise argparse.ArgumentTypeError(f"Invalid bucket name. Bucket names must be between 3 (min) and 63 (max) characters long. Bucket names can consist only of lowercase letters, numbers, dots (.), and hyphens (-). Bucket names must begin and end with a letter or number.")
200 |     if ".." in bucket_name:
201 |         raise argparse.ArgumentTypeError(f"Invalid bucket name. Bucket names should not have consecutive periods '..' ")
202 |     if bucket_name.startswith("xn--") or bucket_name.endswith('-s3alias'):
203 |         raise argparse.ArgumentTypeError(f"Invalid bucket name. Bucket names should not start with 'xn--' or end with '-s3alias'")
204 | 
205 |     return bucket_name
206 | 
207 | 
208 | def get_config_info():
209 | 
210 |     #Define the arguments
211 |     parser = argparse.ArgumentParser(description='Generate fault tolerance findings for different services', add_help=False)
212 | 
213 |     required_params_group = parser.add_argument_group('Required arguments')
214 |     required_params_group.add_argument('-s', '--services', nargs='+', choices = all_services + ['ALL'],
215 |                         help=f"Indicate which service(s) you want to fetch fault tolerance findings for. Options are {all_services}. Use 'ALL' for all services",
216 |                         required = True
217 |                         )
218 |     required_params_group.add_argument('-r', '--regions', nargs='+',
219 |                         help='Indicate which region(s) you want to fetch fault tolerance findings for. Use "ALL" for all approved regions',
220 |                         required = True
221 |                         )
222 | 
223 |     optional_params_group = parser.add_argument_group('Optional arguments')
224 |     optional_params_group.add_argument('-h', '--help', action="help", help = "show this message and exit")
225 | 
226 |     optional_params_group.add_argument('-m', '--max-concurrent-threads', dest='max_concurrent_threads',
227 |                         default = 20,
228 |                         type=int,
229 |                         help='Maximum number of threads that will be running at any given time. 
Default is 20')
230 |     optional_params_group.add_argument('-o', '--output', dest='output_folder_name',
231 |                         default='output/',
232 |                         type=regex_validator_generator(regex = r".+/$", desc_param_name = "Output folder name",
233 |                             custom_message = "Provide an output folder name where the findings csv and the run report csv will be placed"),
234 |                         help='''Name of the folder where findings output csv file and the run report csv file will be written.
235 |                         If it does not exist, it will be created. If a bucket name is also provided, then the folder will be looked for under the bucket, and if not present, will be created.
236 |                         If a bucket name is not provided, then this folder will be expected under the directory in which the script is run. In case a bucket is provided, the files will be generated in this folder first and then pushed to the bucket.
237 |                         Please ensure there is a forward slash '/' at the end of the folder path.
238 |                         Output file name will be of the format Fault_Tolerance_Findings_<account_id>_<account_name>_<date>.csv. Example: Fault_Tolerance_Findings_123456789101_TestAccount_2022_11_01.csv
239 |                         If you do not use the --filename-with-accountid option, the output file name will be of the format:
240 |                         Fault_Tolerance_Findings_<date>.csv. Example: Fault_Tolerance_Findings_2022_11_01.csv''')
241 |     optional_params_group.add_argument('-b', '--bucket', dest='bucket_name',
242 |                         default = None,
243 |                         type=bucket_name_validator,
244 |                         help='Name of the bucket where findings output csv file and the run report csv file will be uploaded to')
245 |     optional_params_group.add_argument('--event-bus-arn', dest='event_bus_arn',
246 |                         default=None,
247 |                         type=regex_validator_generator(regex = r"arn:(aws|aws-us-gov|aws-cn):events:.*:.*:event-bus/[A-Za-z0-9._-]{1,256}$", desc_param_name = "Event Bus ARN",
248 |                             custom_message = "Provide the ARN of an event bus in AWS Eventbridge to which findings will be published"),
249 |                         help='''ARN of the event bus in AWS Eventbridge to which findings will be published.''')
250 |     optional_params_group.add_argument('--aws-profile', dest='aws_profile_name',
251 |                         default=None,
252 |                         type=maxlen_validator_generator(max_len = 250, desc_param_name = "AWS Profile name"),
253 |                         help="Use this option if you want to pass in an AWS profile already configured for the CLI")
254 |     optional_params_group.add_argument('--aws-assume-role', dest='aws_assume_role_name',
255 |                         default=None,
256 |                         type=regex_validator_generator(regex = r"^[a-zA-Z0-9+=,.@_-]+$", desc_param_name = "IAM Role name"),
257 |                         #type=iam_entity,
258 |                         help="Use this option if you want the aws profile to assume a role before querying Org related information")
259 |     optional_params_group.add_argument('--log-level', dest='log_level',
260 |                         default='ERROR', choices = ['DEBUG','INFO','WARNING','ERROR','CRITICAL'],
261 |                         help="Log level. Needs to be one of the following: 'DEBUG','INFO','WARNING','ERROR','CRITICAL'")
262 |     optional_params_group.add_argument('--single-threaded', action='store_true', dest='single_threaded',
263 |                         default=False,
264 |                         help="Use this option to specify that the service+region level information gathering threads should not run in parallel. Default is False, which means the script uses multi-threading by default. Same effect as setting max-concurrent-threads to 1")
265 |     optional_params_group.add_argument('--truncate-output', action='store_true', dest='truncate_output',
266 |                         default=False,
267 |                         help="Use this flag to make sure that if the output file already exists, the file is truncated. Default is False. 
269 |     optional_params_group.add_argument('--filename-with-accountid', action='store_true', dest='filename_with_accountid',
270 |         default=False,
271 |         help='''Use this flag to include the account id in the output file name.
272 |         By default this is False, meaning the account id will not be in the file name.
273 |         The default mode is useful if you are running the script for more than one account,
274 |         and want all the accounts' findings to be in the same output file.''')
275 |     optional_params_group.add_argument('--report-only-issues', action='store_true', dest='report_only_issues',
276 |         default=False,
277 |         help="Use this flag to report only findings that are potential issues. Resources that have no identified issues will not appear in the final csv file. Default is to report all findings.")
278 |     args = parser.parse_args()
279 | 
280 |     #Set up logging
281 |     logging.basicConfig(
282 |         format='%(asctime)s %(levelname)-8s %(message)s',
283 |         level=args.log_level,
284 |         datefmt='%Y-%m-%d %H:%M:%S')
285 | 
286 |     global config_info
287 | 
288 |     config_info = ConfigInfo(
289 |         regions = [],
290 |         services = [],
291 |         max_concurrent_threads = args.max_concurrent_threads,
292 |         output_folder_name = args.output_folder_name,
293 |         event_bus_arn = args.event_bus_arn,
294 |         log_level = args.log_level,
295 |         aws_profile_name = args.aws_profile_name,
296 |         aws_assume_role_name = args.aws_assume_role_name,
297 |         single_threaded = args.single_threaded,
298 |         run_report_file_name = "run_report.csv",
299 |         bucket_name = args.bucket_name,
300 |         account_id = '',
301 |         truncate_output = args.truncate_output,
302 |         filename_with_accountid = args.filename_with_accountid,
303 |         report_only_issues = args.report_only_issues
304 |     )
305 | 
306 | 
307 |     #First check credentials
308 |     account_id = check_aws_credentials()
309 | 
310 |     #Validate the regions, services and event bus ARN, and store the validated values
311 |     config_info.account_id = account_id
312 |     config_info.regions = regions_validator(args.regions)
313 |     config_info.services = services_validator(args.services)
314 |     config_info.event_bus_arn = bus_arn_validator(args.event_bus_arn)
315 | 
316 | def invoke_aws_api_full_list (api_method, top_level_member, **kwargs):
317 |     #Generator that invokes the given boto3 API method and yields every item under 'top_level_member', following NextToken pagination
318 |     logging.info(f"Invoking {api_method.__self__.__class__.__name__}.{api_method.__name__} for {top_level_member} with the parameters {kwargs}")
319 |     response = api_method(**kwargs)
320 | 
321 |     for response_item in response[top_level_member]:
322 |         yield(response_item)
323 | 
324 |     while ('NextToken' in response):
325 |         response = api_method(NextToken = response['NextToken'], **kwargs)
326 |         for response_item in response[top_level_member]:
327 |             yield(response_item)
328 | 
329 | def parse_arn(arn):
330 |     parts = arn.split(":")
331 |     if len(parts) == 7: #Follows the format "arn:partition:service:region:account-id:resource-type:resource-id"
332 |         result = {
333 |             'arn': parts[0],
334 |             'partition': parts[1],
335 |             'service': parts[2],
336 |             'region': parts[3],
337 |             'account_id': parts[4],
338 |             'resource_type': parts[5],
339 |             'resource_id': parts[6]
340 |         }
341 |     elif len(parts) == 6:
342 |         if "/" in parts[5]: #Follows the format "arn:partition:service:region:account-id:resource-type/resource-id"
343 |             resource_info = parts[5]
344 |             resource_parts = resource_info.split("/")
345 |             result = {
346 |                 'arn': parts[0],
347 |                 'partition': parts[1],
348 |                 'service': parts[2],
349 |                 'region': parts[3],
350 |                 'account_id': parts[4],
351 |                 'resource_type': resource_parts[0],
352 |                 'resource_id': resource_parts[1],
353 |             }
354 |         else: #Follows the format "arn:partition:service:region:account-id:resource-id"
355 |             result = {
356 |                 'arn': parts[0],
357 |                 'partition': parts[1],
358 |                 'service': parts[2],
359 |                 'region': parts[3],
360 |                 'account_id': parts[4],
361 |                 'resource_type': None,
362 |                 'resource_id': parts[5],
363 |             }
364 |     else:
365 |         raise argparse.ArgumentTypeError(f"Invalid ARN. Does not follow the pattern defined here: https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html")
366 | 
367 |     return result
--------------------------------------------------------------------------------