├── .github ├── PULL_REQUEST_TEMPLATE.md └── workflows │ ├── deploy.yml │ └── test.yml ├── .yamllint ├── DEV.md ├── LICENSE ├── README.md ├── data └── network.json ├── doc ├── update1.png ├── update2.png └── update3.png ├── marbot-alb.yml ├── marbot-auto-scaling-group.yml ├── marbot-aws-account-connection.yml ├── marbot-cloudformation-drift.yml ├── marbot-cloudfront.yml ├── marbot-ec2-instance.yml ├── marbot-ec2-instances-nested.yml ├── marbot-ec2-instances.yml ├── marbot-efs.yml ├── marbot-elastic-beanstalk.config ├── marbot-elastic-beanstalk.yml ├── marbot-elasticache-memcached.yml ├── marbot-elasticsearch.yml ├── marbot-interface-endpoint.yml ├── marbot-lambda-function.yml ├── marbot-nat-gateway.yml ├── marbot-rds-cluster.yml ├── marbot-rds.yml ├── marbot-redshift.yml ├── marbot-repeated-task.yml ├── marbot-reserved-instance.yml ├── marbot-sqs-queue.yml ├── marbot-standalone-topic.yml ├── marbot-synthetics-website.yml ├── marbot-workspaces.yml └── marbot.yml /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | **(Override all values in parentheses)** 2 | 3 | (Run `yamllint template.yml` and `aws cloudformation validate-template --template-body file://template.yml` before you open a PR) 4 | 5 | (Do not include multiple changes in one PR. Open additional PRs instead.) 
6 | 7 | --- 8 | 9 | (describe your change here) 10 | -------------------------------------------------------------------------------- /.github/workflows/deploy.yml: -------------------------------------------------------------------------------- 1 | --- 2 | name: Deploy 3 | on: 4 | push: 5 | branches: 6 | - master 7 | permissions: 8 | id-token: write 9 | contents: read 10 | defaults: 11 | run: 12 | shell: bash 13 | jobs: 14 | deploy: 15 | runs-on: ubuntu-22.04 16 | steps: 17 | - uses: actions/checkout@v4 18 | - uses: aws-actions/configure-aws-credentials@v4 19 | with: 20 | role-to-assume: arn:aws:iam::853553028582:role/github-openid-connect 21 | role-session-name: github-actions-monitoring-jump-start 22 | aws-region: eu-west-1 23 | - run: | 24 | aws s3 sync . s3://monitoring-jump-start --exclude '*' --include 'marbot*.yml' --include 'data/*' --delete 25 | for file in marbot*.yml; do version=$(yq e '.Outputs.StackVersion.Value' $file); if [ "$version" != "null" ]; then aws s3 cp $file s3://monitoring-jump-start/v$version/$file; fi; done 26 | -------------------------------------------------------------------------------- /.github/workflows/test.yml: -------------------------------------------------------------------------------- 1 | --- 2 | name: Test 3 | on: 4 | push: 5 | branches: 6 | - master 7 | pull_request: 8 | branches: 9 | - master 10 | defaults: 11 | run: 12 | shell: bash 13 | jobs: 14 | test: 15 | runs-on: ubuntu-22.04 16 | steps: 17 | - uses: actions/checkout@v4 18 | - uses: actions/setup-python@v5 19 | with: 20 | python-version: '3.13' 21 | - run: | 22 | pip install cfn-lint==1.34.2 23 | yamllint *.yml 24 | cfn-lint -t *.yml 25 | find . 
-name 'marbot*.yml' | while read file; do set -ex && grep -q "LICENSE-2.0" "$file"; done; 26 | for file in marbot*.yml; do version1=$(yq e '.Outputs.StackVersion.Value' $file); version2=$(yq e '.Resources.MonitoringJumpStartEvent.Properties.Targets[0].Input' $file | jq -r '.StackVersion'); if [ "$version1" != "$version2" ]; then echo "version $version1 does not match $version2 in $file"; exit 1; fi; done 27 | -------------------------------------------------------------------------------- /.yamllint: -------------------------------------------------------------------------------- 1 | --- 2 | extends: default 3 | rules: 4 | indentation: 5 | indent-sequences: false 6 | line-length: 7 | max: 999 8 | comments: 9 | min-spaces-from-content: 1 10 | -------------------------------------------------------------------------------- /DEV.md: -------------------------------------------------------------------------------- 1 | # Developer notes 2 | 3 | > Every push to master is automatically "released". A release means that the files are copied to S3 (bucket `monitoring-jump-start`). 4 | 5 | ## New version 6 | 7 | If you update the `StackVersion` output: 8 | 9 | * Update the version that is reported in the template (resource `MonitoringJumpStartEvent`). 10 | * In the marbot code base, update the latest version in `data/jumpstart.js` as well. 11 | * Push to master. 12 | 13 | ## New Template 14 | 15 | If you add a new template: 16 | 17 | * In the marbot code base, add the latest version to `data/jumpstart.js`. 18 | * In the marbot code base, add the template to `lib/nav.js`. 19 | * Push to master. 20 | * Consider porting it to Terraform as well. 21 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. 
Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. 
For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "{}" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright {yyyy} {name of copyright owner} 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | 203 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Free Monitoring Templates for AWS CloudFormation 2 | Setting up monitoring on AWS is hard. There are countless monitoring possibilities on AWS. Overlooking the important settings is easy. Monitoring Jump Starts connect you with all relevant AWS sources for comprehensive monitoring coverage. 3 | 4 | Jump Starts are CloudFormation templates or [Terraform modules](https://github.com/marbot-io/monitoring-jump-start-tf) that you can deploy to your AWS account to set up CloudWatch Alarms, CloudWatch Event Rules, and much more. Events are sent to Slack or Microsoft Teams. 
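As an alternative to the Launch Stack links, a Jump Start can presumably also be deployed with the AWS CLI. The following is a minimal sketch under stated assumptions, not an official procedure: the helper names `jump_start_url` and `deploy_jump_start` are hypothetical, the AWS CLI is assumed to be installed and configured, and the required parameters differ per template (check the `Parameters` section of the template you deploy).

```shell
# Sketch only: the helper names below are hypothetical and not part of this repository.

# Base URL of the public bucket, as used in the CloudFormation Template URL column.
JUMP_START_BASE_URL='https://s3-eu-west-1.amazonaws.com/monitoring-jump-start'

# Print the CloudFormation Template URL for a template file name.
jump_start_url() {
  printf '%s/%s\n' "$JUMP_START_BASE_URL" "$1"
}

# Create a stack from a Jump Start template.
# Usage: deploy_jump_start <stack-name> <template-file> <ParameterKey=...,ParameterValue=...>...
# At least one parameter is expected (every template shown here needs EndpointId).
deploy_jump_start() {
  stack_name=$1
  template=$2
  shift 2
  # CAPABILITY_IAM acknowledges that the template might create IAM resources.
  aws cloudformation create-stack \
    --stack-name "$stack_name" \
    --template-url "$(jump_start_url "$template")" \
    --parameters "$@" \
    --capabilities CAPABILITY_IAM
}

# Example call (placeholders in angle brackets must be replaced):
#   deploy_jump_start marbot-alb marbot-alb.yml \
#     'ParameterKey=EndpointId,ParameterValue=<endpoint-id>' \
#     'ParameterKey=LoadBalancerFullName,ParameterValue=<app/name/id>' \
#     'ParameterKey=TargetGroupFullName,ParameterValue=<targetgroup/name/id>'
```

The parameters are passed through unchanged so that the same helper works for any template in the table.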
5 | 6 | At the moment, you can monitor: 7 | 8 | | Monitoring goal | CloudFormation Action | CloudFormation Template URL | 9 | | --- | --- | --- | 10 | | [AWS basics](marbot.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot.yml` | 11 | | [Application Load Balancer (ALB)](marbot-alb.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-alb.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-alb.yml` | 12 | | [Auto Scaling Group](marbot-auto-scaling-group.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-auto-scaling-group.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-auto-scaling-group.yml` | 13 | | [EC2 instance](marbot-ec2-instance.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-ec2-instance.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-ec2-instance.yml` | 14 | | [EC2 instances](marbot-ec2-instances.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-ec2-instances.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-ec2-instances.yml` | 15 | | [EFS file system](marbot-efs.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-efs.yml) | 
`https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-efs.yml` | 16 | | [Elastic Beanstalk](marbot-elastic-beanstalk.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-elastic-beanstalk.yml) and don't forget to put the [marbot-elastic-beanstalk.config](marbot-elastic-beanstalk.config) file into your .ebextensions folder! | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-elastic-beanstalk.yml` | 17 | | [ElastiCache memcached cluster](marbot-elasticache-memcached.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-elasticache-memcached.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-elasticache-memcached.yml` | 18 | | [Elasticsearch domain](marbot-elasticsearch.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-elasticsearch.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-elasticsearch.yml` | 19 | | [VPC interface endpoint](marbot-interface-endpoint.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-interface-endpoint.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-interface-endpoint.yml` | 20 | | [CloudFormation Drift Detection](marbot-cloudformation-drift.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-cloudformation-drift.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-cloudformation-drift.yml` | 21 | | [CloudFront](marbot-cloudfront.yml) | [Launch 
Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-cloudfront.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-cloudfront.yml` | 22 | | [Lambda function](marbot-lambda-function.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-lambda-function.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-lambda-function.yml` | 23 | | [NAT Gateway](marbot-nat-gateway.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-nat-gateway.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-nat-gateway.yml` | 24 | | [RDS database instance](marbot-rds.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-rds.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-rds.yml` | 25 | | [RDS cluster (Aurora)](marbot-rds-cluster.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-rds-cluster.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-rds-cluster.yml` | 26 | | [Redshift cluster](marbot-redshift.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-redshift.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-redshift.yml` | 27 | | [Repeated task](marbot-repeated-task.yml) | [Launch 
Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-repeated-task.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-repeated-task.yml` | 28 | | [Reserved Instance](marbot-reserved-instance.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-reserved-instance.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-reserved-instance.yml` | 29 | | [SQS queue](marbot-sqs-queue.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-sqs-queue.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-sqs-queue.yml` | 30 | | [Synthetics Website](marbot-synthetics-website.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-synthetics-website.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-synthetics-website.yml` | 31 | | [WorkSpaces](marbot-workspaces.yml) | [Launch Stack](https://console.aws.amazon.com/cloudformation/home#/stacks/create/review?templateURL=https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-workspaces.yml) | `https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/marbot-workspaces.yml` | 32 | 33 | ## Update procedure 34 | 35 | To update a Jump Start: 36 | 37 | 1. Visit the [AWS Management Console](https://console.aws.amazon.com/cloudformation/home#/stacks?filteringText=marbot). 38 | 2. Double-check the selected AWS region. 39 | 3. Search for marbot. 40 | ![Update: step 1](doc/update1.png) 41 | 4. 
Select a stack and grab the matching **CloudFormation Template URL** from the table above (the **Description** has to match the **Monitoring goal** here!). 42 | 5. Click on **Update**. 43 | 6. Select **Replace current template** and paste the *CloudFormation Template URL* into the **Amazon S3 URL** field. 44 | ![Update: step 2](doc/update2.png) 45 | 7. Click on **Next**. 46 | 8. Scroll to the bottom of the page and click on **Next**. 47 | 9. Once again, scroll to the bottom of the page and click on **Next**. 48 | 10. Scroll to the bottom of the page and select **I acknowledge that AWS CloudFormation might create IAM resources**. 49 | ![Update: step 3](doc/update3.png) 50 | 11. Click on **Update stack**. 51 | 12. Repeat this procedure for every CloudFormation stack that is based on the updated Jump Starts. 52 | 13. If you use more than one AWS region, repeat the procedure for each region. 53 | 54 | ## License 55 | All templates are published under Apache License Version 2.0. 56 | 57 | ## About 58 | A [marbot.io](https://marbot.io/) project. Engineered by [widdix](https://widdix.net). 
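The console steps of the update procedure above can be approximated with the AWS CLI. This is a hedged sketch rather than the documented procedure: the helper names `template_url`, `keep_param`, and `update_jump_start` are hypothetical, the AWS CLI is assumed to be installed and configured, and the parameter names to keep depend on the template behind each stack.

```shell
# Sketch only: the helper names below are hypothetical and not part of this repository.

# Print the CloudFormation Template URL for a template file name.
template_url() {
  printf 'https://s3-eu-west-1.amazonaws.com/monitoring-jump-start/%s\n' "$1"
}

# Print a --parameters entry that keeps the stack's current value for one parameter.
keep_param() {
  printf 'ParameterKey=%s,UsePreviousValue=true\n' "$1"
}

# Replace the template of an existing stack and wait for the update to finish.
# Usage: update_jump_start <region> <stack-name> <template-file> <parameter-entry>...
update_jump_start() {
  region=$1
  stack_name=$2
  template=$3
  shift 3
  # CAPABILITY_IAM corresponds to the IAM acknowledgement checkbox in the console.
  aws cloudformation update-stack \
    --region "$region" \
    --stack-name "$stack_name" \
    --template-url "$(template_url "$template")" \
    --parameters "$@" \
    --capabilities CAPABILITY_IAM &&
    aws cloudformation wait stack-update-complete \
      --region "$region" \
      --stack-name "$stack_name"
}

# Example call (the stack name is a placeholder):
#   update_jump_start eu-west-1 <stack-name> marbot.yml "$(keep_param EndpointId)" "$(keep_param Stage)"
```

As in the manual procedure, the call has to be repeated for every stack and every region in use. Note that `update-stack` reports an error when the template and parameters are unchanged, which in that case only means there is nothing to update.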
59 | -------------------------------------------------------------------------------- /data/network.json: -------------------------------------------------------------------------------- 1 | { 2 | "c4.large": { 3 | "baseline": 0.62 4 | }, 5 | "c4.xlarge": { 6 | "baseline": 1.24 7 | }, 8 | "c4.2xlarge": { 9 | "baseline": 2.48 10 | }, 11 | "c4.4xlarge": { 12 | "baseline": 4.96 13 | }, 14 | "c4.8xlarge": { 15 | "baseline": 9.85 16 | }, 17 | "c5.large": { 18 | "baseline": 0.74, 19 | "burst": 10.04 20 | }, 21 | "c5.xlarge": { 22 | "baseline": 1.24, 23 | "burst": 10.04 24 | }, 25 | "c5.2xlarge": { 26 | "baseline": 2.49, 27 | "burst": 10.04 28 | }, 29 | "c5.4xlarge": { 30 | "baseline": 4.97, 31 | "burst": 10.04 32 | }, 33 | "c5.9xlarge": { 34 | "baseline": 10.04 35 | }, 36 | "c5.18xlarge": { 37 | "baseline": 23.88 38 | }, 39 | "d2.xlarge": { 40 | "baseline": 1.24 41 | }, 42 | "d2.2xlarge": { 43 | "baseline": 2.48 44 | }, 45 | "d2.4xlarge": { 46 | "baseline": 4.96 47 | }, 48 | "d2.8xlarge": { 49 | "baseline": 9.85 50 | }, 51 | "g3.4xlarge": { 52 | "baseline": 4.99, 53 | "burst": 10.09 54 | }, 55 | "g3.8xlarge": { 56 | "baseline": 10.09 57 | }, 58 | "g3.16xlarge": { 59 | "baseline": 22.7 60 | }, 61 | "h1.2xlarge": { 62 | "baseline": 2.48, 63 | "burst": 10.09 64 | }, 65 | "h1.4xlarge": { 66 | "baseline": 4.99, 67 | "burst": 10.09 68 | }, 69 | "h1.8xlarge": { 70 | "baseline": 10.09 71 | }, 72 | "h1.16xlarge": { 73 | "baseline": 22.07 74 | }, 75 | "i3.large": { 76 | "baseline": 0.74, 77 | "burst": 10.09 78 | }, 79 | "i3.xlarge": { 80 | "baseline": 1.24, 81 | "burst": 10.09 82 | }, 83 | "i3.2xlarge": { 84 | "baseline": 2.48, 85 | "burst": 10.09 86 | }, 87 | "i3.4xlarge": { 88 | "baseline": 4.99, 89 | "burst": 10.09 90 | }, 91 | "i3.8xlarge": { 92 | "baseline": 10.09 93 | }, 94 | "i3.16xlarge": { 95 | "baseline": 19.65, 96 | "burst": 22.46 97 | }, 98 | "i3.metal": { 99 | "baseline": 22.05, 100 | "burst": 24.16 101 | }, 102 | "m3.medium": { 103 | "baseline": 0.3 104 | }, 105 | 
"m3.large": { 106 | "baseline": 0.69 107 | }, 108 | "m3.xlarge": { 109 | "baseline": 0.99 110 | }, 111 | "m3.2xlarge": { 112 | "baseline": 0.99 113 | }, 114 | "m4.large": { 115 | "baseline": 0.45 116 | }, 117 | "m4.xlarge": { 118 | "baseline": 0.74 119 | }, 120 | "m4.2xlarge": { 121 | "baseline": 0.99 122 | }, 123 | "m4.4xlarge": { 124 | "baseline": 1.99 125 | }, 126 | "m4.10xlarge": { 127 | "baseline": 9.85 128 | }, 129 | "m4.16xlarge": { 130 | "baseline": 19.95 131 | }, 132 | "m5.large": { 133 | "baseline": 0.74, 134 | "burst": 10.04 135 | }, 136 | "m5.xlarge": { 137 | "baseline": 1.24, 138 | "burst": 10.04 139 | }, 140 | "m5.2xlarge": { 141 | "baseline": 2.49, 142 | "burst": 10.04 143 | }, 144 | "m5.4xlarge": { 145 | "baseline": 4.97, 146 | "burst": 10.04 147 | }, 148 | "m5.12xlarge": { 149 | "baseline": 10.04 150 | }, 151 | "m5.24xlarge": { 152 | "baseline": 21.49 153 | }, 154 | "p2.xlarge": { 155 | "baseline": 1.24 156 | }, 157 | "p2.8xlarge": { 158 | "baseline": 10.09 159 | }, 160 | "p2.16xlarge": { 161 | "baseline": 21.05 162 | }, 163 | "p3.2xlarge": { 164 | "baseline": 2.48, 165 | "burst": 10.09 166 | }, 167 | "p3.8xlarge": { 168 | "baseline": 10.09 169 | }, 170 | "p3.16xlarge": { 171 | "baseline": 21.3 172 | }, 173 | "r3.large": { 174 | "baseline": 0.5 175 | }, 176 | "r3.xlarge": { 177 | "baseline": 0.69 178 | }, 179 | "r3.2xlarge": { 180 | "baseline": 0.99 181 | }, 182 | "r3.4xlarge": { 183 | "baseline": 1.98 184 | }, 185 | "r3.8xlarge": { 186 | "baseline": 4.96 187 | }, 188 | "r4.large": { 189 | "baseline": 0.74, 190 | "burst": 10.09 191 | }, 192 | "r4.xlarge": { 193 | "baseline": 1.24, 194 | "burst": 10.09 195 | }, 196 | "r4.2xlarge": { 197 | "baseline": 2.48, 198 | "burst": 10.09 199 | }, 200 | "r4.4xlarge": { 201 | "baseline": 4.99, 202 | "burst": 10.09 203 | }, 204 | "r4.8xlarge": { 205 | "baseline": 10.09 206 | }, 207 | "r4.16xlarge": { 208 | "baseline": 22.03 209 | }, 210 | "r5.large": { 211 | "baseline": 0.74, 212 | "burst": 10.04 213 | }, 214 | 
"r5.xlarge": { 215 | "baseline": 1.24, 216 | "burst": 10.04 217 | }, 218 | "r5.2xlarge": { 219 | "baseline": 2.49, 220 | "burst": 10.04 221 | }, 222 | "r5.4xlarge": { 223 | "baseline": 4.97, 224 | "burst": 10.04 225 | }, 226 | "r5.12xlarge": { 227 | "baseline": 12.04 228 | }, 229 | "r5.24xlarge": { 230 | "baseline": 23.51 231 | }, 232 | "t1.micro": { 233 | "baseline": 0.07 234 | }, 235 | "t2.nano": { 236 | "baseline": 0.03, 237 | "burst": 0.28 238 | }, 239 | "t2.micro": { 240 | "baseline": 0.06, 241 | "burst": 0.72 242 | }, 243 | "t2.small": { 244 | "baseline": 0.13, 245 | "burst": 0.59 246 | }, 247 | "t2.medium": { 248 | "baseline": 0.25, 249 | "burst": 0.65 250 | }, 251 | "t2.large": { 252 | "baseline": 0.51, 253 | "burst": 0.78 254 | }, 255 | "t2.xlarge": { 256 | "baseline": 0.74, 257 | "burst": 0.89 258 | }, 259 | "t2.2xlarge": { 260 | "baseline": 0.99 261 | }, 262 | "t3.nano": { 263 | "baseline": 0.03, 264 | "burst": 5.06 265 | }, 266 | "t3.micro": { 267 | "baseline": 0.06, 268 | "burst": 5.09 269 | }, 270 | "t3.small": { 271 | "baseline": 0.13, 272 | "burst": 5.11 273 | }, 274 | "t3.medium": { 275 | "baseline": 0.25, 276 | "burst": 4.98 277 | }, 278 | "t3.large": { 279 | "baseline": 0.51, 280 | "burst": 5.11 281 | }, 282 | "t3.xlarge": { 283 | "baseline": 1.02, 284 | "burst": 5.11 285 | }, 286 | "t3.2xlarge": { 287 | "baseline": 2.04, 288 | "burst": 5.11 289 | }, 290 | "t3a.nano": { 291 | "baseline": 0.03, 292 | "burst": 5.06 293 | }, 294 | "t3a.micro": { 295 | "baseline": 0.06, 296 | "burst": 5.09 297 | }, 298 | "t3a.small": { 299 | "baseline": 0.13, 300 | "burst": 5.11 301 | }, 302 | "t3a.medium": { 303 | "baseline": 0.25, 304 | "burst": 4.98 305 | }, 306 | "t3a.large": { 307 | "baseline": 0.51, 308 | "burst": 5.11 309 | }, 310 | "t3a.xlarge": { 311 | "baseline": 1.02, 312 | "burst": 5.11 313 | }, 314 | "t3a.2xlarge": { 315 | "baseline": 2.04, 316 | "burst": 5.11 317 | }, 318 | "x1.16xlarge": { 319 | "baseline": 10.09 320 | }, 321 | "x1.32xlarge": { 322 | 
"baseline": 20.91, 323 | "burst": 23.2 324 | }, 325 | "x1e.xlarge": { 326 | "baseline": 0.62, 327 | "burst": 10.09 328 | }, 329 | "x1e.2xlarge": { 330 | "baseline": 1.24, 331 | "burst": 10.09 332 | }, 333 | "x1e.4xlarge": { 334 | "baseline": 2.48, 335 | "burst": 10.07 336 | }, 337 | "x1e.8xlarge": { 338 | "baseline": 4.99, 339 | "burst": 10.09 340 | }, 341 | "x1e.16xlarge": { 342 | "baseline": 10.09 343 | }, 344 | "x1e.32xlarge": { 345 | "baseline": 22.13 346 | }, 347 | "z1d.large": { 348 | "baseline": 0.74, 349 | "burst": 10.03 350 | }, 351 | "z1d.xlarge": { 352 | "baseline": 1.24, 353 | "burst": 10.04 354 | }, 355 | "z1d.2xlarge": { 356 | "baseline": 2.49, 357 | "burst": 10.04 358 | }, 359 | "z1d.3xlarge": { 360 | "baseline": 4.97, 361 | "burst": 10.04 362 | }, 363 | "z1d.6xlarge": { 364 | "baseline": 12.05 365 | }, 366 | "z1d.12xlarge": { 367 | "baseline": 23.14 368 | } 369 | } -------------------------------------------------------------------------------- /doc/update1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marbot-io/monitoring-jump-start/3a7d6cd5a09c26e425b5593b2a1d91e76f019763/doc/update1.png -------------------------------------------------------------------------------- /doc/update2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marbot-io/monitoring-jump-start/3a7d6cd5a09c26e425b5593b2a1d91e76f019763/doc/update2.png -------------------------------------------------------------------------------- /doc/update3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/marbot-io/monitoring-jump-start/3a7d6cd5a09c26e425b5593b2a1d91e76f019763/doc/update3.png -------------------------------------------------------------------------------- /marbot-alb.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # 
Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | AWSTemplateFormatVersion: '2010-09-09' 16 | Description: 'marbot.io: Application Load Balancer (ALB) monitoring (https://github.com/marbot-io/monitoring-jump-start)' 17 | Metadata: 18 | 'AWS::CloudFormation::Interface': 19 | ParameterGroups: 20 | - Label: 21 | default: 'marbot endpoint' 22 | Parameters: 23 | - EndpointId 24 | - Stage 25 | - Label: 26 | default: 'ALB' 27 | Parameters: 28 | - LoadBalancerFullName 29 | - TargetGroupFullName 30 | - Label: 31 | default: 'Thresholds' 32 | Parameters: 33 | - ALB5XXCountThreshold 34 | - ALBRejectedConnectionCountThreshold 35 | - Target5XXCountThreshold 36 | - TargetConnectionErrorCountThreshold 37 | Parameters: 38 | EndpointId: 39 | Description: 'Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id").' 40 | Type: String 41 | Stage: 42 | Description: 'marbot stage (never change this!).' 43 | Type: String 44 | Default: v1 45 | AllowedValues: [v1, dev] 46 | LoadBalancerFullName: 47 | Description: 'The full name of the load balancer (last part of ARN, e.g., app/load-balancer-name/1234567890123456).' 48 | Type: String 49 | TargetGroupFullName: 50 | Description: 'The full name of the target group (last part of ARN, e.g., targetgroup/target-group-name/1234567890123456).' 
51 | Type: String 52 | ALB5XXCountThreshold: 53 | Description: 'The maximum number of 5XX responses from the ALB (not the targets) (set to -1 to disable).' 54 | Type: Number 55 | Default: 0 56 | MinValue: -1 57 | ALBRejectedConnectionCountThreshold: 58 | Description: 'The maximum number of connections that were rejected because the ALB had reached its maximum number of connections (set to -1 to disable).' 59 | Type: Number 60 | Default: 0 61 | MinValue: -1 62 | Target5XXCountThreshold: 63 | Description: 'The maximum number of 5XX responses from the targets (set to -1 to disable).' 64 | Type: Number 65 | Default: 0 66 | MinValue: -1 67 | TargetConnectionErrorCountThreshold: 68 | Description: 'The maximum number of connection errors from the ALB to the targets (set to -1 to disable).' 69 | Type: Number 70 | Default: 0 71 | MinValue: -1 72 | Conditions: 73 | HasALB5XXCountThreshold: !Not [!Equals [!Ref ALB5XXCountThreshold, '-1']] 74 | HasALBRejectedConnectionCountThreshold: !Not [!Equals [!Ref ALBRejectedConnectionCountThreshold, '-1']] 75 | HasTarget5XXCountThreshold: !Not [!Equals [!Ref Target5XXCountThreshold, '-1']] 76 | HasTargetConnectionErrorCountThreshold: !Not [!Equals [!Ref TargetConnectionErrorCountThreshold, '-1']] 77 | Resources: 78 | ########################################################################## 79 | # # 80 | # TOPIC # 81 | # # 82 | ########################################################################## 83 | Topic: 84 | Type: 'AWS::SNS::Topic' 85 | Properties: {} 86 | TopicPolicy: 87 | Type: 'AWS::SNS::TopicPolicy' 88 | Properties: 89 | PolicyDocument: 90 | Id: Id1 91 | Version: '2012-10-17' 92 | Statement: 93 | - Sid: Sid1 94 | Effect: Allow 95 | Principal: 96 | AWS: '*' # Allow CloudWatch Alarms 97 | Action: 'sns:Publish' 98 | Resource: !Ref Topic 99 | Condition: 100 | StringEquals: 101 | 'AWS:SourceOwner': !Ref 'AWS::AccountId' 102 | Topics: 103 | - !Ref Topic 104 | TopicEndpointSubscription: 105 | DependsOn: TopicPolicy 106 | Type: 
'AWS::SNS::Subscription' 107 | Properties: 108 | DeliveryPolicy: 109 | healthyRetryPolicy: 110 | minDelayTarget: 1 111 | maxDelayTarget: 60 112 | numRetries: 100 113 | numNoDelayRetries: 0 114 | backoffFunction: exponential 115 | throttlePolicy: 116 | maxReceivesPerSecond: 1 117 | Endpoint: !Sub 'https://api.marbot.io/${Stage}/endpoint/${EndpointId}' 118 | Protocol: https 119 | TopicArn: !Ref Topic 120 | MonitoringJumpStartEvent: 121 | DependsOn: TopicEndpointSubscription 122 | Type: 'AWS::Events::Rule' 123 | Properties: 124 | Description: 'Monitoring Jump Start connection. (created by marbot)' 125 | ScheduleExpression: 'rate(30 days)' 126 | State: ENABLED 127 | Targets: 128 | - Arn: !Ref Topic 129 | Id: marbot 130 | Input: !Sub | 131 | { 132 | "Type": "monitoring-jump-start-connection", 133 | "StackTemplate": "marbot-alb", 134 | "StackVersion": "1.0.0", 135 | "Partition": "${AWS::Partition}", 136 | "AccountId": "${AWS::AccountId}", 137 | "Region": "${AWS::Region}", 138 | "StackId": "${AWS::StackId}", 139 | "StackName": "${AWS::StackName}" 140 | } 141 | ########################################################################## 142 | # # 143 | # ALARMS # 144 | # # 145 | ########################################################################## 146 | ALB5XXCountTooHighAlarm: 147 | Condition: HasALB5XXCountThreshold 148 | DependsOn: TopicEndpointSubscription 149 | Type: 'AWS::CloudWatch::Alarm' 150 | Properties: 151 | AlarmDescription: 'Number of 5XX responses from ALB over the last minute too high. 
(created by marbot)' 152 | Namespace: 'AWS/ApplicationELB' 153 | MetricName: HTTPCode_ELB_5XX_Count 154 | Dimensions: 155 | - Name: LoadBalancer 156 | Value: !Ref LoadBalancerFullName 157 | Threshold: !Ref ALB5XXCountThreshold 158 | ComparisonOperator: GreaterThanThreshold 159 | Statistic: Sum 160 | Period: 60 161 | EvaluationPeriods: 1 162 | AlarmActions: 163 | - !Ref Topic 164 | OKActions: 165 | - !Ref Topic 166 | TreatMissingData: notBreaching 167 | ALBRejectedConnectionCountTooHighAlarm: 168 | Condition: HasALBRejectedConnectionCountThreshold 169 | DependsOn: TopicEndpointSubscription 170 | Type: 'AWS::CloudWatch::Alarm' 171 | Properties: 172 | AlarmDescription: 'Number of rejected connections by ALB too high, ALB needs time to scale up. (created by marbot)' 173 | Namespace: 'AWS/ApplicationELB' 174 | MetricName: RejectedConnectionCount 175 | Dimensions: 176 | - Name: LoadBalancer 177 | Value: !Ref LoadBalancerFullName 178 | Threshold: !Ref ALBRejectedConnectionCountThreshold 179 | ComparisonOperator: GreaterThanThreshold 180 | Statistic: Sum 181 | Period: 60 182 | EvaluationPeriods: 1 183 | AlarmActions: 184 | - !Ref Topic 185 | OKActions: 186 | - !Ref Topic 187 | TreatMissingData: notBreaching 188 | Target5XXCountTooHighAlarm: 189 | Condition: HasTarget5XXCountThreshold 190 | DependsOn: TopicEndpointSubscription 191 | Type: 'AWS::CloudWatch::Alarm' 192 | Properties: 193 | AlarmDescription: 'Number of 5XX responses from targets over the last minute too high. 
(created by marbot)' 194 | Namespace: 'AWS/ApplicationELB' 195 | MetricName: HTTPCode_Target_5XX_Count 196 | Dimensions: 197 | - Name: LoadBalancer 198 | Value: !Ref LoadBalancerFullName 199 | - Name: TargetGroup 200 | Value: !Ref TargetGroupFullName 201 | Threshold: !Ref Target5XXCountThreshold 202 | ComparisonOperator: GreaterThanThreshold 203 | Statistic: Sum 204 | Period: 60 205 | EvaluationPeriods: 1 206 | AlarmActions: 207 | - !Ref Topic 208 | OKActions: 209 | - !Ref Topic 210 | TreatMissingData: notBreaching 211 | TargetConnectionErrorCountTooHighAlarm: 212 | Condition: HasTargetConnectionErrorCountThreshold 213 | DependsOn: TopicEndpointSubscription 214 | Type: 'AWS::CloudWatch::Alarm' 215 | Properties: 216 | AlarmDescription: 'Number of connection errors from ALB to targets over the last minute too high. (created by marbot)' 217 | Namespace: 'AWS/ApplicationELB' 218 | MetricName: TargetConnectionErrorCount 219 | Dimensions: 220 | - Name: LoadBalancer 221 | Value: !Ref LoadBalancerFullName 222 | - Name: TargetGroup 223 | Value: !Ref TargetGroupFullName 224 | Threshold: !Ref TargetConnectionErrorCountThreshold 225 | ComparisonOperator: GreaterThanThreshold 226 | Statistic: Sum 227 | Period: 60 228 | EvaluationPeriods: 1 229 | AlarmActions: 230 | - !Ref Topic 231 | OKActions: 232 | - !Ref Topic 233 | TreatMissingData: notBreaching 234 | Outputs: 235 | StackName: 236 | Description: 'Stack name.' 237 | Value: !Sub '${AWS::StackName}' 238 | StackTemplate: 239 | Description: 'Stack template.' 240 | Value: 'marbot-alb' 241 | StackVersion: 242 | Description: 'Stack version.' 
243 | Value: '1.0.0' 244 | -------------------------------------------------------------------------------- /marbot-auto-scaling-group.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | AWSTemplateFormatVersion: '2010-09-09' 16 | Description: 'marbot.io: Auto Scaling Group monitoring (https://github.com/marbot-io/monitoring-jump-start)' 17 | Metadata: 18 | 'AWS::CloudFormation::Interface': 19 | ParameterGroups: 20 | - Label: 21 | default: 'marbot endpoint' 22 | Parameters: 23 | - EndpointId 24 | - Stage 25 | - Label: 26 | default: 'EC2' 27 | Parameters: 28 | - AutoScalingGroupName 29 | - Label: 30 | default: 'Thresholds' 31 | Parameters: 32 | - CPUUtilizationThreshold 33 | - CPUCreditBalanceThreshold 34 | - EBSIOCreditBalanceThreshold 35 | - EBSThroughputCreditBalanceThreshold 36 | Parameters: 37 | EndpointId: 38 | Description: 'Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id").' 39 | Type: String 40 | Stage: 41 | Description: 'marbot stage (never change this!).' 42 | Type: String 43 | Default: v1 44 | AllowedValues: [v1, dev] 45 | AutoScalingGroupName: 46 | Description: 'The name of the Auto Scaling Group that you want to monitor.' 
47 | Type: 'String' 48 | CPUUtilizationThreshold: 49 | Description: 'The maximum percentage of CPU utilization (set to -1 to disable).' 50 | Type: Number 51 | Default: 80 52 | MinValue: -1 53 | MaxValue: 100 54 | CPUCreditBalanceThreshold: 55 | Description: 'The minimum number of CPU credits available (t* instances only; set to -1 to disable).' 56 | Type: Number 57 | Default: 20 58 | MinValue: -1 59 | EBSIOCreditBalanceThreshold: 60 | Description: 'The minimum percentage of I/O credits remaining in the burst bucket (smaller instances only; set to -1 to disable).' 61 | Type: Number 62 | Default: 20 63 | MinValue: -1 64 | MaxValue: 100 65 | EBSThroughputCreditBalanceThreshold: 66 | Description: 'The minimum percentage of throughput credits remaining in the burst bucket (smaller instances only; set to -1 to disable).' 67 | Type: Number 68 | Default: 20 69 | MinValue: -1 70 | MaxValue: 100 71 | Conditions: 72 | HasCPUUtilizationThreshold: !Not [!Equals [!Ref CPUUtilizationThreshold, '-1']] 73 | HasCPUCreditBalanceThreshold: !Not [!Equals [!Ref CPUCreditBalanceThreshold, '-1']] 74 | HasEBSIOCreditBalanceThreshold: !Not [!Equals [!Ref EBSIOCreditBalanceThreshold, '-1']] 75 | HasEBSThroughputCreditBalanceThreshold: !Not [!Equals [!Ref EBSThroughputCreditBalanceThreshold, '-1']] 76 | Resources: 77 | ########################################################################## 78 | # # 79 | # TOPIC # 80 | # # 81 | ########################################################################## 82 | Topic: 83 | Type: 'AWS::SNS::Topic' 84 | Properties: {} 85 | TopicPolicy: 86 | Type: 'AWS::SNS::TopicPolicy' 87 | Properties: 88 | PolicyDocument: 89 | Id: Id1 90 | Version: '2012-10-17' 91 | Statement: 92 | - Sid: Sid1 93 | Effect: Allow 94 | Principal: 95 | Service: 'events.amazonaws.com' # Allow EventBridge 96 | Action: 'sns:Publish' 97 | Resource: !Ref Topic 98 | - Sid: Sid2 99 | Effect: Allow 100 | Principal: 101 | AWS: '*' # Allow CloudWatch Alarms 102 | Action: 'sns:Publish' 103 | 
Resource: !Ref Topic 104 | Condition: 105 | StringEquals: 106 | 'AWS:SourceOwner': !Ref 'AWS::AccountId' 107 | Topics: 108 | - !Ref Topic 109 | TopicEndpointSubscription: 110 | DependsOn: TopicPolicy 111 | Type: 'AWS::SNS::Subscription' 112 | Properties: 113 | DeliveryPolicy: 114 | healthyRetryPolicy: 115 | minDelayTarget: 1 116 | maxDelayTarget: 60 117 | numRetries: 100 118 | numNoDelayRetries: 0 119 | backoffFunction: exponential 120 | throttlePolicy: 121 | maxReceivesPerSecond: 1 122 | Endpoint: !Sub 'https://api.marbot.io/${Stage}/endpoint/${EndpointId}' 123 | Protocol: https 124 | TopicArn: !Ref Topic 125 | MonitoringJumpStartEvent: 126 | DependsOn: TopicEndpointSubscription 127 | Type: 'AWS::Events::Rule' 128 | Properties: 129 | Description: 'Monitoring Jump Start connection. (created by marbot)' 130 | ScheduleExpression: 'rate(30 days)' 131 | State: ENABLED 132 | Targets: 133 | - Arn: !Ref Topic 134 | Id: marbot 135 | Input: !Sub | 136 | { 137 | "Type": "monitoring-jump-start-connection", 138 | "StackTemplate": "marbot-auto-scaling-group", 139 | "StackVersion": "1.4.0", 140 | "Partition": "${AWS::Partition}", 141 | "AccountId": "${AWS::AccountId}", 142 | "Region": "${AWS::Region}", 143 | "StackId": "${AWS::StackId}", 144 | "StackName": "${AWS::StackName}" 145 | } 146 | ########################################################################## 147 | # # 148 | # ALARMS # 149 | # # 150 | ########################################################################## 151 | CPUUtilizationTooHighAlarm: 152 | Condition: HasCPUUtilizationThreshold 153 | DependsOn: TopicEndpointSubscription 154 | Type: 'AWS::CloudWatch::Alarm' 155 | Properties: 156 | AlarmActions: 157 | - !Ref Topic 158 | AlarmDescription: 'Average CPU utilization over last 10 minutes too high. 
(created by marbot)' 159 | ComparisonOperator: GreaterThanThreshold 160 | Dimensions: 161 | - Name: AutoScalingGroupName 162 | Value: !Ref AutoScalingGroupName 163 | EvaluationPeriods: 1 164 | MetricName: CPUUtilization 165 | Namespace: 'AWS/EC2' 166 | OKActions: 167 | - !Ref Topic 168 | Period: 600 169 | Statistic: Average 170 | Threshold: !Ref CPUUtilizationThreshold 171 | TreatMissingData: notBreaching 172 | CPUCreditBalanceTooLowAlarm: 173 | Condition: HasCPUCreditBalanceThreshold 174 | DependsOn: TopicEndpointSubscription 175 | Type: 'AWS::CloudWatch::Alarm' 176 | Properties: 177 | AlarmActions: 178 | - !Ref Topic 179 | AlarmDescription: 'Average CPU credit balance over last 10 minutes too low, expect a significant performance drop soon. (created by marbot)' 180 | ComparisonOperator: LessThanThreshold 181 | Dimensions: 182 | - Name: AutoScalingGroupName 183 | Value: !Ref AutoScalingGroupName 184 | EvaluationPeriods: 1 185 | MetricName: CPUCreditBalance 186 | Namespace: 'AWS/EC2' 187 | OKActions: 188 | - !Ref Topic 189 | Period: 600 190 | Statistic: Average 191 | Threshold: !Ref CPUCreditBalanceThreshold 192 | TreatMissingData: notBreaching 193 | EBSIOCreditBalanceTooLowAlarm: 194 | Condition: HasEBSIOCreditBalanceThreshold 195 | DependsOn: TopicEndpointSubscription 196 | Type: 'AWS::CloudWatch::Alarm' 197 | Properties: 198 | AlarmActions: 199 | - !Ref Topic 200 | AlarmDescription: 'Average EBS IO credit balance over last 10 minutes too low, expect a significant performance drop soon. 
(created by marbot)' 201 | ComparisonOperator: LessThanThreshold 202 | Dimensions: 203 | - Name: AutoScalingGroupName 204 | Value: !Ref AutoScalingGroupName 205 | EvaluationPeriods: 1 206 | MetricName: 'EBSIOBalance%' 207 | Namespace: 'AWS/EC2' 208 | OKActions: 209 | - !Ref Topic 210 | Period: 600 211 | Statistic: Average 212 | Threshold: !Ref EBSIOCreditBalanceThreshold 213 | TreatMissingData: notBreaching 214 | EBSThroughputCreditBalanceTooLowAlarm: 215 | Condition: HasEBSThroughputCreditBalanceThreshold 216 | DependsOn: TopicEndpointSubscription 217 | Type: 'AWS::CloudWatch::Alarm' 218 | Properties: 219 | AlarmActions: 220 | - !Ref Topic 221 | AlarmDescription: 'Average EBS throughput credit balance over last 10 minutes too low, expect a significant performance drop soon. (created by marbot)' 222 | ComparisonOperator: LessThanThreshold 223 | Dimensions: 224 | - Name: AutoScalingGroupName 225 | Value: !Ref AutoScalingGroupName 226 | EvaluationPeriods: 1 227 | MetricName: 'EBSByteBalance%' 228 | Namespace: 'AWS/EC2' 229 | OKActions: 230 | - !Ref Topic 231 | Period: 600 232 | Statistic: Average 233 | Threshold: !Ref EBSThroughputCreditBalanceThreshold 234 | TreatMissingData: notBreaching 235 | # TODO add network in+out 236 | ########################################################################## 237 | # # 238 | # EVENTS # 239 | # # 240 | ########################################################################## 241 | UnsuccessfulEvent: 242 | DependsOn: TopicEndpointSubscription 243 | Type: 'AWS::Events::Rule' 244 | Properties: 245 | Description: 'EC2 Auto Scaling failed to launch or terminate an instance. 
(created by marbot)' 246 | EventPattern: 247 | source: 248 | - 'aws.autoscaling' 249 | 'detail-type': 250 | - 'EC2 Instance Launch Unsuccessful' 251 | - 'EC2 Instance Terminate Unsuccessful' 252 | - 'EC2 Auto Scaling Instance Refresh Failed' 253 | detail: 254 | AutoScalingGroupName: 255 | - !Ref AutoScalingGroupName 256 | State: ENABLED 257 | Targets: 258 | - Arn: !Ref Topic 259 | Id: marbot 260 | Outputs: 261 | StackName: 262 | Description: 'Stack name.' 263 | Value: !Sub '${AWS::StackName}' 264 | StackTemplate: 265 | Description: 'Stack template.' 266 | Value: 'marbot-auto-scaling-group' 267 | StackVersion: 268 | Description: 'Stack version.' 269 | Value: '1.4.0' 270 | -------------------------------------------------------------------------------- /marbot-cloudfront.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | AWSTemplateFormatVersion: '2010-09-09' 16 | Description: 'marbot.io: CloudFront monitoring (https://github.com/marbot-io/monitoring-jump-start)' 17 | Metadata: 18 | 'AWS::CloudFormation::Interface': 19 | ParameterGroups: 20 | - Label: 21 | default: 'marbot endpoint' 22 | Parameters: 23 | - EndpointId 24 | - Stage 25 | - Label: 26 | default: 'CloudFront' 27 | Parameters: 28 | - CloudFrontDistributionId 29 | - Label: 30 | default: 'Thresholds' 31 | Parameters: 32 | - 5xxErrorRateThreshold 33 | - LambdaExecutionErrorThreshold 34 | - LambdaValidationErrorThreshold 35 | - CacheHitRateThreshold 36 | - OriginLatenyThreshold 37 | Parameters: 38 | EndpointId: 39 | Description: 'Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id").' 40 | Type: String 41 | Stage: 42 | Description: 'marbot stage (never change this!).' 43 | Type: String 44 | Default: v1 45 | AllowedValues: [v1, dev] 46 | CloudFrontDistributionId: 47 | Description: 'The CloudFront distribution ID that you want to monitor.' 48 | Type: String 49 | 5xxErrorRateThreshold: 50 | Description: 'The 5xx error rate threshold (set to -1 to disable).' 51 | Type: Number 52 | Default: 0.01 53 | MinValue: -1 54 | LambdaExecutionErrorThreshold: 55 | Description: 'The Lambda@Edge execution error count threshold (set to -1 to disable).' 56 | Type: Number 57 | Default: -1 58 | MinValue: -1 59 | LambdaValidationErrorThreshold: 60 | Description: 'The Lambda@Edge validation error count threshold (set to -1 to disable).' 61 | Type: Number 62 | Default: -1 63 | MinValue: -1 64 | CacheHitRateThreshold: 65 | Description: 'The cache hit rate threshold (set to -1 to disable; requires detailed monitoring).' 66 | Type: Number 67 | Default: -1 68 | MinValue: -1 69 | OriginLatenyThreshold: 70 | Description: 'The origin latency threshold in ms using the 99.5 percentile (set to -1 to disable; requires detailed monitoring).' 
71 | Type: Number 72 | Default: -1 73 | MinValue: -1 74 | Conditions: 75 | Has5xxErrorRateThreshold: !Not [!Equals [!Ref 5xxErrorRateThreshold, '-1']] 76 | HasLambdaExecutionErrorThreshold: !Not [!Equals [!Ref LambdaExecutionErrorThreshold, '-1']] 77 | HasLambdaValidationErrorThreshold: !Not [!Equals [!Ref LambdaValidationErrorThreshold, '-1']] 78 | HasCacheHitRateThreshold: !Not [!Equals [!Ref CacheHitRateThreshold, '-1']] 79 | HasOriginLatenyThreshold: !Not [!Equals [!Ref OriginLatenyThreshold, '-1']] 80 | Resources: 81 | ########################################################################## 82 | # # 83 | # TOPIC # 84 | # # 85 | ########################################################################## 86 | Topic: 87 | Type: 'AWS::SNS::Topic' 88 | Properties: {} 89 | TopicPolicy: 90 | Type: 'AWS::SNS::TopicPolicy' 91 | Properties: 92 | PolicyDocument: 93 | Id: Id1 94 | Version: '2012-10-17' 95 | Statement: 96 | - Sid: Sid1 97 | Effect: Allow 98 | Principal: 99 | Service: 'events.amazonaws.com' # Allow EventBridge 100 | Action: 'sns:Publish' 101 | Resource: !Ref Topic 102 | - Sid: Sid2 103 | Effect: Allow 104 | Principal: 105 | AWS: '*' # Allow CloudWatch Alarms 106 | Action: 'sns:Publish' 107 | Resource: !Ref Topic 108 | Condition: 109 | StringEquals: 110 | 'AWS:SourceOwner': !Ref 'AWS::AccountId' 111 | Topics: 112 | - !Ref Topic 113 | TopicEndpointSubscription: 114 | DependsOn: TopicPolicy 115 | Type: 'AWS::SNS::Subscription' 116 | Properties: 117 | DeliveryPolicy: 118 | healthyRetryPolicy: 119 | minDelayTarget: 1 120 | maxDelayTarget: 60 121 | numRetries: 100 122 | numNoDelayRetries: 0 123 | backoffFunction: exponential 124 | throttlePolicy: 125 | maxReceivesPerSecond: 1 126 | Endpoint: !Sub 'https://api.marbot.io/${Stage}/endpoint/${EndpointId}' 127 | Protocol: https 128 | TopicArn: !Ref Topic 129 | MonitoringJumpStartEvent: 130 | DependsOn: TopicEndpointSubscription 131 | Type: 'AWS::Events::Rule' 132 | Properties: 133 | Description: 'Monitoring Jump 
Start connection. (created by marbot)' 134 | ScheduleExpression: 'rate(30 days)' 135 | State: ENABLED 136 | Targets: 137 | - Arn: !Ref Topic 138 | Id: marbot 139 | Input: !Sub | 140 | { 141 | "Type": "monitoring-jump-start-connection", 142 | "StackTemplate": "marbot-cloudfront", 143 | "StackVersion": "1.1.1", 144 | "Partition": "${AWS::Partition}", 145 | "AccountId": "${AWS::AccountId}", 146 | "Region": "${AWS::Region}", 147 | "StackId": "${AWS::StackId}", 148 | "StackName": "${AWS::StackName}" 149 | } 150 | ########################################################################## 151 | # # 152 | # ALARMS # 153 | # # 154 | ########################################################################## 155 | 5xxErrorRateTooHighAlarm: 156 | Condition: Has5xxErrorRateThreshold 157 | Type: 'AWS::CloudWatch::Alarm' 158 | Properties: 159 | AlarmActions: 160 | - !Ref Topic 161 | AlarmDescription: 'CloudFront distribution returns too many 5xx errors.' 162 | ComparisonOperator: GreaterThanThreshold 163 | Dimensions: 164 | - Name: Region 165 | Value: Global 166 | - Name: DistributionId 167 | Value: !Ref CloudFrontDistributionId 168 | EvaluationPeriods: 1 169 | MetricName: 5xxErrorRate 170 | Namespace: 'AWS/CloudFront' 171 | OKActions: 172 | - !Ref Topic 173 | Period: 60 174 | Statistic: Average 175 | Threshold: !Ref 5xxErrorRateThreshold 176 | TreatMissingData: notBreaching 177 | LambdaExecutionErrorTooHighAlarm: 178 | Condition: HasLambdaExecutionErrorThreshold 179 | Type: 'AWS::CloudWatch::Alarm' 180 | Properties: 181 | AlarmActions: 182 | - !Ref Topic 183 | AlarmDescription: 'CloudFront distribution failed to execute Lambda@Edge.' 
184 | ComparisonOperator: GreaterThanThreshold 185 | Dimensions: 186 | - Name: Region 187 | Value: Global 188 | - Name: DistributionId 189 | Value: !Ref CloudFrontDistributionId 190 | EvaluationPeriods: 1 191 | MetricName: LambdaExecutionError 192 | Namespace: 'AWS/CloudFront' 193 | OKActions: 194 | - !Ref Topic 195 | Period: 60 196 | Statistic: Sum 197 | Threshold: !Ref LambdaExecutionErrorThreshold 198 | TreatMissingData: notBreaching 199 | LambdaValidationErrorTooHighAlarm: 200 | Condition: HasLambdaValidationErrorThreshold 201 | Type: 'AWS::CloudWatch::Alarm' 202 | Properties: 203 | AlarmActions: 204 | - !Ref Topic 205 | AlarmDescription: 'CloudFront distribution received invalid response from Lambda@Edge.' 206 | ComparisonOperator: GreaterThanThreshold 207 | Dimensions: 208 | - Name: Region 209 | Value: Global 210 | - Name: DistributionId 211 | Value: !Ref CloudFrontDistributionId 212 | EvaluationPeriods: 1 213 | MetricName: LambdaValidationError 214 | Namespace: 'AWS/CloudFront' 215 | OKActions: 216 | - !Ref Topic 217 | Period: 60 218 | Statistic: Sum 219 | Threshold: !Ref LambdaValidationErrorThreshold 220 | TreatMissingData: notBreaching 221 | CacheHitRateAlarm: 222 | Condition: HasCacheHitRateThreshold 223 | Type: 'AWS::CloudWatch::Alarm' 224 | Properties: 225 | AlarmActions: 226 | - !Ref Topic 227 | AlarmDescription: 'Cache hit rate on CloudFront distribution too low.' 
228 | ComparisonOperator: LessThanThreshold 229 | Dimensions: 230 | - Name: Region 231 | Value: Global 232 | - Name: DistributionId 233 | Value: !Ref CloudFrontDistributionId 234 | EvaluationPeriods: 1 235 | MetricName: CacheHitRate 236 | Namespace: 'AWS/CloudFront' 237 | OKActions: 238 | - !Ref Topic 239 | Period: 900 240 | Statistic: Average 241 | Threshold: !Ref CacheHitRateThreshold 242 | TreatMissingData: notBreaching 243 | OriginLatencyAlarm: 244 | Condition: HasOriginLatenyThreshold 245 | Type: 'AWS::CloudWatch::Alarm' 246 | Properties: 247 | AlarmActions: 248 | - !Ref Topic 249 | AlarmDescription: 'High latency from CloudFront to origin.' 250 | ComparisonOperator: GreaterThanThreshold 251 | Dimensions: 252 | - Name: Region 253 | Value: Global 254 | - Name: DistributionId 255 | Value: !Ref CloudFrontDistributionId 256 | EvaluationPeriods: 1 257 | MetricName: OriginLatency 258 | Namespace: 'AWS/CloudFront' 259 | OKActions: 260 | - !Ref Topic 261 | Period: 900 262 | ExtendedStatistic: 'p99.5' 263 | Threshold: !Ref OriginLatenyThreshold 264 | TreatMissingData: notBreaching 265 | Outputs: 266 | StackName: 267 | Description: 'Stack name.' 268 | Value: !Sub '${AWS::StackName}' 269 | StackTemplate: 270 | Description: 'Stack template.' 271 | Value: 'marbot-cloudfront' 272 | StackVersion: 273 | Description: 'Stack version.' 274 | Value: '1.1.1' 275 | -------------------------------------------------------------------------------- /marbot-ec2-instances-nested.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 
6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | # 16 | # ONLY USE AS A NESTED STACK OF marbot-ec2-instances.yml 17 | AWSTemplateFormatVersion: '2010-09-09' 18 | Description: 'marbot.io: EC2 instances (up to ten) monitoring [nested] (https://github.com/marbot-io/monitoring-jump-start)' 19 | Parameters: 20 | Topic: # ARN 21 | Type: String 22 | CustomResourceLambda: # ARN 23 | Type: String 24 | InstanceId: 25 | Type: String 26 | CPUUtilizationThreshold: 27 | Type: Number 28 | Default: 80 29 | MinValue: -1 30 | MaxValue: 100 31 | CPUCreditBalanceThreshold: 32 | Type: Number 33 | Default: 20 34 | MinValue: -1 35 | EBSIOCreditBalanceThreshold: 36 | Type: Number 37 | Default: 20 38 | MinValue: -1 39 | MaxValue: 100 40 | EBSThroughputCreditBalanceThreshold: 41 | Type: Number 42 | Default: 20 43 | MinValue: -1 44 | MaxValue: 100 45 | NetworkUtilizationThreshold: 46 | Type: Number 47 | Default: 80 48 | MinValue: -1 49 | MaxValue: 100 50 | Conditions: 51 | HasCPUUtilizationThreshold: !Not [!Equals [!Ref CPUUtilizationThreshold, '-1']] 52 | HasCPUCreditBalanceThreshold: !Not [!Equals [!Ref CPUCreditBalanceThreshold, '-1']] 53 | HasEBSIOCreditBalanceThreshold: !Not [!Equals [!Ref EBSIOCreditBalanceThreshold, '-1']] 54 | HasEBSThroughputCreditBalanceThreshold: !Not [!Equals [!Ref EBSThroughputCreditBalanceThreshold, '-1']] 55 | HasNetworkUtilizationThreshold: !Not [!Equals [!Ref NetworkUtilizationThreshold, '-1']] 56 | Resources: 57 | ########################################################################## 58 | # # 59 | # ALARMS # 60 | # # 61 | 
########################################################################## 62 | CPUUtilizationTooHighAlarm: 63 | Condition: HasCPUUtilizationThreshold 64 | Type: 'AWS::CloudWatch::Alarm' 65 | Properties: 66 | AlarmActions: 67 | - !Ref Topic 68 | AlarmDescription: 'Average CPU utilization over last 10 minutes too high. (created by marbot)' 69 | ComparisonOperator: GreaterThanThreshold 70 | Dimensions: 71 | - Name: InstanceId 72 | Value: !Ref InstanceId 73 | EvaluationPeriods: 1 74 | MetricName: CPUUtilization 75 | Namespace: 'AWS/EC2' 76 | OKActions: 77 | - !Ref Topic 78 | Period: 600 79 | Statistic: Average 80 | Threshold: !Ref CPUUtilizationThreshold 81 | TreatMissingData: notBreaching 82 | CPUCreditBalanceTooLowAlarm: 83 | Condition: HasCPUCreditBalanceThreshold 84 | Type: 'AWS::CloudWatch::Alarm' 85 | Properties: 86 | AlarmActions: 87 | - !Ref Topic 88 | AlarmDescription: 'Average CPU credit balance over last 10 minutes too low, expect a significant performance drop soon. (created by marbot)' 89 | ComparisonOperator: LessThanThreshold 90 | Dimensions: 91 | - Name: InstanceId 92 | Value: !Ref InstanceId 93 | EvaluationPeriods: 1 94 | MetricName: CPUCreditBalance 95 | Namespace: 'AWS/EC2' 96 | OKActions: 97 | - !Ref Topic 98 | Period: 600 99 | Statistic: Average 100 | Threshold: !Ref CPUCreditBalanceThreshold 101 | TreatMissingData: notBreaching 102 | EBSIOCreditBalanceTooLowAlarm: 103 | Condition: HasEBSIOCreditBalanceThreshold 104 | Type: 'AWS::CloudWatch::Alarm' 105 | Properties: 106 | AlarmActions: 107 | - !Ref Topic 108 | AlarmDescription: 'Average EBS IO credit balance over last 10 minutes too low, expect a significant performance drop soon. 
(created by marbot)' 109 | ComparisonOperator: LessThanThreshold 110 | Dimensions: 111 | - Name: InstanceId 112 | Value: !Ref InstanceId 113 | EvaluationPeriods: 1 114 | MetricName: 'EBSIOBalance%' 115 | Namespace: 'AWS/EC2' 116 | OKActions: 117 | - !Ref Topic 118 | Period: 600 119 | Statistic: Average 120 | Threshold: !Ref EBSIOCreditBalanceThreshold 121 | TreatMissingData: notBreaching 122 | EBSThroughputCreditBalanceTooLowAlarm: 123 | Condition: HasEBSThroughputCreditBalanceThreshold 124 | Type: 'AWS::CloudWatch::Alarm' 125 | Properties: 126 | AlarmActions: 127 | - !Ref Topic 128 | AlarmDescription: 'Average EBS throughput credit balance over last 10 minutes too low, expect a significant performance drop soon. (created by marbot)' 129 | ComparisonOperator: LessThanThreshold 130 | Dimensions: 131 | - Name: InstanceId 132 | Value: !Ref InstanceId 133 | EvaluationPeriods: 1 134 | MetricName: 'EBSByteBalance%' 135 | Namespace: 'AWS/EC2' 136 | OKActions: 137 | - !Ref Topic 138 | Period: 600 139 | Statistic: Average 140 | Threshold: !Ref EBSThroughputCreditBalanceThreshold 141 | TreatMissingData: notBreaching 142 | StatusCheckFailedAlarm: 143 | Type: 'AWS::CloudWatch::Alarm' 144 | Properties: 145 | AlarmActions: 146 | - !Ref Topic 147 | AlarmDescription: 'EC2 instance status check or the system status check has failed. 
(created by marbot)' 148 | ComparisonOperator: GreaterThanThreshold 149 | Dimensions: 150 | - Name: InstanceId 151 | Value: !Ref InstanceId 152 | EvaluationPeriods: 1 153 | MetricName: StatusCheckFailed 154 | Namespace: 'AWS/EC2' 155 | OKActions: 156 | - !Ref Topic 157 | Period: 600 158 | Statistic: Sum 159 | Threshold: 0 160 | TreatMissingData: notBreaching 161 | NetworkBurstUtilizationTooHighAlarm: 162 | Condition: HasNetworkUtilizationThreshold 163 | Type: 'AWS::CloudWatch::Alarm' 164 | Properties: 165 | AlarmActions: 166 | - !Ref Topic 167 | AlarmDescription: 'Average Network In+Out burst utilization over last 10 minutes too high, expect a significant performance drop soon. (created by marbot)' 168 | ComparisonOperator: GreaterThanThreshold 169 | EvaluationPeriods: 1 170 | Metrics: 171 | - Id: 'in' 172 | Label: 'In' 173 | MetricStat: 174 | Metric: 175 | Namespace: 'AWS/EC2' 176 | MetricName: NetworkIn # bytes per minute 177 | Dimensions: 178 | - Name: InstanceId 179 | Value: !Ref InstanceId 180 | Period: 600 181 | Stat: Average 182 | Unit: Bytes 183 | ReturnData: false 184 | - Id: 'out' 185 | Label: 'Out' 186 | MetricStat: 187 | Metric: 188 | Namespace: 'AWS/EC2' 189 | MetricName: NetworkOut # bytes per minute 190 | Dimensions: 191 | - Name: InstanceId 192 | Value: !Ref InstanceId 193 | Period: 600 194 | Stat: Average 195 | Unit: Bytes 196 | ReturnData: false 197 | - Expression: '(in+out)/60*8/1000/1000/1000' # to Gbit/s 198 | Id: 'inout' 199 | Label: 'In+Out' 200 | ReturnData: true 201 | OKActions: 202 | - !Ref Topic 203 | Threshold: !GetAtt 'InstanceDetails.NetworkBurst' # in Gbit/s 204 | TreatMissingData: notBreaching 205 | NetworkBaselineUtilizationTooHighAlarm: 206 | Condition: HasNetworkUtilizationThreshold 207 | Type: 'AWS::CloudWatch::Alarm' 208 | Properties: 209 | AlarmActions: 210 | - !Ref Topic 211 | AlarmDescription: 'Average Network In+Out baseline utilization over last 10 minutes too high, you might be able to burst. 
(created by marbot)' 212 | ComparisonOperator: GreaterThanThreshold 213 | EvaluationPeriods: 1 214 | Metrics: 215 | - Id: in 216 | Label: In 217 | MetricStat: 218 | Metric: 219 | Namespace: 'AWS/EC2' 220 | MetricName: NetworkIn # bytes per minute 221 | Dimensions: 222 | - Name: InstanceId 223 | Value: !Ref InstanceId 224 | Period: 600 225 | Stat: Average 226 | Unit: Bytes 227 | ReturnData: false 228 | - Id: out 229 | Label: Out 230 | MetricStat: 231 | Metric: 232 | Namespace: 'AWS/EC2' 233 | MetricName: NetworkOut # bytes per minute 234 | Dimensions: 235 | - Name: InstanceId 236 | Value: !Ref InstanceId 237 | Period: 600 238 | Stat: Average 239 | Unit: Bytes 240 | ReturnData: false 241 | - Expression: '(in+out)/60*8/1000/1000/1000' # to Gbit/s 242 | Id: inout 243 | Label: 'In+Out' 244 | ReturnData: true 245 | OKActions: 246 | - !Ref Topic 247 | Threshold: !GetAtt 'InstanceDetails.NetworkBaseline' # in Gbit/s 248 | TreatMissingData: notBreaching 249 | NetworkUtilizationTooHighAlarm: 250 | Condition: HasNetworkUtilizationThreshold 251 | Type: 'AWS::CloudWatch::Alarm' 252 | Properties: 253 | AlarmActions: 254 | - !Ref Topic 255 | AlarmDescription: 'Average Network In+Out utilization over last 10 minutes too high. 
(created by marbot)' 256 | ComparisonOperator: GreaterThanThreshold 257 | EvaluationPeriods: 1 258 | Metrics: 259 | - Id: in 260 | Label: In 261 | MetricStat: 262 | Metric: 263 | Namespace: 'AWS/EC2' 264 | MetricName: NetworkIn # bytes per minute 265 | Dimensions: 266 | - Name: InstanceId 267 | Value: !Ref InstanceId 268 | Period: 600 269 | Stat: Average 270 | Unit: Bytes 271 | ReturnData: false 272 | - Id: out 273 | Label: Out 274 | MetricStat: 275 | Metric: 276 | Namespace: 'AWS/EC2' 277 | MetricName: NetworkOut # bytes per minute 278 | Dimensions: 279 | - Name: InstanceId 280 | Value: !Ref InstanceId 281 | Period: 600 282 | Stat: Average 283 | Unit: Bytes 284 | ReturnData: false 285 | - Expression: '(in+out)/60*8/1000/1000/1000' # to Gbit/s 286 | Id: inout 287 | Label: 'In+Out' 288 | ReturnData: true 289 | OKActions: 290 | - !Ref Topic 291 | Threshold: !GetAtt 'InstanceDetails.NetworkMaximum' # in Gbit/s 292 | TreatMissingData: notBreaching 293 | InstanceDetails: 294 | Type: 'Custom::InstanceDetails' 295 | Version: '1.0' 296 | Properties: 297 | InstanceId: !Ref InstanceId 298 | NetworkUtilizationThreshold: !Ref NetworkUtilizationThreshold 299 | ServiceToken: !Ref CustomResourceLambda 300 | -------------------------------------------------------------------------------- /marbot-efs.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | AWSTemplateFormatVersion: '2010-09-09' 16 | Description: 'marbot.io: EFS file system monitoring (https://github.com/marbot-io/monitoring-jump-start)' 17 | Metadata: 18 | 'AWS::CloudFormation::Interface': 19 | ParameterGroups: 20 | - Label: 21 | default: 'marbot endpoint' 22 | Parameters: 23 | - EndpointId 24 | - Stage 25 | - Label: 26 | default: 'EFS' 27 | Parameters: 28 | - FileSystemId 29 | - Label: 30 | default: 'Thresholds' 31 | Parameters: 32 | - BurstCreditBalanceThreshold 33 | Parameters: 34 | EndpointId: 35 | Description: 'Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id").' 36 | Type: String 37 | Stage: 38 | Description: 'marbot stage (never change this!).' 39 | Type: String 40 | Default: v1 41 | AllowedValues: [v1, dev] 42 | FileSystemId: 43 | Description: 'The EFS file system ID that you want to monitor.' 44 | Type: String 45 | BurstCreditBalanceThreshold: 46 | Description: 'The minimum number of burst credits that a file system should have (set to -1 to disable).' 47 | Type: Number 48 | Default: 192000000000 # 192 GB in Bytes (last ~30 minutes where you can burst at 100 MB/sec) 49 | MinValue: -1 50 | ThroughputThreshold: 51 | Description: 'The maximum percentage of throughput utilization (set to -1 to disable).' 
52 | Type: Number 53 | Default: 80 54 | MinValue: -1 55 | Conditions: 56 | HasBurstCreditBalanceThreshold: !Not [!Equals [!Ref BurstCreditBalanceThreshold, '-1']] 57 | HasThroughputThreshold: !Not [!Equals [!Ref ThroughputThreshold, '-1']] 58 | Resources: 59 | ########################################################################## 60 | # # 61 | # TOPIC # 62 | # # 63 | ########################################################################## 64 | Topic: 65 | Type: 'AWS::SNS::Topic' 66 | Properties: {} 67 | TopicPolicy: 68 | Type: 'AWS::SNS::TopicPolicy' 69 | Properties: 70 | PolicyDocument: 71 | Id: Id1 72 | Version: '2012-10-17' 73 | Statement: 74 | - Sid: Sid1 75 | Effect: Allow 76 | Principal: 77 | Service: 'events.amazonaws.com' # Allow EventBridge 78 | Action: 'sns:Publish' 79 | Resource: !Ref Topic 80 | - Sid: Sid2 81 | Effect: Allow 82 | Principal: 83 | AWS: '*' # Allow CloudWatch Alarms 84 | Action: 'sns:Publish' 85 | Resource: !Ref Topic 86 | Condition: 87 | StringEquals: 88 | 'AWS:SourceOwner': !Ref 'AWS::AccountId' 89 | Topics: 90 | - !Ref Topic 91 | TopicEndpointSubscription: 92 | DependsOn: TopicPolicy 93 | Type: 'AWS::SNS::Subscription' 94 | Properties: 95 | DeliveryPolicy: 96 | healthyRetryPolicy: 97 | minDelayTarget: 1 98 | maxDelayTarget: 60 99 | numRetries: 100 100 | numNoDelayRetries: 0 101 | backoffFunction: exponential 102 | throttlePolicy: 103 | maxReceivesPerSecond: 1 104 | Endpoint: !Sub 'https://api.marbot.io/${Stage}/endpoint/${EndpointId}' 105 | Protocol: https 106 | TopicArn: !Ref Topic 107 | MonitoringJumpStartEvent: 108 | DependsOn: TopicEndpointSubscription 109 | Type: 'AWS::Events::Rule' 110 | Properties: 111 | Description: 'Monitoring Jump Start connection. 
(created by marbot)' 112 | ScheduleExpression: 'rate(30 days)' 113 | State: ENABLED 114 | Targets: 115 | - Arn: !Ref Topic 116 | Id: marbot 117 | Input: !Sub | 118 | { 119 | "Type": "monitoring-jump-start-connection", 120 | "StackTemplate": "marbot-efs", 121 | "StackVersion": "1.3.0", 122 | "Partition": "${AWS::Partition}", 123 | "AccountId": "${AWS::AccountId}", 124 | "Region": "${AWS::Region}", 125 | "StackId": "${AWS::StackId}", 126 | "StackName": "${AWS::StackName}" 127 | } 128 | ########################################################################## 129 | # # 130 | # ALARMS # 131 | # # 132 | ########################################################################## 133 | BurstCreditBalanceTooLowAlarm: 134 | Condition: HasBurstCreditBalanceThreshold 135 | DependsOn: TopicEndpointSubscription 136 | Type: 'AWS::CloudWatch::Alarm' 137 | Properties: 138 | AlarmActions: 139 | - !Ref Topic 140 | AlarmDescription: 'Average burst credit balance over last 10 minutes too low, expect a significant performance drop soon. (created by marbot)' 141 | ComparisonOperator: LessThanThreshold 142 | Dimensions: 143 | - Name: FileSystemId 144 | Value: !Ref FileSystemId 145 | EvaluationPeriods: 1 146 | MetricName: BurstCreditBalance 147 | Namespace: 'AWS/EFS' 148 | OKActions: 149 | - !Ref Topic 150 | Period: 600 151 | Statistic: Average 152 | Threshold: !Ref BurstCreditBalanceThreshold 153 | PercentIOLimitTooHighAlarm: 154 | DependsOn: TopicEndpointSubscription 155 | Type: 'AWS::CloudWatch::Alarm' 156 | Properties: 157 | AlarmActions: 158 | - !Ref Topic 159 | AlarmDescription: 'I/O limit has been reached, consider using Max I/O performance mode. 
(created by marbot)' 160 | ComparisonOperator: GreaterThanThreshold 161 | Dimensions: 162 | - Name: FileSystemId 163 | Value: !Ref FileSystemId 164 | EvaluationPeriods: 3 165 | MetricName: PercentIOLimit 166 | Namespace: 'AWS/EFS' 167 | OKActions: 168 | - !Ref Topic 169 | Period: 600 170 | Statistic: Maximum 171 | Threshold: 95 172 | ThroughputAlarm: # https://docs.aws.amazon.com/efs/latest/ug/monitoring-metric-math.html#metric-math-throughput-utilization 173 | Condition: HasThroughputThreshold 174 | DependsOn: TopicEndpointSubscription 175 | Type: 'AWS::CloudWatch::Alarm' 176 | Properties: 177 | AlarmActions: 178 | - !Ref Topic 179 | AlarmDescription: 'Throughput over last 10 minutes too high, performance is likely impacted. (created by marbot)' 180 | ComparisonOperator: GreaterThanThreshold 181 | DatapointsToAlarm: 6 182 | EvaluationPeriods: 10 183 | Metrics: 184 | - Id: m1 185 | Label: MeteredIOBytes 186 | MetricStat: 187 | Metric: 188 | Namespace: 'AWS/EFS' 189 | MetricName: MeteredIOBytes 190 | Dimensions: 191 | - Name: FileSystemId 192 | Value: !Ref FileSystemId 193 | Period: 60 194 | Stat: Sum 195 | Unit: Bytes 196 | ReturnData: false 197 | - Id: m2 198 | Label: PermittedThroughput 199 | MetricStat: 200 | Metric: 201 | Namespace: 'AWS/EFS' 202 | MetricName: PermittedThroughput 203 | Dimensions: 204 | - Name: FileSystemId 205 | Value: !Ref FileSystemId 206 | Period: 60 207 | Stat: Sum 208 | Unit: 'Bytes/Second' 209 | ReturnData: false 210 | - Expression: '(m1/1048576)/PERIOD(m1)' 211 | Id: e1 212 | Label: e1 213 | ReturnData: false 214 | - Expression: 'm2/1048576' 215 | Id: e2 216 | Label: e2 217 | ReturnData: false 218 | - Expression: '((e1)*100)/(e2)' 219 | Id: e3 220 | Label: 'Throughput utilization (%)' 221 | ReturnData: true 222 | OKActions: 223 | - !Ref Topic 224 | Threshold: !Ref ThroughputThreshold 225 | Outputs: 226 | StackName: 227 | Description: 'Stack name.' 
228 | Value: !Sub '${AWS::StackName}' 229 | StackTemplate: 230 | Description: 'Stack template.' 231 | Value: 'marbot-efs' 232 | StackVersion: 233 | Description: 'Stack version.' 234 | Value: '1.3.0' 235 | -------------------------------------------------------------------------------- /marbot-elastic-beanstalk.config: -------------------------------------------------------------------------------- 1 | # Setup instructions 2 | # 1. In your Elastic Beanstalk project's source code, create a folder .ebextensions (learn more https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/ebextensions.html). 3 | # 2. Copy this file into the .ebextensions folder (file must end with .config!). 4 | # 3. Adjust the MarbotEndpointId option_settings. 5 | # 4. Deploy the application. 6 | 7 | option_settings: 8 | aws:elasticbeanstalk:customoption: 9 | # Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id"). 10 | MarbotEndpointId: '' 11 | 12 | # The maximum percentage of CPU utilization. 13 | MarbotCPUUtilizationThreshold: 80 14 | 15 | # The minimum number of CPU credits available (t* instances only). 
16 | MarbotCPUCreditBalanceThreshold: 20 17 | 18 | Resources: 19 | MarbotTopic: 20 | Type: 'AWS::SNS::Topic' 21 | Properties: {} 22 | MarbotTopicPolicy: 23 | Type: 'AWS::SNS::TopicPolicy' 24 | Properties: 25 | PolicyDocument: 26 | Id: Id1 27 | Version: '2012-10-17' 28 | Statement: 29 | - Sid: Sid1 30 | Effect: Allow 31 | Principal: 32 | AWS: '*' 33 | Action: 'sns:Publish' 34 | Resource: {Ref: MarbotTopic} 35 | Condition: 36 | StringEquals: 37 | 'AWS:SourceOwner': {Ref: 'AWS::AccountId'} 38 | Topics: 39 | - {Ref: MarbotTopic} 40 | MarbotTopicEndpointSubscription: 41 | DependsOn: MarbotTopicPolicy 42 | Type: 'AWS::SNS::Subscription' 43 | Properties: 44 | DeliveryPolicy: 45 | healthyRetryPolicy: 46 | minDelayTarget: 1 47 | maxDelayTarget: 60 48 | numRetries: 100 49 | numNoDelayRetries: 0 50 | backoffFunction: exponential 51 | throttlePolicy: 52 | maxReceivesPerSecond: 1 53 | Endpoint: {'Fn::Join': ['', ['https://api.marbot.io/v1/endpoint/', {'Fn::GetOptionSetting': {OptionName: MarbotEndpointId}}]]} 54 | Protocol: https 55 | TopicArn: {Ref: MarbotTopic} 56 | MarbotCPUUtilizationTooHighAlarm: 57 | DependsOn: MarbotTopicEndpointSubscription 58 | Type: 'AWS::CloudWatch::Alarm' 59 | Properties: 60 | AlarmActions: 61 | - {Ref: MarbotTopic} 62 | AlarmDescription: 'Average CPU utilization over last 10 minutes too high. 
(created by marbot)' 63 | ComparisonOperator: GreaterThanThreshold 64 | Dimensions: 65 | - Name: AutoScalingGroupName 66 | Value: {Ref: AWSEBAutoScalingGroup} 67 | EvaluationPeriods: 1 68 | MetricName: CPUUtilization 69 | Namespace: 'AWS/EC2' 70 | OKActions: 71 | - {Ref: MarbotTopic} 72 | Period: 600 73 | Statistic: Average 74 | Threshold: {'Fn::GetOptionSetting': {OptionName: MarbotCPUUtilizationThreshold}} 75 | MarbotCPUCreditBalanceTooLowAlarm: 76 | DependsOn: MarbotTopicEndpointSubscription 77 | Type: 'AWS::CloudWatch::Alarm' 78 | Properties: 79 | AlarmActions: 80 | - {Ref: MarbotTopic} 81 | AlarmDescription: 'Average CPU credit balance over last 10 minutes too low, expect a significant performance drop soon. (created by marbot)' 82 | ComparisonOperator: LessThanThreshold 83 | Dimensions: 84 | - Name: AutoScalingGroupName 85 | Value: {Ref: AWSEBAutoScalingGroup} 86 | EvaluationPeriods: 1 87 | MetricName: CPUCreditBalance 88 | Namespace: 'AWS/EC2' 89 | OKActions: 90 | - {Ref: MarbotTopic} 91 | Period: 600 92 | Statistic: Average 93 | Threshold: {'Fn::GetOptionSetting': {OptionName: MarbotCPUCreditBalanceThreshold}} 94 | -------------------------------------------------------------------------------- /marbot-elastic-beanstalk.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | AWSTemplateFormatVersion: '2010-09-09' 16 | Description: 'marbot.io: Elastic Beanstalk monitoring (don''t forget to put the https://github.com/marbot-io/monitoring-jump-start/blob/master/marbot-elastic-beanstalk.config file into your .ebextensions folder; https://github.com/marbot-io/monitoring-jump-start)' 17 | Metadata: 18 | 'AWS::CloudFormation::Interface': 19 | ParameterGroups: 20 | - Label: 21 | default: 'marbot endpoint' 22 | Parameters: 23 | - EndpointId 24 | - Stage 25 | - Label: 26 | default: 'Elastic Beanstalk status changes' 27 | Parameters: 28 | - ElasticBeanstalkResourceStatusChange 29 | - OtherResourceStatusChange 30 | - HealthStatusChange 31 | - ManagedUpdateStatusChange 32 | - Label: 33 | default: 'Elastic Beanstalk severities' 34 | Parameters: 35 | - InfoSeverity 36 | - WarnSeverity 37 | - ErrorSeverity 38 | - Label: 39 | default: 'Elastic Beanstalk applications and environments' 40 | Parameters: 41 | - ApplicationNames 42 | - EnvironmentNames 43 | Parameters: 44 | EndpointId: 45 | Description: 'Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id").' 46 | Type: String 47 | Stage: 48 | Description: 'marbot stage (never change this!).' 49 | Type: String 50 | Default: v1 51 | AllowedValues: [v1, dev] 52 | ElasticBeanstalkResourceStatusChange: 53 | Description: 'Do you want to get notified about Elastic Beanstalk resource status changes (e.g., environment)?' 54 | Type: String 55 | Default: true 56 | AllowedValues: [true, false] 57 | OtherResourceStatusChange: 58 | Description: 'Do you want to get notified about other resource status changes (e.g., Auto Scaling Group, Instance, ...)?' 59 | Type: String 60 | Default: true 61 | AllowedValues: [true, false] 62 | HealthStatusChange: 63 | Description: 'Do you want to get notified about health status changes?' 
64 | Type: String 65 | Default: true 66 | AllowedValues: [true, false] 67 | ManagedUpdateStatusChange: 68 | Description: 'Do you want to get notified about managed update status changes?' 69 | Type: String 70 | Default: true 71 | AllowedValues: [true, false] 72 | InfoSeverity: 73 | Description: 'Do you want to receive status changes with severity INFO?' 74 | Type: String 75 | Default: true 76 | AllowedValues: [true, false] 77 | WarnSeverity: 78 | Description: 'Do you want to receive status changes with severity WARN?' 79 | Type: String 80 | Default: true 81 | AllowedValues: [true, false] 82 | ErrorSeverity: 83 | Description: 'Do you want to receive status changes with severity ERROR?' 84 | Type: String 85 | Default: true 86 | AllowedValues: [true, false] 87 | ApplicationNames: 88 | Description: 'Which applications do you want to monitor? Use * to monitor all applications in the region.' 89 | Type: CommaDelimitedList 90 | Default: '*' 91 | EnvironmentNames: 92 | Description: 'Which environments do you want to monitor? Use * to monitor all environments in the region.' 
93 | Type: CommaDelimitedList 94 | Default: '*' 95 | Conditions: 96 | ElasticBeanstalkResourceStatusChangeEnabled: !Equals [!Ref ElasticBeanstalkResourceStatusChange, 'true'] 97 | OtherResourceStatusChangeEnabled: !Equals [!Ref OtherResourceStatusChange, 'true'] 98 | HealthStatusChangeEnabled: !Equals [!Ref HealthStatusChange, 'true'] 99 | ManagedUpdateStatusChangeEnabled: !Equals [!Ref ManagedUpdateStatusChange, 'true'] 100 | InfoSeverityEnabled: !Equals [!Ref InfoSeverity, 'true'] 101 | WarnSeverityEnabled: !Equals [!Ref WarnSeverity, 'true'] 102 | ErrorSeverityEnabled: !Equals [!Ref ErrorSeverity, 'true'] 103 | ApplicationNamesAll: !Equals [!Join ['', !Ref ApplicationNames], '*'] 104 | EnvironmentNamesAll: !Equals [!Join ['', !Ref EnvironmentNames], '*'] 105 | Resources: 106 | ########################################################################## 107 | # # 108 | # API # 109 | # # 110 | ########################################################################## 111 | ApiConnection: 112 | Type: 'AWS::Events::Connection' 113 | Properties: 114 | AuthorizationType: 'API_KEY' 115 | AuthParameters: 116 | ApiKeyAuthParameters: 117 | ApiKeyName: 'X-API-Key' 118 | ApiKeyValue: !Ref EndpointId 119 | Description: 'marbot' 120 | ApiDestination: 121 | Type: 'AWS::Events::ApiDestination' 122 | Properties: 123 | ConnectionArn: !GetAtt 'ApiConnection.Arn' 124 | Description: 'Forwards notifications and alarms to marbot' 125 | HttpMethod: 'POST' 126 | InvocationEndpoint: !Sub 'https://api.marbot.io/${Stage}/endpoint/${EndpointId}' 127 | InvocationRateLimitPerSecond: 1 128 | ApiRole: 129 | Type: 'AWS::IAM::Role' 130 | Properties: 131 | AssumeRolePolicyDocument: 132 | Version: '2012-10-17' 133 | Statement: 134 | - Effect: 'Allow' 135 | Principal: 136 | Service: 137 | - 'events.amazonaws.com' 138 | Action: 139 | - sts:AssumeRole 140 | Path: '/service-role/' 141 | Policies: 142 | - PolicyName: eventbridge 143 | PolicyDocument: 144 | Version: '2012-10-17' 145 | Statement: 146 | - 
Effect: 'Allow' 147 | Action: 'events:InvokeApiDestination' 148 | Resource: !GetAtt 'ApiDestination.Arn' 149 | MonitoringJumpStartEvent: 150 | Type: 'AWS::Events::Rule' 151 | Properties: 152 | Description: 'Monitoring Jump Start connection. (created by marbot)' 153 | ScheduleExpression: 'rate(30 days)' 154 | State: ENABLED 155 | Targets: 156 | - Arn: !GetAtt 'ApiDestination.Arn' 157 | Id: marbot 158 | RoleArn: !GetAtt 'ApiRole.Arn' 159 | Input: !Sub | 160 | { 161 | "Type": "monitoring-jump-start-connection", 162 | "StackTemplate": "marbot-elastic-beanstalk", 163 | "StackVersion": "1.5.0", 164 | "Partition": "${AWS::Partition}", 165 | "AccountId": "${AWS::AccountId}", 166 | "Region": "${AWS::Region}", 167 | "StackId": "${AWS::StackId}", 168 | "StackName": "${AWS::StackName}" 169 | } 170 | ########################################################################## 171 | # # 172 | # EVENTS # 173 | # # 174 | ########################################################################## 175 | ElasticBeanstalkEvent: 176 | Type: 'AWS::Events::Rule' 177 | Properties: 178 | Description: 'Monitoring Elastic Beanstalk. 
(created by marbot)' 179 | State: ENABLED 180 | EventPattern: 181 | source: 182 | - 'aws.elasticbeanstalk' 183 | detail-type: 184 | - !If [ElasticBeanstalkResourceStatusChangeEnabled, 'Elastic Beanstalk resource status change', !Ref 'AWS::NoValue'] 185 | - !If [OtherResourceStatusChangeEnabled, 'Other resource status change', !Ref 'AWS::NoValue'] 186 | - !If [HealthStatusChangeEnabled, 'Health status change', !Ref 'AWS::NoValue'] 187 | - !If [ManagedUpdateStatusChangeEnabled, 'Managed update status change', !Ref 'AWS::NoValue'] 188 | detail: 189 | Severity: 190 | - !If [InfoSeverityEnabled, 'INFO', !Ref 'AWS::NoValue'] 191 | - !If [WarnSeverityEnabled, 'WARN', !Ref 'AWS::NoValue'] 192 | - !If [ErrorSeverityEnabled, 'ERROR', !Ref 'AWS::NoValue'] 193 | ApplicationName: !If [ApplicationNamesAll, !Ref 'AWS::NoValue', !Ref ApplicationNames] 194 | EnvironmentName: !If [EnvironmentNamesAll, !Ref 'AWS::NoValue', !Ref EnvironmentNames] 195 | Targets: 196 | - Arn: !GetAtt 'ApiDestination.Arn' 197 | Id: marbot 198 | RoleArn: !GetAtt 'ApiRole.Arn' 199 | Outputs: 200 | StackName: 201 | Description: 'Stack name.' 202 | Value: !Sub '${AWS::StackName}' 203 | StackTemplate: 204 | Description: 'Stack template.' 205 | Value: 'marbot-elastic-beanstalk' 206 | StackVersion: 207 | Description: 'Stack version.' 208 | Value: '1.5.0' 209 | -------------------------------------------------------------------------------- /marbot-elasticache-memcached.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 
6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | AWSTemplateFormatVersion: '2010-09-09' 16 | Description: 'marbot.io: ElastiCache memcached cluster monitoring (https://github.com/marbot-io/monitoring-jump-start)' 17 | Metadata: 18 | 'AWS::CloudFormation::Interface': 19 | ParameterGroups: 20 | - Label: 21 | default: 'marbot endpoint' 22 | Parameters: 23 | - EndpointId 24 | - Stage 25 | - Label: 26 | default: 'ElastiCache' 27 | Parameters: 28 | - CacheClusterId 29 | - Label: 30 | default: 'Thresholds' 31 | Parameters: 32 | - CPUUtilizationThreshold 33 | - SwapUsageThreshold 34 | - EvictionsThreshold 35 | Parameters: 36 | EndpointId: 37 | Description: 'Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id").' 38 | Type: String 39 | Stage: 40 | Description: 'marbot stage (never change this!).' 41 | Type: String 42 | Default: v1 43 | AllowedValues: [v1, dev] 44 | CacheClusterId: 45 | Description: 'The cluster ID of the ElastiCache memcached cluster that you want to monitor.' 46 | Type: String 47 | CPUUtilizationThreshold: 48 | Description: 'The maximum percentage of CPU utilization (set to -1 to disable).' 49 | Type: Number 50 | Default: 80 51 | MinValue: -1 52 | MaxValue: 100 53 | SwapUsageThreshold: 54 | Description: 'The maximum amount of swap space used in Byte (set to -1 to disable).' 
55 | Type: Number 56 | Default: 256000000 # 256 Megabyte in Byte 57 | MinValue: -1 58 | EvictionsThreshold: 59 | Description: 'The maximum number of keys evicted per minute because of missing memory (set to -1 to disable).' 60 | Type: Number 61 | Default: 1000 62 | MinValue: -1 63 | Conditions: 64 | HasCPUUtilizationThreshold: !Not [!Equals [!Ref CPUUtilizationThreshold, '-1']] 65 | HasSwapUsageThreshold: !Not [!Equals [!Ref SwapUsageThreshold, '-1']] 66 | HasEvictionsThreshold: !Not [!Equals [!Ref EvictionsThreshold, '-1']] 67 | Resources: 68 | ########################################################################## 69 | # # 70 | # TOPIC # 71 | # # 72 | ########################################################################## 73 | Topic: 74 | Type: 'AWS::SNS::Topic' 75 | Properties: {} 76 | TopicPolicy: 77 | Type: 'AWS::SNS::TopicPolicy' 78 | Properties: 79 | PolicyDocument: 80 | Id: Id1 81 | Version: '2012-10-17' 82 | Statement: 83 | - Sid: Sid1 84 | Effect: Allow 85 | Principal: 86 | Service: 'events.amazonaws.com' # Allow EventBridge 87 | Action: 'sns:Publish' 88 | Resource: !Ref Topic 89 | - Sid: Sid2 90 | Effect: Allow 91 | Principal: 92 | AWS: '*' # Allow CloudWatch Alarms, ElastiCache Notifications 93 | Action: 'sns:Publish' 94 | Resource: !Ref Topic 95 | Condition: 96 | StringEquals: 97 | 'AWS:SourceOwner': !Ref 'AWS::AccountId' 98 | Topics: 99 | - !Ref Topic 100 | TopicEndpointSubscription: 101 | DependsOn: TopicPolicy 102 | Type: 'AWS::SNS::Subscription' 103 | Properties: 104 | DeliveryPolicy: 105 | healthyRetryPolicy: 106 | minDelayTarget: 1 107 | maxDelayTarget: 60 108 | numRetries: 100 109 | numNoDelayRetries: 0 110 | backoffFunction: exponential 111 | throttlePolicy: 112 | maxReceivesPerSecond: 1 113 | Endpoint: !Sub 'https://api.marbot.io/${Stage}/endpoint/${EndpointId}' 114 | Protocol: https 115 | TopicArn: !Ref Topic 116 | MonitoringJumpStartEvent: 117 | DependsOn: TopicEndpointSubscription 118 | Type: 'AWS::Events::Rule' 119 | Properties: 120 
| Description: 'Monitoring Jump Start connection. (created by marbot)' 121 | ScheduleExpression: 'rate(30 days)' 122 | State: ENABLED 123 | Targets: 124 | - Arn: !Ref Topic 125 | Id: marbot 126 | Input: !Sub | 127 | { 128 | "Type": "monitoring-jump-start-connection", 129 | "StackTemplate": "marbot-elasticache-memcached", 130 | "StackVersion": "1.8.0", 131 | "Partition": "${AWS::Partition}", 132 | "AccountId": "${AWS::AccountId}", 133 | "Region": "${AWS::Region}", 134 | "StackId": "${AWS::StackId}", 135 | "StackName": "${AWS::StackName}" 136 | } 137 | ########################################################################## 138 | # # 139 | # ALARMS # 140 | # # 141 | ########################################################################## 142 | CPUUtilizationTooHighAlarm: 143 | Condition: HasCPUUtilizationThreshold 144 | DependsOn: TopicEndpointSubscription 145 | Type: 'AWS::CloudWatch::Alarm' 146 | Properties: 147 | AlarmActions: 148 | - !Ref Topic 149 | AlarmDescription: 'Average CPU utilization over last 10 minutes too high. (created by marbot)' 150 | ComparisonOperator: GreaterThanThreshold 151 | Dimensions: 152 | - Name: CacheClusterId 153 | Value: !Ref CacheClusterId 154 | EvaluationPeriods: 1 155 | MetricName: CPUUtilization 156 | Namespace: 'AWS/ElastiCache' 157 | OKActions: 158 | - !Ref Topic 159 | Period: 600 160 | Statistic: Average 161 | Threshold: !Ref CPUUtilizationThreshold 162 | SwapUsageTooHighAlarm: 163 | Condition: HasSwapUsageThreshold 164 | DependsOn: TopicEndpointSubscription 165 | Type: 'AWS::CloudWatch::Alarm' 166 | Properties: 167 | AlarmActions: 168 | - !Ref Topic 169 | AlarmDescription: 'Average swap usage over last 10 minutes too high, performance may suffer. 
(created by marbot)' 170 | ComparisonOperator: GreaterThanThreshold 171 | Dimensions: 172 | - Name: CacheClusterId 173 | Value: !Ref CacheClusterId 174 | EvaluationPeriods: 1 175 | MetricName: SwapUsage 176 | Namespace: 'AWS/ElastiCache' 177 | OKActions: 178 | - !Ref Topic 179 | Period: 600 180 | Statistic: Average 181 | Threshold: !Ref SwapUsageThreshold 182 | EvictionsTooHighAlarm: 183 | Condition: HasEvictionsThreshold 184 | DependsOn: TopicEndpointSubscription 185 | Type: 'AWS::CloudWatch::Alarm' 186 | Properties: 187 | AlarmActions: 188 | - !Ref Topic 189 | AlarmDescription: 'Evictions over last 10 minutes too high, memory may to less for all keys. (created by marbot)' 190 | ComparisonOperator: GreaterThanThreshold 191 | Dimensions: 192 | - Name: CacheClusterId 193 | Value: !Ref CacheClusterId 194 | EvaluationPeriods: 10 195 | MetricName: Evictions 196 | Namespace: 'AWS/ElastiCache' 197 | OKActions: 198 | - !Ref Topic 199 | Period: 60 200 | Statistic: Sum 201 | Threshold: !Ref EvictionsThreshold 202 | NotificationTopicConfiguration: 203 | DependsOn: TopicEndpointSubscription 204 | Type: 'Custom::NotificationTopicConfiguration' 205 | Version: '1.0' 206 | Properties: 207 | ServiceToken: !GetAtt 'CustomNotificationTopicConfigurationFunction.Arn' 208 | CacheClusterId: !Ref CacheClusterId 209 | NotificationTopicArn: !Ref Topic 210 | ########################################################################## 211 | # # 212 | # CUSTOM RESOURCES # 213 | # # 214 | ########################################################################## 215 | CustomNotificationTopicConfigurationFunction: # needs no monitoring because it is used as a custom resource 216 | Type: 'AWS::Lambda::Function' 217 | Properties: 218 | Code: 219 | ZipFile: | 220 | 'use strict'; 221 | const response = require('cfn-response'); 222 | const { ElastiCacheClient, DescribeCacheClustersCommand, ModifyCacheClusterCommand } = require('@aws-sdk/client-elasticache'); 223 | const elasticache = new 
ElastiCacheClient({apiVersion: '2015-02-02'}); 224 | exports.handler = (event, context, cb) => { 225 | console.log(JSON.stringify(event)); 226 | const failed = (err) => { 227 | console.log(JSON.stringify(err)); 228 | response.send(event, context, response.FAILED, {}); 229 | } 230 | const success = (msg) => { 231 | console.log(msg); 232 | response.send(event, context, response.SUCCESS, {}); 233 | }; 234 | const successCB = (msg) => { 235 | return () => { 236 | success(msg); 237 | } 238 | }; 239 | const describe = (cacheClusterId, cb) => { 240 | elasticache.send(new DescribeCacheClustersCommand({ 241 | CacheClusterId: cacheClusterId, 242 | }), function(err, data) { 243 | if (err) { 244 | failed(err); 245 | } else { 246 | if (data.CacheClusters.length === 0) { 247 | failed(new Error('cache cluster does not exist!')); 248 | } else { 249 | const cluster = data.CacheClusters[0]; 250 | console.log(JSON.stringify(cluster)); 251 | cb(cluster); 252 | } 253 | } 254 | }); 255 | }; 256 | const modify = (cacheClusterId, notificationTopicArn, notificationTopicStatus, cb) => { 257 | elasticache.send(new ModifyCacheClusterCommand({ 258 | CacheClusterId: cacheClusterId, 259 | ApplyImmediately: true, 260 | NotificationTopicArn: notificationTopicArn, 261 | NotificationTopicStatus: notificationTopicStatus 262 | }), function(err, data) { 263 | if (err) { 264 | failed(err); 265 | } else { 266 | cb(); 267 | } 268 | }); 269 | }; 270 | const create = (cacheClusterId, notificationTopicArn, cb) => { 271 | describe(cacheClusterId, (cluster) => { 272 | if ('NotificationConfiguration' in cluster 273 | && cluster.NotificationConfiguration.TopicStatus === 'active') { 274 | console.log('cache cluster already has an active notification configuration!'); 275 | cb(); 276 | } else { 277 | modify(cacheClusterId, notificationTopicArn, 'active', cb); 278 | } 279 | }); 280 | }; 281 | const remove = (cacheClusterId, notificationTopicArn, cb) => { 282 | describe(cacheClusterId, (cluster) => { 283 | 
console.log(JSON.stringify(cluster)); 284 | if ('NotificationConfiguration' in cluster 285 | && cluster.NotificationConfiguration.TopicStatus === 'active' 286 | && cluster.NotificationConfiguration.TopicArn === notificationTopicArn) { 287 | modify(cacheClusterId, '', 'inactive', cb); 288 | } else { 289 | console.log('cache cluster was not using the expected notification configuration!'); 290 | cb(); 291 | } 292 | }); 293 | }; 294 | if (event.RequestType === 'Create') { 295 | create(event.ResourceProperties.CacheClusterId, event.ResourceProperties.NotificationTopicArn, successCB('done')); 296 | } else if (event.RequestType === 'Update') { 297 | if (event.OldResourceProperties.CacheClusterId !== event.ResourceProperties.CacheClusterId) { 298 | remove(event.OldResourceProperties.CacheClusterId, event.OldResourceProperties.NotificationTopicArn, () => { 299 | create(event.ResourceProperties.CacheClusterId, event.ResourceProperties.NotificationTopicArn, successCB('done')); 300 | }); 301 | } else { 302 | describe(event.ResourceProperties.CacheClusterId, (cluster) => { 303 | if ('NotificationConfiguration' in cluster 304 | && cluster.NotificationConfiguration.TopicStatus === 'active' 305 | && cluster.NotificationConfiguration.TopicArn === event.OldResourceProperties.NotificationTopicArn) { 306 | modify(event.ResourceProperties.CacheClusterId, event.ResourceProperties.NotificationTopicArn, 'active', successCB('done')); 307 | } else { 308 | success('cache cluster was not using the expected notification configuration!'); 309 | } 310 | }); 311 | } 312 | } else if (event.RequestType === 'Delete') { 313 | remove(event.ResourceProperties.CacheClusterId, event.ResourceProperties.NotificationTopicArn, successCB('done')); 314 | } else { 315 | failed(new Error(`unsupported RequestType: ${event.RequestType}`)); 316 | } 317 | }; 318 | Handler: 'index.handler' 319 | MemorySize: 128 320 | Role: !GetAtt 'CustomNotificationTopicConfigurationRole.Arn' 321 | Runtime: 'nodejs22.x' 322 | 
Timeout: 10 323 | CustomNotificationTopicConfigurationRole: 324 | Type: 'AWS::IAM::Role' 325 | Properties: 326 | AssumeRolePolicyDocument: 327 | Version: '2012-10-17' 328 | Statement: 329 | - Effect: Allow 330 | Principal: 331 | Service: 'lambda.amazonaws.com' 332 | Action: 'sts:AssumeRole' 333 | ManagedPolicyArns: 334 | - 'arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole' 335 | Policies: 336 | - PolicyName: cloudwatch 337 | PolicyDocument: 338 | Statement: 339 | - Effect: Allow 340 | Action: 341 | - 'elasticache:DescribeCacheClusters' 342 | - 'elasticache:ModifyCacheCluster' 343 | Resource: '*' 344 | Outputs: 345 | StackName: 346 | Description: 'Stack name.' 347 | Value: !Sub '${AWS::StackName}' 348 | StackTemplate: 349 | Description: 'Stack template.' 350 | Value: 'marbot-elasticache-memcached' 351 | StackVersion: 352 | Description: 'Stack version.' 353 | Value: '1.8.0' 354 | -------------------------------------------------------------------------------- /marbot-interface-endpoint.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | AWSTemplateFormatVersion: '2010-09-09' 16 | Description: 'marbot.io: VPC interface endpoint monitoring (https://github.com/marbot-io/monitoring-jump-start)' 17 | Metadata: 18 | 'AWS::CloudFormation::Interface': 19 | ParameterGroups: 20 | - Label: 21 | default: 'marbot endpoint' 22 | Parameters: 23 | - EndpointId 24 | - Stage 25 | - Label: 26 | default: 'VPC interface endpoint' 27 | Parameters: 28 | - VPCEndpointId 29 | - Label: 30 | default: 'Thresholds' 31 | Parameters: 32 | - PacketsDroppedThreshold 33 | - RstPacketsReceivedThreshold 34 | - BandwidthUtilizationThreshold 35 | Parameters: 36 | EndpointId: 37 | Description: 'Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id").' 38 | Type: String 39 | Stage: 40 | Description: 'marbot stage (never change this!).' 41 | Type: String 42 | Default: v1 43 | AllowedValues: [v1, dev] 44 | VPCEndpointId: 45 | Description: 'The VPC interface endpoint ID that you want to monitor.' 46 | Type: String 47 | PacketsDroppedThreshold: 48 | Description: 'The maximum number of dropped packets (set to -1 to disable).' 49 | Type: Number 50 | Default: 0 51 | MinValue: -1 52 | RstPacketsReceivedThreshold: 53 | Description: 'The maximum number of RST packets received (set to -1 to disable).' 54 | Type: Number 55 | Default: 0 56 | MinValue: -1 57 | BandwidthUtilizationThreshold: 58 | Description: 'The maximum percentage of bandwidth utilization (set to -1 to disable).' 
59 | Type: Number 60 | Default: 80 61 | MinValue: -1 62 | MaxValue: 100 63 | Conditions: 64 | HasPacketsDroppedThreshold: !Not [!Equals [!Ref PacketsDroppedThreshold, '-1']] 65 | HasRstPacketsReceivedThreshold: !Not [!Equals [!Ref RstPacketsReceivedThreshold, '-1']] 66 | HasBandwidthUtilizationThreshold: !Not [!Equals [!Ref BandwidthUtilizationThreshold, '-1']] 67 | Resources: 68 | ########################################################################## 69 | # # 70 | # TOPIC # 71 | # # 72 | ########################################################################## 73 | Topic: 74 | Type: 'AWS::SNS::Topic' 75 | Properties: {} 76 | TopicPolicy: 77 | Type: 'AWS::SNS::TopicPolicy' 78 | Properties: 79 | PolicyDocument: 80 | Id: Id1 81 | Version: '2012-10-17' 82 | Statement: 83 | - Sid: Sid1 84 | Effect: Allow 85 | Principal: 86 | Service: 'events.amazonaws.com' # Allow EventBridge 87 | Action: 'sns:Publish' 88 | Resource: !Ref Topic 89 | - Sid: Sid2 90 | Effect: Allow 91 | Principal: 92 | AWS: '*' # Allow CloudWatch Alarms 93 | Action: 'sns:Publish' 94 | Resource: !Ref Topic 95 | Condition: 96 | StringEquals: 97 | 'AWS:SourceOwner': !Ref 'AWS::AccountId' 98 | Topics: 99 | - !Ref Topic 100 | TopicEndpointSubscription: 101 | DependsOn: TopicPolicy 102 | Type: 'AWS::SNS::Subscription' 103 | Properties: 104 | DeliveryPolicy: 105 | healthyRetryPolicy: 106 | minDelayTarget: 1 107 | maxDelayTarget: 60 108 | numRetries: 100 109 | numNoDelayRetries: 0 110 | backoffFunction: exponential 111 | throttlePolicy: 112 | maxReceivesPerSecond: 1 113 | Endpoint: !Sub 'https://api.marbot.io/${Stage}/endpoint/${EndpointId}' 114 | Protocol: https 115 | TopicArn: !Ref Topic 116 | MonitoringJumpStartEvent: 117 | DependsOn: TopicEndpointSubscription 118 | Type: 'AWS::Events::Rule' 119 | Properties: 120 | Description: 'Monitoring Jump Start connection. 
(created by marbot)' 121 | ScheduleExpression: 'rate(30 days)' 122 | State: ENABLED 123 | Targets: 124 | - Arn: !Ref Topic 125 | Id: marbot 126 | Input: !Sub | 127 | { 128 | "Type": "monitoring-jump-start-connection", 129 | "StackTemplate": "marbot-interface-endpoint", 130 | "StackVersion": "1.2.0", 131 | "Partition": "${AWS::Partition}", 132 | "AccountId": "${AWS::AccountId}", 133 | "Region": "${AWS::Region}", 134 | "StackId": "${AWS::StackId}", 135 | "StackName": "${AWS::StackName}" 136 | } 137 | ########################################################################## 138 | # # 139 | # ALARMS # 140 | # # 141 | ########################################################################## 142 | PacketsDroppedTooHighAlarm: 143 | Condition: HasPacketsDroppedThreshold 144 | DependsOn: TopicEndpointSubscription 145 | Type: 'AWS::CloudWatch::Alarm' 146 | Properties: 147 | AlarmActions: 148 | - !Ref Topic 149 | AlarmDescription: 'Dropped packets over last 10 minutes too high. Increasing values could indicate that the endpoint or endpoint service is unhealthy. (created by marbot)' 150 | Namespace: 'AWS/PrivateLinkEndpoints' 151 | MetricName: PacketsDropped 152 | Statistic: Sum 153 | Period: 600 154 | EvaluationPeriods: 1 155 | ComparisonOperator: GreaterThanThreshold 156 | Threshold: !Ref PacketsDroppedThreshold 157 | TreatMissingData: notBreaching 158 | Dimensions: 159 | - Name: 'Endpoint Type' 160 | Value: Interface 161 | - Name: 'Service Name' 162 | Value: !GetAtt 'EndpointDetails.ServiceName' 163 | - Name: 'VPC Endpoint Id' 164 | Value: !Ref VPCEndpointId 165 | - Name: 'VPC Id' 166 | Value: !GetAtt 'EndpointDetails.VpcId' 167 | RstPacketsReceivedTooHighAlarm: 168 | Condition: HasRstPacketsReceivedThreshold 169 | DependsOn: TopicEndpointSubscription 170 | Type: 'AWS::CloudWatch::Alarm' 171 | Properties: 172 | AlarmActions: 173 | - !Ref Topic 174 | AlarmDescription: 'RST packets received over last 10 minutes too high. 
Increasing values could indicate that the endpoint service is unhealthy.' 175 | Namespace: 'AWS/PrivateLinkEndpoints' 176 | MetricName: RstPacketsReceived 177 | Statistic: Sum 178 | Period: 600 179 | EvaluationPeriods: 1 180 | ComparisonOperator: GreaterThanThreshold 181 | Threshold: !Ref RstPacketsReceivedThreshold 182 | TreatMissingData: notBreaching 183 | Dimensions: 184 | - Name: 'Endpoint Type' 185 | Value: Interface 186 | - Name: 'Service Name' 187 | Value: !GetAtt 'EndpointDetails.ServiceName' 188 | - Name: 'VPC Endpoint Id' 189 | Value: !Ref VPCEndpointId 190 | - Name: 'VPC Id' 191 | Value: !GetAtt 'EndpointDetails.VpcId' 192 | BandwidthUtilizationTooHighAlarm: 193 | Condition: HasBandwidthUtilizationThreshold 194 | DependsOn: TopicEndpointSubscription 195 | Type: 'AWS::CloudWatch::Alarm' 196 | Properties: 197 | AlarmActions: 198 | - !Ref Topic 199 | AlarmDescription: 'Bandwidth utilization too high. (created by marbot)' 200 | ComparisonOperator: GreaterThanThreshold 201 | EvaluationPeriods: 1 202 | Metrics: 203 | - Id: bytesProcessed 204 | Label: BytesProcessed 205 | MetricStat: 206 | Metric: 207 | Namespace: 'AWS/PrivateLinkEndpoints' 208 | MetricName: BytesProcessed # bytes per minute 209 | Dimensions: 210 | - Name: 'Endpoint Type' 211 | Value: Interface 212 | - Name: 'Service Name' 213 | Value: !GetAtt 'EndpointDetails.ServiceName' 214 | - Name: 'VPC Endpoint Id' 215 | Value: !Ref VPCEndpointId 216 | - Name: 'VPC Id' 217 | Value: !GetAtt 'EndpointDetails.VpcId' 218 | Period: 60 219 | Stat: Sum 220 | ReturnData: false 221 | - Expression: 'bytesProcessed/60*8/1000/1000/1000' # to Gbit/s 222 | Id: 'bandwidth' 223 | Label: 'Bandwidth' 224 | ReturnData: false 225 | - Expression: 'bandwidth/100*100' # hard limit is 100 Gbit/s 226 | Id: 'utilization' 227 | Label: 'Utilization' 228 | ReturnData: true 229 | Threshold: !Ref BandwidthUtilizationThreshold 230 | TreatMissingData: notBreaching 231 | EndpointDetails: 232 | Type: 'Custom::EndpointDetails' 233 | 
DependsOn: LambdaPolicy 234 | Version: '1.0' 235 | Properties: 236 | EndpointId: !Ref VPCEndpointId 237 | ServiceToken: !GetAtt 'LambdaFunction.Arn' 238 | ########################################################################## 239 | # # 240 | # CUSTOM RESOURCES # 241 | # # 242 | ########################################################################## 243 | LambdaRole: 244 | Type: 'AWS::IAM::Role' 245 | Properties: 246 | AssumeRolePolicyDocument: 247 | Version: '2012-10-17' 248 | Statement: 249 | - Effect: Allow 250 | Principal: 251 | Service: 'lambda.amazonaws.com' 252 | Action: 'sts:AssumeRole' 253 | Policies: 254 | - PolicyName: ec2 255 | PolicyDocument: 256 | Statement: 257 | - Effect: Allow 258 | Action: 'ec2:DescribeVpcEndpoints' 259 | Resource: '*' 260 | LambdaPolicy: 261 | Type: 'AWS::IAM::Policy' 262 | Properties: 263 | PolicyDocument: 264 | Statement: 265 | - Effect: Allow 266 | Action: 267 | - 'logs:CreateLogStream' 268 | - 'logs:PutLogEvents' 269 | Resource: !GetAtt 'LambdaLogGroup.Arn' 270 | PolicyName: lambda 271 | Roles: 272 | - !Ref LambdaRole 273 | LambdaFunction: # needs no monitoring because it is used as a custom resource 274 | Type: 'AWS::Lambda::Function' 275 | Properties: 276 | Code: 277 | ZipFile: | 278 | 'use strict'; 279 | const response = require('cfn-response'); 280 | const { EC2Client, DescribeVpcEndpointsCommand } = require('@aws-sdk/client-ec2'); 281 | const ec2 = new EC2Client({apiVersion: '2016-11-15'}); 282 | exports.handler = (event, context) => { 283 | console.log(`Invoke: ${JSON.stringify(event)}`); 284 | if (event.RequestType === 'Delete') { 285 | response.send(event, context, response.SUCCESS, {}); 286 | } else if (event.RequestType === 'Create' || event.RequestType === 'Update') { 287 | ec2.send(new DescribeVpcEndpointsCommand({ 288 | VpcEndpointIds: [event.ResourceProperties.EndpointId] 289 | }), (err, data) => { 290 | if (err) { 291 | console.log(`Error: ${JSON.stringify(err)}`); 292 | response.send(event, context, 
response.FAILED, {}); 293 | } else { 294 | response.send(event, context, response.SUCCESS, { 295 | ServiceName: data.VpcEndpoints[0].ServiceName, 296 | VpcId: data.VpcEndpoints[0].VpcId 297 | }, event.ResourceProperties.EndpointId); 298 | } 299 | }); 300 | } else { 301 | console.log(`Error: unsupported RequestType: ${event.RequestType}`); response.send(event, context, response.FAILED, {}); 302 | } 303 | }; 304 | Handler: 'index.handler' 305 | MemorySize: 128 306 | Role: !GetAtt 'LambdaRole.Arn' 307 | Runtime: 'nodejs22.x' 308 | Timeout: 60 309 | LambdaLogGroup: 310 | Type: 'AWS::Logs::LogGroup' 311 | Properties: 312 | LogGroupName: !Sub '/aws/lambda/${LambdaFunction}' 313 | RetentionInDays: 14 314 | Outputs: 315 | StackName: 316 | Description: 'Stack name.' 317 | Value: !Sub '${AWS::StackName}' 318 | StackTemplate: 319 | Description: 'Stack template.' 320 | Value: 'marbot-interface-endpoint' 321 | StackVersion: 322 | Description: 'Stack version.' 323 | Value: '1.2.0' 324 | -------------------------------------------------------------------------------- /marbot-lambda-function.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | AWSTemplateFormatVersion: '2010-09-09' 16 | Description: 'marbot.io: Lambda function monitoring (https://github.com/marbot-io/monitoring-jump-start)' 17 | Metadata: 18 | 'AWS::CloudFormation::Interface': 19 | ParameterGroups: 20 | - Label: 21 | default: 'marbot endpoint' 22 | Parameters: 23 | - EndpointId 24 | - Stage 25 | - Label: 26 | default: 'Lambda' 27 | Parameters: 28 | - FunctionName 29 | - Label: 30 | default: 'Thresholds' 31 | Parameters: 32 | - ErrorsThreshold 33 | - ThrottlesThreshold 34 | Parameters: 35 | EndpointId: 36 | Description: 'Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id").' 37 | Type: String 38 | Stage: 39 | Description: 'marbot stage (never change this!).' 40 | Type: String 41 | Default: v1 42 | AllowedValues: [v1, dev] 43 | FunctionName: 44 | Description: 'The Lambda function name that you want to monitor.' 45 | Type: String 46 | ErrorsThreshold: 47 | Description: 'The maximum errors of a Lambda function (set to -1 to disable).' 48 | Type: Number 49 | Default: 0 50 | MinValue: -1 51 | ThrottlesThreshold: 52 | Description: 'The maximum throttles of a Lambda function (set to -1 to disable).' 
53 | Type: Number 54 | Default: 0 55 | MinValue: -1 56 | Conditions: 57 | HasErrorsThreshold: !Not [!Equals [!Ref ErrorsThreshold, '-1']] 58 | HasThrottlesThreshold: !Not [!Equals [!Ref ThrottlesThreshold, '-1']] 59 | Resources: 60 | ########################################################################## 61 | # # 62 | # TOPIC # 63 | # # 64 | ########################################################################## 65 | Topic: 66 | Type: 'AWS::SNS::Topic' 67 | Properties: {} 68 | TopicPolicy: 69 | Type: 'AWS::SNS::TopicPolicy' 70 | Properties: 71 | PolicyDocument: 72 | Id: Id1 73 | Version: '2012-10-17' 74 | Statement: 75 | - Sid: Sid1 76 | Effect: Allow 77 | Principal: 78 | Service: 'events.amazonaws.com' # Allow EventBridge 79 | Action: 'sns:Publish' 80 | Resource: !Ref Topic 81 | - Sid: Sid2 82 | Effect: Allow 83 | Principal: 84 | AWS: '*' # Allow CloudWatch Alarms 85 | Action: 'sns:Publish' 86 | Resource: !Ref Topic 87 | Condition: 88 | StringEquals: 89 | 'AWS:SourceOwner': !Ref 'AWS::AccountId' 90 | Topics: 91 | - !Ref Topic 92 | TopicEndpointSubscription: 93 | DependsOn: TopicPolicy 94 | Type: 'AWS::SNS::Subscription' 95 | Properties: 96 | DeliveryPolicy: 97 | healthyRetryPolicy: 98 | minDelayTarget: 1 99 | maxDelayTarget: 60 100 | numRetries: 100 101 | numNoDelayRetries: 0 102 | backoffFunction: exponential 103 | throttlePolicy: 104 | maxReceivesPerSecond: 1 105 | Endpoint: !Sub 'https://api.marbot.io/${Stage}/endpoint/${EndpointId}' 106 | Protocol: https 107 | TopicArn: !Ref Topic 108 | MonitoringJumpStartEvent: 109 | DependsOn: TopicEndpointSubscription 110 | Type: 'AWS::Events::Rule' 111 | Properties: 112 | Description: 'Monitoring Jump Start connection. 
(created by marbot)' 113 | ScheduleExpression: 'rate(30 days)' 114 | State: ENABLED 115 | Targets: 116 | - Arn: !Ref Topic 117 | Id: marbot 118 | Input: !Sub | 119 | { 120 | "Type": "monitoring-jump-start-connection", 121 | "StackTemplate": "marbot-lambda-function", 122 | "StackVersion": "1.1.2", 123 | "Partition": "${AWS::Partition}", 124 | "AccountId": "${AWS::AccountId}", 125 | "Region": "${AWS::Region}", 126 | "StackId": "${AWS::StackId}", 127 | "StackName": "${AWS::StackName}" 128 | } 129 | ########################################################################## 130 | # # 131 | # ALARMS # 132 | # # 133 | ########################################################################## 134 | ErrorsAlarm: 135 | Condition: HasErrorsThreshold 136 | DependsOn: TopicEndpointSubscription 137 | Type: 'AWS::CloudWatch::Alarm' 138 | Properties: 139 | AlarmActions: 140 | - !Ref Topic 141 | AlarmDescription: 'Invocations failed due to errors' 142 | ComparisonOperator: GreaterThanThreshold 143 | DatapointsToAlarm: 1 # We use a 1 out of 18 alarm: if we only look at one period we might miss an error because of the eventually consistent nature of CloudWatch and the fact that Lambda uses the invocation timestamp for metric data 144 | Dimensions: 145 | - Name: FunctionName 146 | Value: !Ref FunctionName 147 | EvaluationPeriods: 18 # We use a 1 out of 18 alarm: if we only look at one period we might miss an error because of the eventually consistent nature of CloudWatch and the fact that Lambda uses the invocation timestamp for metric data 148 | MetricName: Errors 149 | Namespace: 'AWS/Lambda' 150 | OKActions: 151 | - !Ref Topic 152 | Period: 60 153 | Statistic: Sum 154 | Threshold: !Ref ErrorsThreshold 155 | TreatMissingData: notBreaching 156 | ThrottlesAlarm: 157 | Condition: HasThrottlesThreshold 158 | DependsOn: TopicEndpointSubscription 159 | Type: 'AWS::CloudWatch::Alarm' 160 | Properties: 161 | AlarmActions: 162 | - !Ref Topic 163 | AlarmDescription: 'Invocation attempts were 
throttled due to invocation rates exceeding the concurrent limits' 164 | ComparisonOperator: GreaterThanThreshold 165 | DatapointsToAlarm: 1 # We use a 1 out of 18 alarm: if we only look at one period we might miss a throttle because of the eventually consistent nature of CloudWatch and the fact that Lambda uses the invocation timestamp for metric data 166 | Dimensions: 167 | - Name: FunctionName 168 | Value: !Ref FunctionName 169 | EvaluationPeriods: 18 # We use a 1 out of 18 alarm: if we only look at one period we might miss a throttle because of the eventually consistent nature of CloudWatch and the fact that Lambda uses the invocation timestamp for metric data 170 | MetricName: Throttles 171 | Namespace: 'AWS/Lambda' 172 | OKActions: 173 | - !Ref Topic 174 | Period: 60 175 | Statistic: Sum 176 | Threshold: !Ref ThrottlesThreshold 177 | TreatMissingData: notBreaching 178 | Outputs: 179 | StackName: 180 | Description: 'Stack name.' 181 | Value: !Sub '${AWS::StackName}' 182 | StackTemplate: 183 | Description: 'Stack template.' 184 | Value: 'marbot-lambda-function' 185 | StackVersion: 186 | Description: 'Stack version.' 187 | Value: '1.1.2' 188 | -------------------------------------------------------------------------------- /marbot-nat-gateway.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | AWSTemplateFormatVersion: '2010-09-09' 16 | Description: 'marbot.io: NAT gateway monitoring (https://github.com/marbot-io/monitoring-jump-start)' 17 | Metadata: 18 | 'AWS::CloudFormation::Interface': 19 | ParameterGroups: 20 | - Label: 21 | default: 'marbot endpoint' 22 | Parameters: 23 | - EndpointId 24 | - Stage 25 | - Label: 26 | default: 'NAT Gateway' 27 | Parameters: 28 | - NatGatewayId 29 | - Label: 30 | default: 'Thresholds' 31 | Parameters: 32 | - ErrorPortAllocationThreshold 33 | - PacketsDropCountThreshold 34 | - BandwidthUtilizationThreshold 35 | - PacketsUtilizationThreshold 36 | Parameters: 37 | EndpointId: 38 | Description: 'Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id").' 39 | Type: String 40 | Stage: 41 | Description: 'marbot stage (never change this!).' 42 | Type: String 43 | Default: v1 44 | AllowedValues: [v1, dev] 45 | NatGatewayId: 46 | Description: 'The NAT gateway ID that you want to monitor.' 47 | Type: String 48 | ErrorPortAllocationThreshold: 49 | Description: 'The maximum port allocation errors (set to -1 to disable).' 50 | Type: Number 51 | Default: 0 52 | MinValue: -1 53 | PacketsDropCountThreshold: 54 | Description: 'The maximum packet drops (set to -1 to disable).' 55 | Type: Number 56 | Default: 0 57 | MinValue: -1 58 | BandwidthUtilizationThreshold: 59 | Description: 'The maximum percentage of bandwidth utilization (set to -1 to disable).' 60 | Type: Number 61 | Default: 80 62 | MinValue: -1 63 | MaxValue: 100 64 | PacketsUtilizationThreshold: 65 | Description: 'The maximum percentage of packets utilization (set to -1 to disable).' 
66 | Type: Number 67 | Default: 80 68 | MinValue: -1 69 | MaxValue: 100 70 | Conditions: 71 | HasErrorPortAllocationThreshold: !Not [!Equals [!Ref ErrorPortAllocationThreshold, '-1']] 72 | HasPacketsDropCountThreshold: !Not [!Equals [!Ref PacketsDropCountThreshold, '-1']] 73 | HasBandwidthUtilizationThreshold: !Not [!Equals [!Ref BandwidthUtilizationThreshold, '-1']] 74 | HasPacketsUtilizationThreshold: !Not [!Equals [!Ref PacketsUtilizationThreshold, '-1']] 75 | Resources: 76 | ########################################################################## 77 | # # 78 | # TOPIC # 79 | # # 80 | ########################################################################## 81 | Topic: 82 | Type: 'AWS::SNS::Topic' 83 | Properties: {} 84 | TopicPolicy: 85 | Type: 'AWS::SNS::TopicPolicy' 86 | Properties: 87 | PolicyDocument: 88 | Id: Id1 89 | Version: '2012-10-17' 90 | Statement: 91 | - Sid: Sid1 92 | Effect: Allow 93 | Principal: 94 | Service: 'events.amazonaws.com' # Allow EventBridge 95 | Action: 'sns:Publish' 96 | Resource: !Ref Topic 97 | - Sid: Sid2 98 | Effect: Allow 99 | Principal: 100 | AWS: '*' # Allow CloudWatch Alarms 101 | Action: 'sns:Publish' 102 | Resource: !Ref Topic 103 | Condition: 104 | StringEquals: 105 | 'AWS:SourceOwner': !Ref 'AWS::AccountId' 106 | Topics: 107 | - !Ref Topic 108 | TopicEndpointSubscription: 109 | DependsOn: TopicPolicy 110 | Type: 'AWS::SNS::Subscription' 111 | Properties: 112 | DeliveryPolicy: 113 | healthyRetryPolicy: 114 | minDelayTarget: 1 115 | maxDelayTarget: 60 116 | numRetries: 100 117 | numNoDelayRetries: 0 118 | backoffFunction: exponential 119 | throttlePolicy: 120 | maxReceivesPerSecond: 1 121 | Endpoint: !Sub 'https://api.marbot.io/${Stage}/endpoint/${EndpointId}' 122 | Protocol: https 123 | TopicArn: !Ref Topic 124 | MonitoringJumpStartEvent: 125 | DependsOn: TopicEndpointSubscription 126 | Type: 'AWS::Events::Rule' 127 | Properties: 128 | Description: 'Monitoring Jump Start connection. 
(created by marbot)' 129 | ScheduleExpression: 'rate(30 days)' 130 | State: ENABLED 131 | Targets: 132 | - Arn: !Ref Topic 133 | Id: marbot 134 | Input: !Sub | 135 | { 136 | "Type": "monitoring-jump-start-connection", 137 | "StackTemplate": "marbot-nat-gateway", 138 | "StackVersion": "1.0.1", 139 | "Partition": "${AWS::Partition}", 140 | "AccountId": "${AWS::AccountId}", 141 | "Region": "${AWS::Region}", 142 | "StackId": "${AWS::StackId}", 143 | "StackName": "${AWS::StackName}" 144 | } 145 | ########################################################################## 146 | # # 147 | # ALARMS # 148 | # # 149 | ########################################################################## 150 | ErrorPortAllocationTooHighAlarm: 151 | Condition: HasErrorPortAllocationThreshold 152 | DependsOn: TopicEndpointSubscription 153 | Type: 'AWS::CloudWatch::Alarm' 154 | Properties: 155 | AlarmActions: 156 | - !Ref Topic 157 | AlarmDescription: 'Errors allocating a source port over last 10 minutes too high. Too many concurrent connections are open through the NAT gateway. (created by marbot)' 158 | Namespace: 'AWS/NATGateway' 159 | MetricName: ErrorPortAllocation 160 | Statistic: Sum 161 | Period: 600 162 | EvaluationPeriods: 1 163 | ComparisonOperator: GreaterThanThreshold 164 | Threshold: !Ref ErrorPortAllocationThreshold 165 | Dimensions: 166 | - Name: NatGatewayId 167 | Value: !Ref NatGatewayId 168 | PacketsDropCountTooHighAlarm: 169 | Condition: HasPacketsDropCountThreshold 170 | DependsOn: TopicEndpointSubscription 171 | Type: 'AWS::CloudWatch::Alarm' 172 | Properties: 173 | AlarmActions: 174 | - !Ref Topic 175 | AlarmDescription: 'Dropped packets over last 10 minutes too high. This might indicate an ongoing transient issue with the NAT gateway. 
(created by marbot)' 176 | Namespace: 'AWS/NATGateway' 177 | MetricName: PacketsDropCount 178 | Statistic: Sum 179 | Period: 600 180 | EvaluationPeriods: 1 181 | ComparisonOperator: GreaterThanThreshold 182 | Threshold: !Ref PacketsDropCountThreshold 183 | Dimensions: 184 | - Name: NatGatewayId 185 | Value: !Ref NatGatewayId 186 | BandwidthUtilizationTooHighAlarm: 187 | Condition: HasBandwidthUtilizationThreshold 188 | DependsOn: TopicEndpointSubscription 189 | Type: 'AWS::CloudWatch::Alarm' 190 | Properties: 191 | AlarmActions: 192 | - !Ref Topic 193 | AlarmDescription: 'Bandwidth utilization too high. (created by marbot)' 194 | ComparisonOperator: GreaterThanThreshold 195 | EvaluationPeriods: 1 196 | Metrics: 197 | - Id: 'in1' 198 | Label: 'InFromDestination' 199 | MetricStat: 200 | Metric: 201 | Namespace: 'AWS/NATGateway' 202 | MetricName: BytesInFromDestination # bytes per minute 203 | Dimensions: 204 | - Name: NatGatewayId 205 | Value: !Ref NatGatewayId 206 | Period: 60 207 | Stat: Sum 208 | Unit: Bytes 209 | ReturnData: false 210 | - Id: 'in2' 211 | Label: 'InFromSource' 212 | MetricStat: 213 | Metric: 214 | Namespace: 'AWS/NATGateway' 215 | MetricName: BytesInFromSource # bytes per minute 216 | Dimensions: 217 | - Name: NatGatewayId 218 | Value: !Ref NatGatewayId 219 | Period: 60 220 | Stat: Sum 221 | Unit: Bytes 222 | ReturnData: false 223 | - Id: 'out1' 224 | Label: 'OutToDestination' 225 | MetricStat: 226 | Metric: 227 | Namespace: 'AWS/NATGateway' 228 | MetricName: BytesOutToDestination # bytes per minute 229 | Dimensions: 230 | - Name: NatGatewayId 231 | Value: !Ref NatGatewayId 232 | Period: 60 233 | Stat: Sum 234 | Unit: Bytes 235 | ReturnData: false 236 | - Id: 'out2' 237 | Label: 'OutToSource' 238 | MetricStat: 239 | Metric: 240 | Namespace: 'AWS/NATGateway' 241 | MetricName: BytesOutToSource # bytes per minute 242 | Dimensions: 243 | - Name: NatGatewayId 244 | Value: !Ref NatGatewayId 245 | Period: 60 246 | Stat: Sum 247 | Unit: Bytes 248 | 
ReturnData: false 249 | - Expression: '(in1+in2+out1+out2)/60*8/1000/1000/1000' # to Gbit/s 250 | Id: 'bandwidth' 251 | Label: 'Bandwidth' 252 | ReturnData: false 253 | - Expression: 'bandwidth/100*100' # hard limit is 100 Gbit/s 254 | Id: 'utilization' 255 | Label: 'Utilization' 256 | ReturnData: true 257 | Threshold: !Ref BandwidthUtilizationThreshold 258 | TreatMissingData: notBreaching 259 | PacketsUtilizationTooHighAlarm: 260 | Condition: HasPacketsUtilizationThreshold 261 | DependsOn: TopicEndpointSubscription 262 | Type: 'AWS::CloudWatch::Alarm' 263 | Properties: 264 | AlarmActions: 265 | - !Ref Topic 266 | AlarmDescription: 'Packets utilization too high. (created by marbot)' 267 | ComparisonOperator: GreaterThanThreshold 268 | EvaluationPeriods: 1 269 | Metrics: 270 | - Id: 'in1' 271 | Label: 'InFromDestination' 272 | MetricStat: 273 | Metric: 274 | Namespace: 'AWS/NATGateway' 275 | MetricName: PacketsInFromDestination # packets per minute 276 | Dimensions: 277 | - Name: NatGatewayId 278 | Value: !Ref NatGatewayId 279 | Period: 60 280 | Stat: Sum 281 | Unit: Count 282 | ReturnData: false 283 | - Id: 'in2' 284 | Label: 'InFromSource' 285 | MetricStat: 286 | Metric: 287 | Namespace: 'AWS/NATGateway' 288 | MetricName: PacketsInFromSource # packets per minute 289 | Dimensions: 290 | - Name: NatGatewayId 291 | Value: !Ref NatGatewayId 292 | Period: 60 293 | Stat: Sum 294 | Unit: Count 295 | ReturnData: false 296 | - Id: 'out1' 297 | Label: 'OutToDestination' 298 | MetricStat: 299 | Metric: 300 | Namespace: 'AWS/NATGateway' 301 | MetricName: PacketsOutToDestination # packets per minute 302 | Dimensions: 303 | - Name: NatGatewayId 304 | Value: !Ref NatGatewayId 305 | Period: 60 306 | Stat: Sum 307 | Unit: Count 308 | ReturnData: false 309 | - Id: 'out2' 310 | Label: 'OutToSource' 311 | MetricStat: 312 | Metric: 313 | Namespace: 'AWS/NATGateway' 314 | MetricName: PacketsOutToSource # packets per minute 315 | Dimensions: 316 | - Name: NatGatewayId 317 | Value: !Ref 
NatGatewayId 318 | Period: 60 319 | Stat: Sum 320 | Unit: Count 321 | ReturnData: false 322 | - Expression: '(in1+in2+out1+out2)/60' # to packets per second 323 | Id: 'packets' 324 | Label: 'Packets' 325 | ReturnData: false 326 | - Expression: 'packets/10000000*100' # hard limit is 10,000,000 packets per second 327 | Id: 'utilization' 328 | Label: 'Utilization' 329 | ReturnData: true 330 | Threshold: !Ref PacketsUtilizationThreshold 331 | TreatMissingData: notBreaching 332 | Outputs: 333 | StackName: 334 | Description: 'Stack name.' 335 | Value: !Sub '${AWS::StackName}' 336 | StackTemplate: 337 | Description: 'Stack template.' 338 | Value: 'marbot-nat-gateway' 339 | StackVersion: 340 | Description: 'Stack version.' 341 | Value: '1.0.1' 342 | -------------------------------------------------------------------------------- /marbot-rds-cluster.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | AWSTemplateFormatVersion: '2010-09-09' 16 | Description: 'marbot.io: RDS cluster monitoring (https://github.com/marbot-io/monitoring-jump-start)' 17 | Metadata: 18 | 'AWS::CloudFormation::Interface': 19 | ParameterGroups: 20 | - Label: 21 | default: 'marbot endpoint' 22 | Parameters: 23 | - EndpointId 24 | - Stage 25 | - Label: 26 | default: 'RDS' 27 | Parameters: 28 | - DBClusterIdentifier 29 | - Label: 30 | default: 'Thresholds' 31 | Parameters: 32 | - CPUUtilizationThreshold 33 | - CPUCreditBalanceThreshold 34 | - FreeableMemoryThreshold 35 | Parameters: 36 | EndpointId: 37 | Description: 'Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id").' 38 | Type: String 39 | Stage: 40 | Description: 'marbot stage (never change this!).' 41 | Type: String 42 | Default: v1 43 | AllowedValues: [v1, dev] 44 | DBClusterIdentifier: 45 | Description: 'The cluster ID of the RDS Aurora cluster that you want to monitor.' 46 | Type: String 47 | CPUUtilizationThreshold: 48 | Description: 'The maximum percentage of CPU utilization (set to -1 to disable).' 49 | Type: Number 50 | Default: 80 51 | MinValue: -1 52 | MaxValue: 100 53 | CPUCreditBalanceThreshold: 54 | Description: 'The minimum number of CPU credits available (t* instances only; set to -1 to disable).' 55 | Type: Number 56 | Default: 20 57 | MinValue: -1 58 | FreeableMemoryThreshold: 59 | Description: 'The minimum amount of available random access memory in Byte (set to -1 to disable).' 
60 | Type: Number 61 | Default: 64000000 # 64 Megabyte in Byte 62 | MinValue: -1 63 | Conditions: 64 | HasCPUUtilizationThreshold: !Not [!Equals [!Ref CPUUtilizationThreshold, '-1']] 65 | HasCPUCreditBalanceThreshold: !Not [!Equals [!Ref CPUCreditBalanceThreshold, '-1']] 66 | HasFreeableMemoryThreshold: !Not [!Equals [!Ref FreeableMemoryThreshold, '-1']] 67 | Resources: 68 | ########################################################################## 69 | # # 70 | # TOPIC # 71 | # # 72 | ########################################################################## 73 | Topic: 74 | Type: 'AWS::SNS::Topic' 75 | Properties: {} 76 | TopicPolicy: 77 | Type: 'AWS::SNS::TopicPolicy' 78 | Properties: 79 | PolicyDocument: 80 | Id: Id1 81 | Version: '2012-10-17' 82 | Statement: 83 | - Sid: Sid1 84 | Effect: Allow 85 | Principal: 86 | Service: 87 | - 'events.amazonaws.com' # Allow EventBridge 88 | - 'rds.amazonaws.com' # Allow RDS Events 89 | Action: 'sns:Publish' 90 | Resource: !Ref Topic 91 | - Sid: Sid2 92 | Effect: Allow 93 | Principal: 94 | AWS: '*' # Allow CloudWatch Alarms 95 | Action: 'sns:Publish' 96 | Resource: !Ref Topic 97 | Condition: 98 | StringEquals: 99 | 'AWS:SourceOwner': !Ref 'AWS::AccountId' 100 | Topics: 101 | - !Ref Topic 102 | TopicEndpointSubscription: 103 | DependsOn: TopicPolicy 104 | Type: 'AWS::SNS::Subscription' 105 | Properties: 106 | DeliveryPolicy: 107 | healthyRetryPolicy: 108 | minDelayTarget: 1 109 | maxDelayTarget: 60 110 | numRetries: 100 111 | numNoDelayRetries: 0 112 | backoffFunction: exponential 113 | throttlePolicy: 114 | maxReceivesPerSecond: 1 115 | Endpoint: !Sub 'https://api.marbot.io/${Stage}/endpoint/${EndpointId}' 116 | Protocol: https 117 | TopicArn: !Ref Topic 118 | MonitoringJumpStartEvent: 119 | DependsOn: TopicEndpointSubscription 120 | Type: 'AWS::Events::Rule' 121 | Properties: 122 | Description: 'Monitoring Jump Start connection. 
(created by marbot)' 123 | ScheduleExpression: 'rate(30 days)' 124 | State: ENABLED 125 | Targets: 126 | - Arn: !Ref Topic 127 | Id: marbot 128 | Input: !Sub | 129 | { 130 | "Type": "monitoring-jump-start-connection", 131 | "StackTemplate": "marbot-rds-cluster", 132 | "StackVersion": "1.5.1", 133 | "Partition": "${AWS::Partition}", 134 | "AccountId": "${AWS::AccountId}", 135 | "Region": "${AWS::Region}", 136 | "StackId": "${AWS::StackId}", 137 | "StackName": "${AWS::StackName}" 138 | } 139 | ########################################################################## 140 | # # 141 | # ALARMS # 142 | # # 143 | ########################################################################## 144 | CPUUtilizationTooHighAlarm: 145 | Condition: HasCPUUtilizationThreshold 146 | DependsOn: TopicEndpointSubscription 147 | Type: 'AWS::CloudWatch::Alarm' 148 | Properties: 149 | AlarmActions: 150 | - !Ref Topic 151 | AlarmDescription: 'Average database CPU utilization over last 10 minutes too high. (created by marbot)' 152 | ComparisonOperator: GreaterThanThreshold 153 | Dimensions: 154 | - Name: DBClusterIdentifier 155 | Value: !Ref DBClusterIdentifier 156 | EvaluationPeriods: 1 157 | MetricName: CPUUtilization 158 | Namespace: 'AWS/RDS' 159 | OKActions: 160 | - !Ref Topic 161 | Period: 600 162 | Statistic: Average 163 | Threshold: !Ref CPUUtilizationThreshold 164 | TreatMissingData: notBreaching 165 | CPUCreditBalanceTooLowAlarm: 166 | Condition: HasCPUCreditBalanceThreshold 167 | DependsOn: TopicEndpointSubscription 168 | Type: 'AWS::CloudWatch::Alarm' 169 | Properties: 170 | AlarmActions: 171 | - !Ref Topic 172 | AlarmDescription: 'Average database CPU credit balance over last 10 minutes too low, expect a significant performance drop soon. 
(created by marbot)' 173 | ComparisonOperator: LessThanThreshold 174 | Dimensions: 175 | - Name: DBClusterIdentifier 176 | Value: !Ref DBClusterIdentifier 177 | EvaluationPeriods: 1 178 | MetricName: CPUCreditBalance 179 | Namespace: 'AWS/RDS' 180 | OKActions: 181 | - !Ref Topic 182 | Period: 600 183 | Statistic: Average 184 | Threshold: !Ref CPUCreditBalanceThreshold 185 | TreatMissingData: notBreaching 186 | FreeableMemoryTooLowAlarm: 187 | Condition: HasFreeableMemoryThreshold 188 | DependsOn: TopicEndpointSubscription 189 | Type: 'AWS::CloudWatch::Alarm' 190 | Properties: 191 | AlarmActions: 192 | - !Ref Topic 193 | AlarmDescription: 'Average database freeable memory over last 10 minutes too low, performance may suffer. (created by marbot)' 194 | ComparisonOperator: LessThanThreshold 195 | Dimensions: 196 | - Name: DBClusterIdentifier 197 | Value: !Ref DBClusterIdentifier 198 | EvaluationPeriods: 1 199 | MetricName: FreeableMemory 200 | Namespace: 'AWS/RDS' 201 | OKActions: 202 | - !Ref Topic 203 | Period: 600 204 | Statistic: Average 205 | Threshold: !Ref FreeableMemoryThreshold 206 | TreatMissingData: notBreaching 207 | ########################################################################## 208 | # # 209 | # EVENTS # 210 | # # 211 | ########################################################################## 212 | EventSubscription: 213 | DependsOn: TopicEndpointSubscription 214 | Type: 'AWS::RDS::EventSubscription' 215 | Properties: 216 | SnsTopicArn: !Ref Topic 217 | SourceType: 'db-cluster' 218 | SourceIds: [!Ref DBClusterIdentifier] 219 | Outputs: 220 | StackName: 221 | Description: 'Stack name.' 222 | Value: !Sub '${AWS::StackName}' 223 | StackTemplate: 224 | Description: 'Stack template.' 225 | Value: 'marbot-rds-cluster' 226 | StackVersion: 227 | Description: 'Stack version.' 
228 | Value: '1.5.1' 229 | -------------------------------------------------------------------------------- /marbot-rds.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | AWSTemplateFormatVersion: '2010-09-09' 16 | Description: 'marbot.io: RDS database instance monitoring (https://github.com/marbot-io/monitoring-jump-start)' 17 | Metadata: 18 | 'AWS::CloudFormation::Interface': 19 | ParameterGroups: 20 | - Label: 21 | default: 'marbot endpoint' 22 | Parameters: 23 | - EndpointId 24 | - Stage 25 | - Label: 26 | default: 'RDS' 27 | Parameters: 28 | - DBInstanceIdentifier 29 | - Label: 30 | default: 'Thresholds' 31 | Parameters: 32 | - DBLoadThreshold 33 | - BurstBalanceThreshold 34 | - CPUUtilizationThreshold 35 | - CPUCreditBalanceThreshold 36 | - DiskQueueDepthThreshold 37 | - FreeableMemoryThreshold 38 | - FreeStorageSpaceThreshold 39 | - SwapUsageThreshold 40 | Parameters: 41 | EndpointId: 42 | Description: 'Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id").' 43 | Type: String 44 | Stage: 45 | Description: 'marbot stage (never change this!).' 
46 | Type: String 47 | Default: v1 48 | AllowedValues: [v1, dev] 49 | DBInstanceIdentifier: 50 | Description: 'The instance ID of the RDS database instance that you want to monitor.' 51 | Type: String 52 | DBLoadThreshold: 53 | Description: 'The maximum database load, set to number of vCPUs (requires RDS Performance Insights; set to -1 to disable).' 54 | Type: Number 55 | Default: -1 56 | MinValue: -1 57 | MaxValue: 128 58 | BurstBalanceThreshold: 59 | Description: 'The minimum percent of General Purpose SSD (gp2) burst-bucket I/O credits available (set to -1 to disable).' 60 | Type: Number 61 | Default: 20 62 | MinValue: -1 # -1 must be allowed, otherwise HasBurstBalanceThreshold can never be false 63 | MaxValue: 100 64 | CPUUtilizationThreshold: 65 | Description: 'The maximum percentage of CPU utilization (set to -1 to disable).' 66 | Type: Number 67 | Default: 80 68 | MinValue: -1 69 | MaxValue: 100 70 | CPUCreditBalanceThreshold: 71 | Description: 'The minimum number of CPU credits available (t* instances only; set to -1 to disable).' 72 | Type: Number 73 | Default: 20 74 | MinValue: -1 75 | DiskQueueDepthThreshold: 76 | Description: 'The maximum number of outstanding IOs (read/write requests) waiting to access the disk (set to -1 to disable).' 77 | Type: Number 78 | Default: 64 79 | MinValue: -1 80 | FreeableMemoryThreshold: 81 | Description: 'The minimum amount of available random access memory in Byte (set to -1 to disable).' 82 | Type: Number 83 | Default: 64000000 # 64 Megabyte in Byte 84 | MinValue: -1 85 | FreeStorageSpaceThreshold: 86 | Description: 'The minimum amount of available storage space in Byte (set to -1 to disable).' 87 | Type: Number 88 | Default: 2000000000 # 2 Gigabyte in Byte 89 | MinValue: -1 90 | SwapUsageThreshold: 91 | Description: 'The maximum amount of swap space used on the DB instance in Byte (set to -1 to disable).' 
92 | Type: Number 93 | Default: 256000000 # 256 Megabyte in Byte 94 | MinValue: -1 95 | Conditions: 96 | HasDBLoadThreshold: !Not [!Equals [!Ref DBLoadThreshold, '-1']] 97 | HasBurstBalanceThreshold: !Not [!Equals [!Ref BurstBalanceThreshold, '-1']] 98 | HasCPUUtilizationThreshold: !Not [!Equals [!Ref CPUUtilizationThreshold, '-1']] 99 | HasCPUCreditBalanceThreshold: !Not [!Equals [!Ref CPUCreditBalanceThreshold, '-1']] 100 | HasDiskQueueDepthThreshold: !Not [!Equals [!Ref DiskQueueDepthThreshold, '-1']] 101 | HasFreeableMemoryThreshold: !Not [!Equals [!Ref FreeableMemoryThreshold, '-1']] 102 | HasFreeStorageSpaceThreshold: !Not [!Equals [!Ref FreeStorageSpaceThreshold, '-1']] 103 | HasSwapUsageThreshold: !Not [!Equals [!Ref SwapUsageThreshold, '-1']] 104 | Resources: 105 | ########################################################################## 106 | # # 107 | # TOPIC # 108 | # # 109 | ########################################################################## 110 | Topic: 111 | Type: 'AWS::SNS::Topic' 112 | Properties: {} 113 | TopicPolicy: 114 | Type: 'AWS::SNS::TopicPolicy' 115 | Properties: 116 | PolicyDocument: 117 | Id: Id1 118 | Version: '2012-10-17' 119 | Statement: 120 | - Sid: Sid1 121 | Effect: Allow 122 | Principal: 123 | Service: 124 | - 'events.amazonaws.com' # Allow EventBridge 125 | - 'rds.amazonaws.com' # Allow RDS Events 126 | Action: 'sns:Publish' 127 | Resource: !Ref Topic 128 | - Sid: Sid2 129 | Effect: Allow 130 | Principal: 131 | AWS: '*' # Allow CloudWatch Alarms 132 | Action: 'sns:Publish' 133 | Resource: !Ref Topic 134 | Condition: 135 | StringEquals: 136 | 'AWS:SourceOwner': !Ref 'AWS::AccountId' 137 | Topics: 138 | - !Ref Topic 139 | TopicEndpointSubscription: 140 | DependsOn: TopicPolicy 141 | Type: 'AWS::SNS::Subscription' 142 | Properties: 143 | DeliveryPolicy: 144 | healthyRetryPolicy: 145 | minDelayTarget: 1 146 | maxDelayTarget: 60 147 | numRetries: 100 148 | numNoDelayRetries: 0 149 | backoffFunction: exponential 150 | 
throttlePolicy: 151 | maxReceivesPerSecond: 1 152 | Endpoint: !Sub 'https://api.marbot.io/${Stage}/endpoint/${EndpointId}' 153 | Protocol: https 154 | TopicArn: !Ref Topic 155 | MonitoringJumpStartEvent: 156 | DependsOn: TopicEndpointSubscription 157 | Type: 'AWS::Events::Rule' 158 | Properties: 159 | Description: 'Monitoring Jump Start connection. (created by marbot)' 160 | ScheduleExpression: 'rate(30 days)' 161 | State: ENABLED 162 | Targets: 163 | - Arn: !Ref Topic 164 | Id: marbot 165 | Input: !Sub | 166 | { 167 | "Type": "monitoring-jump-start-connection", 168 | "StackTemplate": "marbot-rds", 169 | "StackVersion": "1.5.1", 170 | "Partition": "${AWS::Partition}", 171 | "AccountId": "${AWS::AccountId}", 172 | "Region": "${AWS::Region}", 173 | "StackId": "${AWS::StackId}", 174 | "StackName": "${AWS::StackName}" 175 | } 176 | ########################################################################## 177 | # # 178 | # ALARMS # 179 | # # 180 | ########################################################################## 181 | DBLoadTooHighAlarm: 182 | Condition: HasDBLoadThreshold 183 | DependsOn: TopicEndpointSubscription 184 | Type: 'AWS::CloudWatch::Alarm' 185 | Properties: 186 | AlarmActions: 187 | - !Ref Topic 188 | AlarmDescription: 'Average database load was too high over the last 10 minutes. 
(created by marbot)' 189 | ComparisonOperator: GreaterThanThreshold 190 | Dimensions: 191 | - Name: DBInstanceIdentifier 192 | Value: !Ref DBInstanceIdentifier 193 | EvaluationPeriods: 1 194 | MetricName: DBLoad 195 | Namespace: 'AWS/RDS' 196 | OKActions: 197 | - !Ref Topic 198 | Period: 600 199 | Statistic: Average 200 | Threshold: !Ref DBLoadThreshold 201 | TreatMissingData: notBreaching 202 | BurstBalanceTooLowAlarm: 203 | Condition: HasBurstBalanceThreshold 204 | DependsOn: TopicEndpointSubscription 205 | Type: 'AWS::CloudWatch::Alarm' 206 | Properties: 207 | AlarmActions: 208 | - !Ref Topic 209 | AlarmDescription: 'Average database storage burst balance over last 10 minutes too low, expect a significant performance drop soon. (created by marbot)' 210 | ComparisonOperator: LessThanThreshold 211 | Dimensions: 212 | - Name: DBInstanceIdentifier 213 | Value: !Ref DBInstanceIdentifier 214 | EvaluationPeriods: 1 215 | MetricName: BurstBalance 216 | Namespace: 'AWS/RDS' 217 | OKActions: 218 | - !Ref Topic 219 | Period: 600 220 | Statistic: Average 221 | Threshold: !Ref BurstBalanceThreshold 222 | TreatMissingData: notBreaching 223 | CPUUtilizationTooHighAlarm: 224 | Condition: HasCPUUtilizationThreshold 225 | DependsOn: TopicEndpointSubscription 226 | Type: 'AWS::CloudWatch::Alarm' 227 | Properties: 228 | AlarmActions: 229 | - !Ref Topic 230 | AlarmDescription: 'Average database CPU utilization over last 10 minutes too high. 
(created by marbot)' 231 | ComparisonOperator: GreaterThanThreshold 232 | Dimensions: 233 | - Name: DBInstanceIdentifier 234 | Value: !Ref DBInstanceIdentifier 235 | EvaluationPeriods: 1 236 | MetricName: CPUUtilization 237 | Namespace: 'AWS/RDS' 238 | OKActions: 239 | - !Ref Topic 240 | Period: 600 241 | Statistic: Average 242 | Threshold: !Ref CPUUtilizationThreshold 243 | TreatMissingData: notBreaching 244 | CPUCreditBalanceTooLowAlarm: 245 | Condition: HasCPUCreditBalanceThreshold 246 | DependsOn: TopicEndpointSubscription 247 | Type: 'AWS::CloudWatch::Alarm' 248 | Properties: 249 | AlarmActions: 250 | - !Ref Topic 251 | AlarmDescription: 'Average database CPU credit balance over last 10 minutes too low, expect a significant performance drop soon. (created by marbot)' 252 | ComparisonOperator: LessThanThreshold 253 | Dimensions: 254 | - Name: DBInstanceIdentifier 255 | Value: !Ref DBInstanceIdentifier 256 | EvaluationPeriods: 1 257 | MetricName: CPUCreditBalance 258 | Namespace: 'AWS/RDS' 259 | OKActions: 260 | - !Ref Topic 261 | Period: 600 262 | Statistic: Average 263 | Threshold: !Ref CPUCreditBalanceThreshold 264 | TreatMissingData: notBreaching 265 | DiskQueueDepthTooHighAlarm: 266 | Condition: HasDiskQueueDepthThreshold 267 | DependsOn: TopicEndpointSubscription 268 | Type: 'AWS::CloudWatch::Alarm' 269 | Properties: 270 | AlarmActions: 271 | - !Ref Topic 272 | AlarmDescription: 'Average database disk queue depth over last 10 minutes too high, performance may suffer. 
(created by marbot)' 273 | ComparisonOperator: GreaterThanThreshold 274 | Dimensions: 275 | - Name: DBInstanceIdentifier 276 | Value: !Ref DBInstanceIdentifier 277 | EvaluationPeriods: 1 278 | MetricName: DiskQueueDepth 279 | Namespace: 'AWS/RDS' 280 | OKActions: 281 | - !Ref Topic 282 | Period: 600 283 | Statistic: Average 284 | Threshold: !Ref DiskQueueDepthThreshold 285 | TreatMissingData: notBreaching 286 | FreeableMemoryTooLowAlarm: 287 | Condition: HasFreeableMemoryThreshold 288 | DependsOn: TopicEndpointSubscription 289 | Type: 'AWS::CloudWatch::Alarm' 290 | Properties: 291 | AlarmActions: 292 | - !Ref Topic 293 | AlarmDescription: 'Average database freeable memory over last 10 minutes too low, performance may suffer. (created by marbot)' 294 | ComparisonOperator: LessThanThreshold 295 | Dimensions: 296 | - Name: DBInstanceIdentifier 297 | Value: !Ref DBInstanceIdentifier 298 | EvaluationPeriods: 1 299 | MetricName: FreeableMemory 300 | Namespace: 'AWS/RDS' 301 | OKActions: 302 | - !Ref Topic 303 | Period: 600 304 | Statistic: Average 305 | Threshold: !Ref FreeableMemoryThreshold 306 | TreatMissingData: notBreaching 307 | FreeStorageSpaceTooLowAlarm: 308 | Condition: HasFreeStorageSpaceThreshold 309 | DependsOn: TopicEndpointSubscription 310 | Type: 'AWS::CloudWatch::Alarm' 311 | Properties: 312 | AlarmActions: 313 | - !Ref Topic 314 | AlarmDescription: 'Average database free storage space over last 10 minutes too low. 
(created by marbot)' 315 | ComparisonOperator: LessThanThreshold 316 | Dimensions: 317 | - Name: DBInstanceIdentifier 318 | Value: !Ref DBInstanceIdentifier 319 | EvaluationPeriods: 1 320 | MetricName: FreeStorageSpace 321 | Namespace: 'AWS/RDS' 322 | OKActions: 323 | - !Ref Topic 324 | Period: 600 325 | Statistic: Average 326 | Threshold: !Ref FreeStorageSpaceThreshold 327 | TreatMissingData: notBreaching 328 | SwapUsageTooHighAlarm: 329 | Condition: HasSwapUsageThreshold 330 | DependsOn: TopicEndpointSubscription 331 | Type: 'AWS::CloudWatch::Alarm' 332 | Properties: 333 | AlarmActions: 334 | - !Ref Topic 335 | AlarmDescription: 'Average database swap usage over last 10 minutes too high, performance may suffer. (created by marbot)' 336 | ComparisonOperator: GreaterThanThreshold 337 | Dimensions: 338 | - Name: DBInstanceIdentifier 339 | Value: !Ref DBInstanceIdentifier 340 | EvaluationPeriods: 1 341 | MetricName: SwapUsage 342 | Namespace: 'AWS/RDS' 343 | OKActions: 344 | - !Ref Topic 345 | Period: 600 346 | Statistic: Average 347 | Threshold: !Ref SwapUsageThreshold 348 | TreatMissingData: notBreaching 349 | ########################################################################## 350 | # # 351 | # EVENTS # 352 | # # 353 | ########################################################################## 354 | EventSubscription: 355 | DependsOn: TopicEndpointSubscription 356 | Type: 'AWS::RDS::EventSubscription' 357 | Properties: 358 | SnsTopicArn: !Ref Topic 359 | SourceIds: [!Ref DBInstanceIdentifier] 360 | SourceType: 'db-instance' 361 | Outputs: 362 | StackName: 363 | Description: 'Stack name.' 364 | Value: !Sub '${AWS::StackName}' 365 | StackTemplate: 366 | Description: 'Stack template.' 367 | Value: 'marbot-rds' 368 | StackVersion: 369 | Description: 'Stack version.' 
370 | Value: '1.5.1' 371 | -------------------------------------------------------------------------------- /marbot-redshift.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | AWSTemplateFormatVersion: '2010-09-09' 16 | Description: 'marbot.io: Redshift cluster monitoring (https://github.com/marbot-io/monitoring-jump-start)' 17 | Metadata: 18 | 'AWS::CloudFormation::Interface': 19 | ParameterGroups: 20 | - Label: 21 | default: 'marbot endpoint' 22 | Parameters: 23 | - EndpointId 24 | - Stage 25 | - Label: 26 | default: 'Redshift' 27 | Parameters: 28 | - ClusterIdentifier 29 | - NodeType 30 | - Label: 31 | default: 'Thresholds' 32 | Parameters: 33 | - CPUUtilizationThreshold 34 | - DiskSpaceThreshold 35 | - ConcurrencyScalingThreshold 36 | Parameters: 37 | EndpointId: 38 | Description: 'Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id").' 39 | Type: String 40 | Stage: 41 | Description: 'marbot stage (never change this!).' 42 | Type: String 43 | Default: v1 44 | AllowedValues: [v1, dev] 45 | ClusterIdentifier: 46 | Description: 'The cluster ID of the Redshift cluster that you want to monitor.' 47 | Type: String 48 | NodeType: 49 | Description: 'The current Redshift node type.' 
50 | Type: String 51 | AllowedValues: 52 | - 'dc2.large' 53 | - 'dc2.8xlarge' 54 | - 'ds2.xlarge' 55 | - 'ds2.8xlarge' 56 | - 'ra3.4xlarge' 57 | - 'ra3.16xlarge' 58 | - 'dc1.large' 59 | - 'dc1.8xlarge' 60 | CPUUtilizationThreshold: 61 | Description: 'The maximum percentage of CPU utilization (set to -1 to disable).' 62 | Type: Number 63 | Default: 80 64 | MinValue: -1 65 | MaxValue: 100 66 | DiskSpaceThreshold: 67 | Description: 'The maximum percentage of used disk space.' 68 | Type: Number 69 | Default: 90 70 | MinValue: 1 71 | MaxValue: 100 72 | ConcurrencyScalingThreshold: 73 | Description: 'The maximum number of concurrency scaling seconds per 24 hours (set to -1 to disable).' 74 | Type: Number 75 | Default: 3600 76 | MinValue: -1 77 | Mappings: 78 | NodeTypes: 79 | 'dc2.large': 80 | IOCapacity: 0.54 # original value - 10% 81 | 'dc2.8xlarge': 82 | IOCapacity: 6.75 83 | 'ds2.xlarge': 84 | IOCapacity: 0.36 85 | 'ds2.8xlarge': 86 | IOCapacity: 2.97 87 | 'ra3.4xlarge': 88 | IOCapacity: 1.8 89 | 'ra3.16xlarge': 90 | IOCapacity: 7.2 91 | 'dc1.large': 92 | IOCapacity: 0.18 93 | 'dc1.8xlarge': 94 | IOCapacity: 3.33 95 | Conditions: 96 | HasCPUUtilizationThreshold: !Not [!Equals [!Ref CPUUtilizationThreshold, '-1']] 97 | HasConcurrencyScalingThreshold: !Not [!Equals [!Ref ConcurrencyScalingThreshold, '-1']] 98 | Resources: 99 | ########################################################################## 100 | # # 101 | # TOPIC # 102 | # # 103 | ########################################################################## 104 | Topic: 105 | Type: 'AWS::SNS::Topic' 106 | Properties: {} 107 | TopicPolicy: 108 | Type: 'AWS::SNS::TopicPolicy' 109 | Properties: 110 | PolicyDocument: 111 | Id: Id1 112 | Version: '2012-10-17' 113 | Statement: 114 | - Sid: Sid1 115 | Effect: Allow 116 | Principal: 117 | AWS: '*' # Allow CloudWatch Alarms 118 | Action: 'sns:Publish' 119 | Resource: !Ref Topic 120 | Condition: 121 | StringEquals: 122 | 'AWS:SourceOwner': !Ref 'AWS::AccountId' 123 | 
Topics: 124 | - !Ref Topic 125 | TopicEndpointSubscription: 126 | DependsOn: TopicPolicy 127 | Type: 'AWS::SNS::Subscription' 128 | Properties: 129 | DeliveryPolicy: 130 | healthyRetryPolicy: 131 | minDelayTarget: 1 132 | maxDelayTarget: 60 133 | numRetries: 100 134 | numNoDelayRetries: 0 135 | backoffFunction: exponential 136 | throttlePolicy: 137 | maxReceivesPerSecond: 1 138 | Endpoint: !Sub 'https://api.marbot.io/${Stage}/endpoint/${EndpointId}' 139 | Protocol: https 140 | TopicArn: !Ref Topic 141 | MonitoringJumpStartEvent: 142 | DependsOn: TopicEndpointSubscription 143 | Type: 'AWS::Events::Rule' 144 | Properties: 145 | Description: 'Monitoring Jump Start connection. (created by marbot)' 146 | ScheduleExpression: 'rate(30 days)' 147 | State: ENABLED 148 | Targets: 149 | - Arn: !Ref Topic 150 | Id: marbot 151 | Input: !Sub | 152 | { 153 | "Type": "monitoring-jump-start-connection", 154 | "StackTemplate": "marbot-redshift", 155 | "StackVersion": "1.1.1", 156 | "Partition": "${AWS::Partition}", 157 | "AccountId": "${AWS::AccountId}", 158 | "Region": "${AWS::Region}", 159 | "StackId": "${AWS::StackId}", 160 | "StackName": "${AWS::StackName}" 161 | } 162 | ########################################################################## 163 | # # 164 | # ALARMS # 165 | # # 166 | ########################################################################## 167 | HealthStatusAlarm: 168 | DependsOn: TopicEndpointSubscription 169 | Type: 'AWS::CloudWatch::Alarm' 170 | Properties: 171 | AlarmDescription: 'Redshift cluster is unhealthy. 
(created by marbot)' 172 | Namespace: 'AWS/Redshift' 173 | MetricName: HealthStatus 174 | Dimensions: 175 | - Name: ClusterIdentifier 176 | Value: !Ref ClusterIdentifier 177 | Threshold: 1 178 | ComparisonOperator: LessThanThreshold 179 | Statistic: Minimum 180 | Period: 60 181 | EvaluationPeriods: 1 182 | AlarmActions: 183 | - !Ref Topic 184 | OKActions: 185 | - !Ref Topic 186 | TreatMissingData: notBreaching 187 | MaintenanceModeAlarm: 188 | DependsOn: TopicEndpointSubscription 189 | Type: 'AWS::CloudWatch::Alarm' 190 | Properties: 191 | AlarmDescription: 'Redshift cluster in maintenance mode. (created by marbot)' 192 | Namespace: 'AWS/Redshift' 193 | MetricName: MaintenanceMode 194 | Dimensions: 195 | - Name: ClusterIdentifier 196 | Value: !Ref ClusterIdentifier 197 | Threshold: 0 # MaintenanceMode reports 0 (off) or 1 (on), so a threshold of 1 could never be exceeded 198 | ComparisonOperator: GreaterThanThreshold 199 | Statistic: Maximum 200 | Period: 60 201 | EvaluationPeriods: 1 202 | AlarmActions: 203 | - !Ref Topic 204 | OKActions: 205 | - !Ref Topic 206 | TreatMissingData: notBreaching 207 | DiskSpaceAlarm: 208 | DependsOn: TopicEndpointSubscription 209 | Type: 'AWS::CloudWatch::Alarm' 210 | Properties: 211 | AlarmDescription: 'Redshift cluster is running out of disk space. (created by marbot)' 212 | Namespace: 'AWS/Redshift' 213 | MetricName: PercentageDiskSpaceUsed 214 | Dimensions: 215 | - Name: ClusterIdentifier 216 | Value: !Ref ClusterIdentifier 217 | Threshold: !Ref DiskSpaceThreshold 218 | ComparisonOperator: GreaterThanThreshold 219 | Statistic: Average 220 | Period: 60 221 | EvaluationPeriods: 1 222 | AlarmActions: 223 | - !Ref Topic 224 | OKActions: 225 | - !Ref Topic 226 | TreatMissingData: notBreaching 227 | SchemaQuotasAlarm: 228 | DependsOn: TopicEndpointSubscription 229 | Type: 'AWS::CloudWatch::Alarm' 230 | Properties: 231 | AlarmDescription: 'At least one schema reached its storage quota. 
(created by marbot)' 232 | Namespace: 'AWS/Redshift' 233 | MetricName: NumExceededSchemaQuotas 234 | Dimensions: 235 | - Name: ClusterIdentifier 236 | Value: !Ref ClusterIdentifier 237 | Threshold: 0 238 | ComparisonOperator: GreaterThanThreshold 239 | Statistic: Maximum 240 | Period: 60 241 | EvaluationPeriods: 1 242 | AlarmActions: 243 | - !Ref Topic 244 | OKActions: 245 | - !Ref Topic 246 | TreatMissingData: notBreaching 247 | HighCPUUtilizationAlarm: 248 | Condition: HasCPUUtilizationThreshold 249 | DependsOn: TopicEndpointSubscription 250 | Type: 'AWS::CloudWatch::Alarm' 251 | Properties: 252 | AlarmDescription: 'Redshift cluster experiences high CPU load for more than 15 minutes. (created by marbot)' 253 | Namespace: 'AWS/Redshift' 254 | MetricName: CPUUtilization 255 | Dimensions: 256 | - Name: ClusterIdentifier 257 | Value: !Ref ClusterIdentifier 258 | Threshold: !Ref CPUUtilizationThreshold 259 | ComparisonOperator: GreaterThanThreshold 260 | Statistic: Average 261 | Period: 900 262 | EvaluationPeriods: 1 263 | AlarmActions: 264 | - !Ref Topic 265 | OKActions: 266 | - !Ref Topic 267 | TreatMissingData: notBreaching 268 | HighIOUtilizationAlarm: 269 | DependsOn: TopicEndpointSubscription 270 | Type: 'AWS::CloudWatch::Alarm' 271 | Properties: 272 | AlarmDescription: 'Redshift cluster experiences high IO load for more than 15 minutes. 
(created by marbot)' 273 | Metrics: 274 | - Id: read 275 | Label: ReadThroughput 276 | MetricStat: 277 | Metric: 278 | Namespace: 'AWS/Redshift' 279 | MetricName: ReadThroughput 280 | Dimensions: 281 | - Name: ClusterIdentifier 282 | Value: !Ref ClusterIdentifier 283 | Period: 300 284 | Stat: Average 285 | # Unit: Bytes 286 | ReturnData: false 287 | - Id: write 288 | Label: WriteThroughput 289 | MetricStat: 290 | Metric: 291 | Namespace: 'AWS/Redshift' 292 | MetricName: WriteThroughput 293 | Dimensions: 294 | - Name: ClusterIdentifier 295 | Value: !Ref ClusterIdentifier 296 | Period: 300 297 | Stat: Average 298 | # Unit: Bytes 299 | ReturnData: false 300 | - Id: total 301 | Label: 'TotalThroughput' 302 | Expression: '(read+write)/1000/1000/1000' # GB/s 303 | ReturnData: true 304 | Threshold: !FindInMap [NodeTypes, !Ref NodeType, IOCapacity] 305 | ComparisonOperator: GreaterThanThreshold 306 | EvaluationPeriods: 3 307 | AlarmActions: 308 | - !Ref Topic 309 | OKActions: 310 | - !Ref Topic 311 | TreatMissingData: notBreaching 312 | ConcurrencyScalingAlarm: 313 | Condition: HasConcurrencyScalingThreshold 314 | DependsOn: TopicEndpointSubscription 315 | Type: 'AWS::CloudWatch::Alarm' 316 | Properties: 317 | AlarmDescription: 'Redshift cluster causes additional costs due to concurrency scaling. (created by marbot)' 318 | Namespace: 'AWS/Redshift' 319 | MetricName: ConcurrencyScalingSeconds 320 | Dimensions: 321 | - Name: ClusterIdentifier 322 | Value: !Ref ClusterIdentifier 323 | Threshold: !Ref ConcurrencyScalingThreshold 324 | ComparisonOperator: GreaterThanThreshold 325 | Statistic: Sum 326 | Period: 86400 327 | EvaluationPeriods: 1 328 | AlarmActions: 329 | - !Ref Topic 330 | OKActions: 331 | - !Ref Topic 332 | TreatMissingData: notBreaching 333 | Outputs: 334 | StackName: 335 | Description: 'Stack name.' 336 | Value: !Sub '${AWS::StackName}' 337 | StackTemplate: 338 | Description: 'Stack template.' 
339 | Value: 'marbot-redshift' 340 | StackVersion: 341 | Description: 'Stack version.' 342 | Value: '1.1.1' 343 | -------------------------------------------------------------------------------- /marbot-repeated-task.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | AWSTemplateFormatVersion: '2010-09-09' 16 | Description: 'marbot.io: Repeated task (https://github.com/marbot-io/monitoring-jump-start)' 17 | Metadata: 18 | 'AWS::CloudFormation::Interface': 19 | ParameterGroups: 20 | - Label: 21 | default: 'marbot endpoint' 22 | Parameters: 23 | - EndpointId 24 | - Stage 25 | - Label: 26 | default: 'Task' 27 | Parameters: 28 | - ScheduleExpression 29 | - Message 30 | Parameters: 31 | EndpointId: 32 | Description: 'Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id").' 33 | Type: String 34 | Stage: 35 | Description: 'marbot stage (never change this!).' 36 | Type: String 37 | Default: v1 38 | AllowedValues: [v1, dev] 39 | ScheduleExpression: 40 | Description: 'The scheduling expression determines when and how often a task runs (see https://docs.aws.amazon.com/eventbridge/latest/userguide/scheduled-events.html).' 41 | Type: String 42 | Default: 'cron(0 12 * * ? 
*)' 43 | Message: 44 | Description: 'What needs to be done?' 45 | Type: String 46 | Default: '' 47 | Resources: 48 | ########################################################################## 49 | # # 50 | # TOPIC # 51 | # # 52 | ########################################################################## 53 | Topic: 54 | Type: 'AWS::SNS::Topic' 55 | Properties: {} 56 | TopicPolicy: 57 | Type: 'AWS::SNS::TopicPolicy' 58 | Properties: 59 | PolicyDocument: 60 | Id: Id1 61 | Version: '2012-10-17' 62 | Statement: 63 | - Sid: Sid1 64 | Effect: Allow 65 | Principal: 66 | Service: 'events.amazonaws.com' # Allow EventBridge 67 | Action: 'sns:Publish' 68 | Resource: !Ref Topic 69 | Topics: 70 | - !Ref Topic 71 | TopicEndpointSubscription: 72 | DependsOn: TopicPolicy 73 | Type: 'AWS::SNS::Subscription' 74 | Properties: 75 | DeliveryPolicy: 76 | healthyRetryPolicy: 77 | minDelayTarget: 1 78 | maxDelayTarget: 60 79 | numRetries: 100 80 | numNoDelayRetries: 0 81 | backoffFunction: exponential 82 | throttlePolicy: 83 | maxReceivesPerSecond: 1 84 | Endpoint: !Sub 'https://api.marbot.io/${Stage}/endpoint/${EndpointId}' 85 | Protocol: https 86 | TopicArn: !Ref Topic 87 | MonitoringJumpStartEvent: 88 | DependsOn: TopicEndpointSubscription 89 | Type: 'AWS::Events::Rule' 90 | Properties: 91 | Description: 'Monitoring Jump Start connection. 
(created by marbot)' 92 | ScheduleExpression: 'rate(30 days)' 93 | State: ENABLED 94 | Targets: 95 | - Arn: !Ref Topic 96 | Id: marbot 97 | Input: !Sub | 98 | { 99 | "Type": "monitoring-jump-start-connection", 100 | "StackTemplate": "marbot-repeated-task", 101 | "StackVersion": "1.1.1", 102 | "Partition": "${AWS::Partition}", 103 | "AccountId": "${AWS::AccountId}", 104 | "Region": "${AWS::Region}", 105 | "StackId": "${AWS::StackId}", 106 | "StackName": "${AWS::StackName}" 107 | } 108 | ########################################################################## 109 | # # 110 | # EVENTS # 111 | # # 112 | ########################################################################## 113 | Task: 114 | DependsOn: TopicEndpointSubscription 115 | Type: 'AWS::Events::Rule' 116 | Properties: 117 | Description: 'Repeated task. (created by marbot)' 118 | ScheduleExpression: !Ref ScheduleExpression 119 | State: ENABLED 120 | Targets: 121 | - Arn: !Ref Topic 122 | Id: marbot 123 | Input: !Sub | 124 | { 125 | "Type": "repeated-task", 126 | "Message": "${Message}" 127 | } 128 | Outputs: 129 | StackName: 130 | Description: 'Stack name.' 131 | Value: !Sub '${AWS::StackName}' 132 | StackTemplate: 133 | Description: 'Stack template.' 134 | Value: 'marbot-repeated-task' 135 | StackVersion: 136 | Description: 'Stack version.' 137 | Value: '1.1.1' 138 | -------------------------------------------------------------------------------- /marbot-sqs-queue.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 
6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | AWSTemplateFormatVersion: '2010-09-09' 16 | Description: 'marbot.io: SQS queue monitoring (https://github.com/marbot-io/monitoring-jump-start)' 17 | Metadata: 18 | 'AWS::CloudFormation::Interface': 19 | ParameterGroups: 20 | - Label: 21 | default: 'marbot endpoint' 22 | Parameters: 23 | - EndpointId 24 | - Stage 25 | - Label: 26 | default: 'SQS' 27 | Parameters: 28 | - QueueName 29 | - Label: 30 | default: 'Thresholds' 31 | Parameters: 32 | - ApproximateAgeOfOldestMessageThreshold 33 | - ApproximateNumberOfMessagesVisibleThreshold 34 | Parameters: 35 | EndpointId: 36 | Description: 'Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id").' 37 | Type: String 38 | Stage: 39 | Description: 'marbot stage (never change this!).' 40 | Type: String 41 | Default: v1 42 | AllowedValues: [v1, dev] 43 | QueueName: 44 | Description: 'The SQS queue name that you want to monitor.' 45 | Type: String 46 | ApproximateAgeOfOldestMessageThreshold: 47 | Description: 'The maximum age (in seconds) of a message in the queue (set to -1 to disable).' 48 | Type: Number 49 | Default: 600 # 10 minutes 50 | MinValue: -1 51 | ApproximateNumberOfMessagesVisibleThreshold: 52 | Description: 'The maximum number of messages in the queue waiting for processing (set to -1 to disable).' 
53 | Type: Number 54 | Default: 10 55 | MinValue: -1 56 | Conditions: 57 | HasApproximateAgeOfOldestMessageThreshold: !Not [!Equals [!Ref ApproximateAgeOfOldestMessageThreshold, '-1']] 58 | HasApproximateNumberOfMessagesVisibleThreshold: !Not [!Equals [!Ref ApproximateNumberOfMessagesVisibleThreshold, '-1']] 59 | Resources: 60 | ########################################################################## 61 | # # 62 | # TOPIC # 63 | # # 64 | ########################################################################## 65 | Topic: 66 | Type: 'AWS::SNS::Topic' 67 | Properties: {} 68 | TopicPolicy: 69 | Type: 'AWS::SNS::TopicPolicy' 70 | Properties: 71 | PolicyDocument: 72 | Id: Id1 73 | Version: '2012-10-17' 74 | Statement: 75 | - Sid: Sid1 76 | Effect: Allow 77 | Principal: 78 | Service: 'events.amazonaws.com' # Allow EventBridge 79 | Action: 'sns:Publish' 80 | Resource: !Ref Topic 81 | - Sid: Sid2 82 | Effect: Allow 83 | Principal: 84 | AWS: '*' # Allow CloudWatch Alarms 85 | Action: 'sns:Publish' 86 | Resource: !Ref Topic 87 | Condition: 88 | StringEquals: 89 | 'AWS:SourceOwner': !Ref 'AWS::AccountId' 90 | Topics: 91 | - !Ref Topic 92 | TopicEndpointSubscription: 93 | DependsOn: TopicPolicy 94 | Type: 'AWS::SNS::Subscription' 95 | Properties: 96 | DeliveryPolicy: 97 | healthyRetryPolicy: 98 | minDelayTarget: 1 99 | maxDelayTarget: 60 100 | numRetries: 100 101 | numNoDelayRetries: 0 102 | backoffFunction: exponential 103 | throttlePolicy: 104 | maxReceivesPerSecond: 1 105 | Endpoint: !Sub 'https://api.marbot.io/${Stage}/endpoint/${EndpointId}' 106 | Protocol: https 107 | TopicArn: !Ref Topic 108 | MonitoringJumpStartEvent: 109 | DependsOn: TopicEndpointSubscription 110 | Type: 'AWS::Events::Rule' 111 | Properties: 112 | Description: 'Monitoring Jump Start connection. 
(created by marbot)' 113 | ScheduleExpression: 'rate(30 days)' 114 | State: ENABLED 115 | Targets: 116 | - Arn: !Ref Topic 117 | Id: marbot 118 | Input: !Sub | 119 | { 120 | "Type": "monitoring-jump-start-connection", 121 | "StackTemplate": "marbot-sqs-queue", 122 | "StackVersion": "1.4.1", 123 | "Partition": "${AWS::Partition}", 124 | "AccountId": "${AWS::AccountId}", 125 | "Region": "${AWS::Region}", 126 | "StackId": "${AWS::StackId}", 127 | "StackName": "${AWS::StackName}" 128 | } 129 | ########################################################################## 130 | # # 131 | # ALARMS # 132 | # # 133 | ########################################################################## 134 | ApproximateAgeOfOldestMessageAlarm: 135 | Condition: HasApproximateAgeOfOldestMessageThreshold 136 | DependsOn: TopicEndpointSubscription 137 | Type: 'AWS::CloudWatch::Alarm' 138 | Properties: 139 | AlarmActions: 140 | - !Ref Topic 141 | AlarmDescription: 'Queue contains old messages. Is message processing failing or is the message processing capacity too low?' 142 | ComparisonOperator: GreaterThanThreshold 143 | Dimensions: 144 | - Name: QueueName 145 | Value: !Ref 'QueueName' 146 | EvaluationPeriods: 1 147 | MetricName: ApproximateAgeOfOldestMessage 148 | Namespace: 'AWS/SQS' 149 | OKActions: 150 | - !Ref Topic 151 | Period: 60 152 | Statistic: Maximum 153 | Threshold: !Ref ApproximateAgeOfOldestMessageThreshold 154 | TreatMissingData: notBreaching 155 | ApproximateNumberOfMessagesVisibleAlarm: 156 | Condition: HasApproximateNumberOfMessagesVisibleThreshold 157 | DependsOn: TopicEndpointSubscription 158 | Type: 'AWS::CloudWatch::Alarm' 159 | Properties: 160 | AlarmActions: 161 | - !Ref Topic 162 | AlarmDescription: 'Queue contains too many messages. Is message processing failing or is the message processing capacity too low?'
163 | ComparisonOperator: GreaterThanThreshold 164 | Dimensions: 165 | - Name: QueueName 166 | Value: !Ref 'QueueName' 167 | EvaluationPeriods: 1 168 | MetricName: ApproximateNumberOfMessagesVisible 169 | Namespace: 'AWS/SQS' 170 | OKActions: 171 | - !Ref Topic 172 | Period: 60 173 | Statistic: Maximum 174 | Threshold: !Ref ApproximateNumberOfMessagesVisibleThreshold 175 | TreatMissingData: notBreaching 176 | Outputs: 177 | StackName: 178 | Description: 'Stack name.' 179 | Value: !Sub '${AWS::StackName}' 180 | StackTemplate: 181 | Description: 'Stack template.' 182 | Value: 'marbot-sqs-queue' 183 | StackVersion: 184 | Description: 'Stack version.' 185 | Value: '1.4.1' 186 | -------------------------------------------------------------------------------- /marbot-standalone-topic.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | AWSTemplateFormatVersion: '2010-09-09' 16 | Description: 'marbot.io: Standalone SNS topic (https://github.com/marbot-io/monitoring-jump-start)' 17 | Metadata: 18 | 'AWS::CloudFormation::Interface': 19 | ParameterGroups: 20 | - Label: 21 | default: 'marbot endpoint' 22 | Parameters: 23 | - EndpointId 24 | - Stage 25 | Parameters: 26 | EndpointId: 27 | Description: 'Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id").' 28 | Type: String 29 | Stage: 30 | Description: 'marbot stage (never change this!).' 31 | Type: String 32 | Default: v1 33 | AllowedValues: [v1, dev] 34 | Resources: 35 | Topic: 36 | Type: 'AWS::SNS::Topic' 37 | Properties: {} 38 | TopicPolicy: 39 | Type: 'AWS::SNS::TopicPolicy' 40 | Properties: 41 | PolicyDocument: 42 | Id: Id1 43 | Version: '2012-10-17' 44 | Statement: 45 | - Sid: Sid1 46 | Effect: Allow 47 | Principal: 48 | Service: 49 | - 'events.amazonaws.com' # Allow EventBridge 50 | - 'budgets.amazonaws.com' # Allow Budget Notifications 51 | - 'rds.amazonaws.com' # Allow RDS Events 52 | - 's3.amazonaws.com' # Allow S3 Event Notifications 53 | - 'backup.amazonaws.com' # Allow Backup Events 54 | - 'codestar-notifications.amazonaws.com' # Allow CodeStar Notifications 55 | - 'devops-guru.amazonaws.com' # Allow DevOps Guru Notifications 56 | Action: 'sns:Publish' 57 | Resource: !Ref Topic 58 | - Sid: Sid2 59 | Effect: Allow 60 | Principal: 61 | AWS: '*' # Allow CloudWatch Alarms, ElastiCache Notifications, Elastic Beanstalk Notifications, Auto Scaling Notification 62 | Action: 'sns:Publish' 63 | Resource: !Ref Topic 64 | Condition: 65 | StringEquals: 66 | 'AWS:SourceOwner': !Ref 'AWS::AccountId' 67 | - Sid: Sid3 68 | Effect: Allow 69 | Principal: 70 | Service: 'ses.amazonaws.com' # Allow SES Notifications & Events 71 | Action: 'sns:Publish' 72 | Resource: !Ref Topic 73 | Condition: 74 | StringEquals: 75 | 'AWS:Referer': !Ref 'AWS::AccountId' 76 | - 
Sid: Sid4 # Allow Amazon Inspector (https://docs.aws.amazon.com/inspector/latest/userguide/inspector_assessments.html#sns-topic) 77 | Effect: Allow 78 | Principal: 79 | AWS: 80 | - 'arn:aws:iam::646659390643:root' 81 | - 'arn:aws:iam::316112463485:root' 82 | - 'arn:aws:iam::166987590008:root' 83 | - 'arn:aws:iam::758058086616:root' 84 | - 'arn:aws:iam::162588757376:root' 85 | - 'arn:aws:iam::526946625049:root' 86 | - 'arn:aws:iam::454640832652:root' 87 | - 'arn:aws:iam::406045910587:root' 88 | - 'arn:aws:iam::537503971621:root' 89 | - 'arn:aws:iam::357557129151:root' 90 | - 'arn:aws:iam::146838936955:root' 91 | - 'arn:aws:iam::453420244670:root' 92 | Action: 'sns:Publish' 93 | Resource: !Ref Topic 94 | Topics: 95 | - !Ref Topic 96 | TopicEndpointSubscription: 97 | DependsOn: TopicPolicy 98 | Type: 'AWS::SNS::Subscription' 99 | Properties: 100 | DeliveryPolicy: 101 | healthyRetryPolicy: 102 | minDelayTarget: 1 103 | maxDelayTarget: 60 104 | numRetries: 100 105 | numNoDelayRetries: 0 106 | backoffFunction: exponential 107 | throttlePolicy: 108 | maxReceivesPerSecond: 1 109 | Endpoint: !Sub 'https://api.marbot.io/${Stage}/endpoint/${EndpointId}' 110 | Protocol: https 111 | TopicArn: !Ref Topic 112 | MonitoringJumpStartEvent: 113 | DependsOn: TopicEndpointSubscription 114 | Type: 'AWS::Events::Rule' 115 | Properties: 116 | Description: 'Monitoring Jump Start connection. (created by marbot)' 117 | ScheduleExpression: 'rate(30 days)' 118 | State: ENABLED 119 | Targets: 120 | - Arn: !Ref Topic 121 | Id: marbot 122 | Input: !Sub | 123 | { 124 | "Type": "monitoring-jump-start-connection", 125 | "StackTemplate": "marbot-standalone-topic", 126 | "StackVersion": "1.3.0", 127 | "Partition": "${AWS::Partition}", 128 | "AccountId": "${AWS::AccountId}", 129 | "Region": "${AWS::Region}", 130 | "StackId": "${AWS::StackId}", 131 | "StackName": "${AWS::StackName}" 132 | } 133 | Outputs: 134 | StackName: 135 | Description: 'Stack name.' 
136 | Value: !Sub '${AWS::StackName}' 137 | StackTemplate: 138 | Description: 'Stack template.' 139 | Value: 'marbot-standalone-topic' 140 | StackVersion: 141 | Description: 'Stack version.' 142 | Value: '1.3.0' 143 | TopicName: 144 | Description: 'The name of the SNS topic.' 145 | Value: !GetAtt 'Topic.TopicName' 146 | Export: 147 | Name: !Sub '${AWS::StackName}-TopicName' 148 | TopicArn: 149 | Description: 'The ARN of the SNS topic.' 150 | Value: !Ref Topic 151 | Export: 152 | Name: !Sub '${AWS::StackName}-TopicArn' 153 | -------------------------------------------------------------------------------- /marbot-synthetics-website.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | AWSTemplateFormatVersion: '2010-09-09' 16 | Description: 'marbot.io: Synthetics Website (https://github.com/marbot-io/monitoring-jump-start)' 17 | Metadata: 18 | 'AWS::CloudFormation::Interface': 19 | ParameterGroups: 20 | - Label: 21 | default: 'marbot endpoint' 22 | Parameters: 23 | - EndpointId 24 | - Stage 25 | - Label: 26 | default: 'Canary' 27 | Parameters: 28 | - URL 29 | - Name 30 | - Rate 31 | - Label: 32 | default: 'Expectations' 33 | Parameters: 34 | - ExpectedTitle 35 | - ExpectedElement 36 | - Label: 37 | default: 'Thresholds' 38 | Parameters: 39 | - SuccessPercentThreshold 40 | Parameters: 41 | EndpointId: 42 | Description: 'Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id").' 43 | Type: String 44 | Stage: 45 | Description: 'marbot stage (never change this!).' 46 | Type: String 47 | Default: v1 48 | AllowedValues: [v1, dev] 49 | URL: 50 | Description: 'The URL to monitor' 51 | Type: String 52 | Default: 'https://website.com' 53 | Name: 54 | Description: 'Canary name' 55 | Type: String 56 | AllowedPattern: '^[0-9a-z_\-]+$' 57 | MinLength: 1 58 | MaxLength: 21 59 | Rate: 60 | Description: 'How often should the test run?' 
61 | Type: String 62 | AllowedValues: 63 | - 'rate(5 minutes)' 64 | - 'rate(10 minutes)' 65 | - 'rate(15 minutes)' 66 | - 'rate(30 minutes)' 67 | - 'rate(45 minutes)' 68 | - 'rate(1 hour)' 69 | Default: 'rate(15 minutes)' 70 | ExpectedTitle: 71 | Description: 'Search for the following string in the title (leave empty to disable)' 72 | Type: String 73 | Default: '' 74 | ExpectedElement: 75 | Description: 'Search for the following Element in the HTML document using a selector (e.g., .class, #id, h1; details https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors) (leave empty to disable)' 76 | Type: String 77 | Default: '' 78 | SuccessPercentThreshold: 79 | Description: 'The minimum percentage of successful runs (set to -1 to disable).' 80 | Type: Number 81 | Default: 90 82 | MinValue: -1 83 | MaxValue: 100 84 | Conditions: 85 | HasExpectedTitle: !Not [!Equals [!Ref ExpectedTitle, '']] 86 | HasExpectedElement: !Not [!Equals [!Ref ExpectedElement, '']] 87 | HasSuccessPercentThreshold: !Not [!Equals [!Ref SuccessPercentThreshold, '-1']] 88 | Resources: 89 | ########################################################################## 90 | # # 91 | # TOPIC # 92 | # # 93 | ########################################################################## 94 | Topic: 95 | Type: 'AWS::SNS::Topic' 96 | Properties: {} 97 | TopicPolicy: 98 | Type: 'AWS::SNS::TopicPolicy' 99 | Properties: 100 | PolicyDocument: 101 | Id: Id1 102 | Version: '2012-10-17' 103 | Statement: 104 | - Sid: Sid1 105 | Effect: Allow 106 | Principal: 107 | Service: 'events.amazonaws.com' # Allow EventBridge 108 | Action: 'sns:Publish' 109 | Resource: !Ref Topic 110 | - Sid: Sid2 111 | Effect: Allow 112 | Principal: 113 | AWS: '*' # Allow CloudWatch Alarms 114 | Action: 'sns:Publish' 115 | Resource: !Ref Topic 116 | Condition: 117 | StringEquals: 118 | 'AWS:SourceOwner': !Ref 'AWS::AccountId' 119 | Topics: 120 | - !Ref Topic 121 | TopicEndpointSubscription: 122 | DependsOn: TopicPolicy 123 | Type:
'AWS::SNS::Subscription' 124 | Properties: 125 | DeliveryPolicy: 126 | healthyRetryPolicy: 127 | minDelayTarget: 1 128 | maxDelayTarget: 60 129 | numRetries: 100 130 | numNoDelayRetries: 0 131 | backoffFunction: exponential 132 | throttlePolicy: 133 | maxReceivesPerSecond: 1 134 | Endpoint: !Sub 'https://api.marbot.io/${Stage}/endpoint/${EndpointId}' 135 | Protocol: https 136 | TopicArn: !Ref Topic 137 | MonitoringJumpStartEvent: 138 | DependsOn: TopicEndpointSubscription 139 | Type: 'AWS::Events::Rule' 140 | Properties: 141 | Description: 'Monitoring Jump Start connection. (created by marbot)' 142 | ScheduleExpression: 'rate(30 days)' 143 | State: ENABLED 144 | Targets: 145 | - Arn: !Ref Topic 146 | Id: marbot 147 | Input: !Sub | 148 | { 149 | "Type": "monitoring-jump-start-connection", 150 | "StackTemplate": "marbot-synthetics-website", 151 | "StackVersion": "1.1.1", 152 | "Partition": "${AWS::Partition}", 153 | "AccountId": "${AWS::AccountId}", 154 | "Region": "${AWS::Region}", 155 | "StackId": "${AWS::StackId}", 156 | "StackName": "${AWS::StackName}" 157 | } 158 | ########################################################################## 159 | # # 160 | # CANARY # 161 | # # 162 | ########################################################################## 163 | CanaryBucket: 164 | Type: 'AWS::S3::Bucket' 165 | Properties: {} 166 | CanaryRole: 167 | Type: 'AWS::IAM::Role' 168 | Properties: 169 | AssumeRolePolicyDocument: 170 | Version: '2012-10-17' 171 | Statement: 172 | - Effect: Allow 173 | Principal: 174 | Service: 'lambda.amazonaws.com' 175 | Action: 'sts:AssumeRole' 176 | Policies: 177 | - PolicyName: execution 178 | PolicyDocument: 179 | Version: '2012-10-17' 180 | Statement: 181 | - Effect: Allow 182 | Action: 's3:ListAllMyBuckets' 183 | Resource: '*' 184 | - Effect: Allow 185 | Action: 's3:PutObject' 186 | Resource: !Sub '${CanaryBucket.Arn}/*' 187 | - Effect: Allow 188 | Action: 's3:GetBucketLocation' 189 | Resource: !GetAtt 'CanaryBucket.Arn' 190 | - 
Effect: Allow 191 | Action: 'cloudwatch:PutMetricData' 192 | Resource: '*' 193 | Condition: 194 | StringEquals: 195 | 'cloudwatch:namespace': CloudWatchSynthetics 196 | CanaryLogGroup: 197 | Type: 'AWS::Logs::LogGroup' 198 | Properties: 199 | LogGroupName: !Sub '/aws/lambda/cwsyn-${Canary}-${Canary.Id}' 200 | RetentionInDays: 14 201 | CanaryPolicy: 202 | Type: 'AWS::IAM::Policy' 203 | Properties: 204 | PolicyDocument: 205 | Statement: 206 | - Effect: Allow 207 | Action: 208 | - 'logs:CreateLogStream' 209 | - 'logs:PutLogEvents' 210 | Resource: !GetAtt 'CanaryLogGroup.Arn' 211 | PolicyName: logs 212 | Roles: 213 | - !Ref CanaryRole 214 | Canary: 215 | Type: 'AWS::Synthetics::Canary' 216 | Properties: 217 | ArtifactS3Location: !Sub 's3://${CanaryBucket}' 218 | Code: 219 | Handler: 'index.handler' 220 | Script: !Sub 221 | - | 222 | const synthetics = require('Synthetics'); 223 | const log = require('SyntheticsLogger'); 224 | exports.handler = async () => { 225 | const page = await synthetics.getPage(); 226 | const response = await page.goto('${URL}', {waitUntil: 'domcontentloaded', timeout: 30000}); 227 | try { 228 | ${ElementCode} 229 | ${TitleCode} 230 | if (response.status() !== 200) { 231 | throw(new Error('Failed to load page!')); 232 | } 233 | } finally { 234 | await synthetics.takeScreenshot('loaded', 'result'); 235 | } 236 | }; 237 | - URL: !Ref URL 238 | ElementCode: !If [HasExpectedElement, !Sub 'await page.waitFor(''${ExpectedElement}'', {timeout: 15000});', 'await page.waitFor(15000);'] 239 | TitleCode: !If [HasExpectedTitle, !Sub 'const title = await page.title(); if (!title.includes(''${ExpectedTitle}'')) {throw new Error(''title not as expected'')}', ''] 240 | ExecutionRoleArn: !GetAtt 'CanaryRole.Arn' 241 | FailureRetentionPeriod: 30 242 | Name: !Ref Name 243 | RunConfig: 244 | TimeoutInSeconds: 60 245 | RuntimeVersion: 'syn-1.0' 246 | Schedule: 247 | DurationInSeconds: '0' # run forever 248 | Expression: !Ref Rate 249 | StartCanaryAfterCreation: true 
250 | SuccessRetentionPeriod: 30 251 | ########################################################################## 252 | # # 253 | # ALARMS # 254 | # # 255 | ########################################################################## 256 | SuccessPercentAlarm: 257 | DependsOn: TopicEndpointSubscription 258 | Condition: HasSuccessPercentThreshold 259 | Type: 'AWS::CloudWatch::Alarm' 260 | Properties: 261 | AlarmActions: 262 | - !Ref Topic 263 | AlarmDescription: 'Canary is failing. (created by marbot)' 264 | ComparisonOperator: LessThanThreshold 265 | Dimensions: 266 | - Name: CanaryName 267 | Value: !Ref Canary 268 | EvaluationPeriods: 1 269 | MetricName: SuccessPercent 270 | Namespace: CloudWatchSynthetics 271 | OKActions: 272 | - !Ref Topic 273 | Period: 300 274 | Statistic: Minimum 275 | Threshold: !Ref SuccessPercentThreshold 276 | TreatMissingData: notBreaching 277 | Outputs: 278 | StackName: 279 | Description: 'Stack name.' 280 | Value: !Sub '${AWS::StackName}' 281 | StackTemplate: 282 | Description: 'Stack template.' 283 | Value: 'marbot-synthetics-website' 284 | StackVersion: 285 | Description: 'Stack version.' 286 | Value: '1.1.1' 287 | -------------------------------------------------------------------------------- /marbot-workspaces.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Copyright widdix GmbH 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | AWSTemplateFormatVersion: '2010-09-09' 16 | Description: 'marbot.io: Amazon WorkSpaces monitoring (https://github.com/marbot-io/monitoring-jump-start)' 17 | Metadata: 18 | 'AWS::CloudFormation::Interface': 19 | ParameterGroups: 20 | - Label: 21 | default: 'marbot endpoint' 22 | Parameters: 23 | - EndpointId 24 | - Stage 25 | - Label: 26 | default: 'WorkSpaces' 27 | Parameters: 28 | - DirectoryId 29 | Parameters: 30 | EndpointId: 31 | Description: 'Your marbot endpoint ID (to get this value: select a channel where marbot belongs to and send a message like this: "@marbot show me my endpoint id").' 32 | Type: String 33 | Stage: 34 | Description: 'marbot stage (never change this!).' 35 | Type: String 36 | Default: v1 37 | AllowedValues: [v1, dev] 38 | DirectoryId: 39 | Description: 'The identifier of the AWS Directory Service directory for the WorkSpace.' 40 | Type: String 41 | Resources: 42 | ########################################################################## 43 | # # 44 | # TOPIC # 45 | # # 46 | ########################################################################## 47 | Topic: 48 | Type: 'AWS::SNS::Topic' 49 | Properties: {} 50 | TopicPolicy: 51 | Type: 'AWS::SNS::TopicPolicy' 52 | Properties: 53 | PolicyDocument: 54 | Id: Id1 55 | Version: '2012-10-17' 56 | Statement: 57 | - Sid: Sid1 58 | Effect: Allow 59 | Principal: 60 | Service: 'events.amazonaws.com' # Allow EventBridge 61 | Action: 'sns:Publish' 62 | Resource: !Ref Topic 63 | - Sid: Sid2 64 | Effect: Allow 65 | Principal: 66 | AWS: '*' # Allow CloudWatch Alarms 67 | Action: 'sns:Publish' 68 | Resource: !Ref Topic 69 | Condition: 70 | StringEquals: 71 | 'AWS:SourceOwner': !Ref 'AWS::AccountId' 72 | Topics: 73 | - !Ref Topic 74 | TopicEndpointSubscription: 75 | DependsOn: TopicPolicy 76 | Type: 'AWS::SNS::Subscription' 77 | Properties: 78 | DeliveryPolicy: 79 | healthyRetryPolicy: 80 | minDelayTarget: 1 81 | maxDelayTarget: 60 82 | numRetries: 100 83 | numNoDelayRetries: 0 84 | backoffFunction: 
exponential 85 | throttlePolicy: 86 | maxReceivesPerSecond: 1 87 | Endpoint: !Sub 'https://api.marbot.io/${Stage}/endpoint/${EndpointId}' 88 | Protocol: https 89 | TopicArn: !Ref Topic 90 | MonitoringJumpStartEvent: 91 | DependsOn: TopicEndpointSubscription 92 | Type: 'AWS::Events::Rule' 93 | Properties: 94 | Description: 'Monitoring Jump Start connection. (created by marbot)' 95 | ScheduleExpression: 'rate(30 days)' 96 | State: ENABLED 97 | Targets: 98 | - Arn: !Ref Topic 99 | Id: marbot 100 | Input: !Sub | 101 | { 102 | "Type": "monitoring-jump-start-connection", 103 | "StackTemplate": "marbot-workspaces", 104 | "StackVersion": "1.1.1", 105 | "Partition": "${AWS::Partition}", 106 | "AccountId": "${AWS::AccountId}", 107 | "Region": "${AWS::Region}", 108 | "StackId": "${AWS::StackId}", 109 | "StackName": "${AWS::StackName}" 110 | } 111 | ########################################################################## 112 | # # 113 | # ALARMS # 114 | # # 115 | ########################################################################## 116 | UnhealthyAlarm: 117 | DependsOn: TopicEndpointSubscription 118 | Type: 'AWS::CloudWatch::Alarm' 119 | Properties: 120 | AlarmActions: 121 | - !Ref Topic 122 | AlarmDescription: 'Workspace has failed. (created by marbot)' 123 | ComparisonOperator: GreaterThanThreshold 124 | Dimensions: 125 | - Name: DirectoryId 126 | Value: !Ref DirectoryId 127 | EvaluationPeriods: 1 128 | MetricName: Unhealthy 129 | Namespace: 'AWS/WorkSpaces' 130 | OKActions: 131 | - !Ref Topic 132 | Period: 600 133 | Statistic: Sum 134 | Threshold: 0 135 | TreatMissingData: notBreaching 136 | ########################################################################## 137 | # # 138 | # EVENTS # 139 | # # 140 | ########################################################################## 141 | UnsuccessfulEvent: 142 | DependsOn: TopicEndpointSubscription 143 | Type: 'AWS::Events::Rule' 144 | Properties: 145 | Description: 'User successfully logged in to a WorkSpace. 
(created by marbot)' 146 | EventPattern: 147 | source: 148 | - 'aws.workspaces' 149 | 'detail-type': 150 | - 'WorkSpaces Access' 151 | detail: 152 | directoryId: 153 | - !Ref DirectoryId 154 | State: ENABLED 155 | Targets: 156 | - Arn: !Ref Topic 157 | Id: marbot 158 | Outputs: 159 | StackName: 160 | Description: 'Stack name.' 161 | Value: !Sub '${AWS::StackName}' 162 | StackTemplate: 163 | Description: 'Stack template.' 164 | Value: 'marbot-workspaces' 165 | StackVersion: 166 | Description: 'Stack version.' 167 | Value: '1.1.1' 168 | --------------------------------------------------------------------------------