├── .github └── PULL_REQUEST_TEMPLATE.md ├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README-SAR.md ├── README.md ├── images ├── personalize-monitor-architecture.png ├── personalize-monitor-cloudwatch-alarms.png ├── personalize-monitor-cloudwatch-dashboard.png └── personalize-monitor-cloudwatch-metrics.png ├── samconfig.toml ├── sar-publish.sh ├── src ├── cleanup_resources_function │ ├── README.md │ ├── __init__.py │ ├── cleanup_resources.py │ └── requirements.txt ├── dashboard_mgmt_function │ ├── README.md │ ├── __init__.py │ ├── dashboard-template.mustache │ ├── dashboard_mgmt.py │ └── requirements.txt ├── layer │ ├── README.md │ ├── __init__.py │ ├── common.py │ └── requirements.txt ├── personalize_delete_campaign_function │ ├── README.md │ ├── __init__.py │ ├── personalize_delete_campaign.py │ └── requirements.txt ├── personalize_monitor_function │ ├── README.md │ ├── __init__.py │ ├── personalize_monitor.py │ └── requirements.txt ├── personalize_stop_recommender_function │ ├── README.md │ ├── __init__.py │ ├── personalize_stop_recommender.py │ └── requirements.txt └── personalize_update_tps_function │ ├── README.md │ ├── __init__.py │ ├── personalize_update_tps.py │ └── requirements.txt └── template.yaml /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | *Issue #, if available:* 2 | 3 | *Description of changes:* 4 | 5 | 6 | By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. 
7 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__/ 2 | .DS_Store 3 | .vscode 4 | .aws-sam 5 | env -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. 
Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 
49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software is furnished to do so. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
15 | 16 | -------------------------------------------------------------------------------- /README-SAR.md: -------------------------------------------------------------------------------- 1 | # Amazon Personalize Monitor 2 | 3 | This project contains the source code and supporting files for deploying a serverless application that adds monitoring, alerting, and optimization capabilities for [Amazon Personalize](https://aws.amazon.com/personalize/), an AI service from AWS that allows you to create custom ML recommenders based on your data. Highlights include: 4 | 5 | - Generation of additional [CloudWatch metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) to track the average RPS and `minRecommendationRequestsPerSecond` for [recommenders](https://docs.aws.amazon.com/personalize/latest/dg/creating-recommenders.html), average TPS and `minProvisionedTPS` for [campaigns](https://docs.aws.amazon.com/personalize/latest/dg/campaigns.html), and utilization of recommenders and campaigns over time. 6 | - [CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) to alert you via SNS/email when recommender or campaign utilization drops below a configurable threshold or has been idle for a configurable length of time (optional). 7 | - [CloudWatch dashboard](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html) populated with graph widgets for average (actual) vs provisioned TPS/RPS, recommender and campaign utilization, recommender and campaign latency, and the number of recommenders and campaigns being monitored. 8 | - Capable of monitoring campaigns and recommenders across multiple regions in the same AWS account. 9 | - Automatically [stop recommenders](https://docs.aws.amazon.com/personalize/latest/dg/stopping-starting-recommender.html) and delete campaigns that have been idle more than a configurable number of hours (optional). 
10 | - Automatically reduce the `minRecommendationRequestsPerSecond` for over-provisioned recommenders and `minProvisionedTPS` for over-provisioned campaigns to optimize cost (optional). 11 | 12 | ## Why is this important? 13 | 14 | Before you can retrieve real-time recommendations from Amazon Personalize, you must create a [recommender](https://docs.aws.amazon.com/personalize/latest/dg/creating-recommenders.html) or [campaign](https://docs.aws.amazon.com/personalize/latest/dg/campaigns.html). Often, multiple recommenders and/or campaigns are needed to provide recommendations targeting different use cases for an application such as user-personalization, related items, and personalized ranking. Recommenders and campaigns represent resources that are auto-scaled by Personalize to meet the demand from requests from your application. This is typically how Personalize is integrated into your applications. When an application needs to display personalized recommendations to a user, a [GetRecommendations](https://docs.aws.amazon.com/personalize/latest/dg/getting-real-time-recommendations.html#recommendations) or [GetPersonalizedRanking](https://docs.aws.amazon.com/personalize/latest/dg/getting-real-time-recommendations.html#rankings) API call is made to a recommender or campaign to retrieve recommendations. Just like monitoring your own application components is important, monitoring your Personalize recommenders and campaigns is also important and considered a best practice. This application is designed to help you do just that. 15 | 16 | When you provision a recommender using the [CreateRecommender](https://docs.aws.amazon.com/personalize/latest/dg/API_CreateRecommender.html) API or a campaign using the [CreateCampaign](https://docs.aws.amazon.com/personalize/latest/dg/API_CreateCampaign.html) API, you can optionally specify a value for `minRecommendationRequestsPerSecond` and `minProvisionedTPS`, respectively. 
This value specifies the requested _minimum_ requests/transactions (calls) per second that Amazon Personalize will support for that recommender or campaign. As your actual request volume to a recommender or campaign approaches its `minRecommendationRequestsPerSecond` or `minProvisionedTPS`, Personalize will automatically provision additional resources to support your request volume. Then when request volume drops, Personalize will automatically scale back down **no lower** than `minRecommendationRequestsPerSecond` or `minProvisionedTPS`. **Since you are billed based on the higher of actual TPS and `minRecommendationRequestsPerSecond`/`minProvisionedTPS`, it is therefore important to not over-provision your recommenders or campaigns to optimize cost.** This also means that leaving a recommender or campaign idle (active but no longer in-use) will result in unnecessary charges. This application gives you the tools to visualize your recommender and campaign utilization, to be notified when there is an opportunity to tune your recommender or campaign provisioning, and even take action to reduce and eliminate over-provisioning. 17 | 18 | > General best practice is to set `minRecommendationRequestsPerSecond` and `minProvisionedTPS` to `1`, or your low watermark for recommendations requests, and let Personalize auto-scale recommender or campaign resources to meet actual demand. 19 | 20 | See the Amazon Personalize [pricing page](https://aws.amazon.com/personalize/pricing/) for full details on costs. 21 | 22 | ### CloudWatch Dashboard 23 | 24 | When you deploy this project, a [CloudWatch dashboard](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html) is built with widgets for Actual vs Provisioned TPS/RPS, recommender/campaign utilization, and recommender/campaign latency for the recommenders and campaigns you wish to monitor. 
The dashboard gives you critical visual information to assess how your recommenders and campaigns are performing and being utilized. The data in these graphs can help you properly tune your recommender's `minRecommendationRequestsPerSecond` and campaign's `minProvisionedTPS`. 25 | 26 | ![Personalize Monitor CloudWatch Dashboard](https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/master/images/personalize-monitor-cloudwatch-dashboard.png) 27 | 28 | For more details on the CloudWatch dashboard created and maintained by this application, see the [dashboard_mgmt](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/dashboard_mgmt_function/) function page. 29 | 30 | ### CloudWatch Alarms 31 | 32 | At deployment time, you can optionally have this application automatically create [CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) that will alert you when a monitored recommender's or campaign's utilization drops below a threshold you define for nine out of twelve evaluation periods. Since the intervals are 5 minutes, that means that nine of the 5 minute evaluations over a 1 hour span must be below the threshold to enter an alarm status. The same rule applies to transition from alarm to OK status. Similarly, the idle recommender/campaign alarm will alert you when there has been no request activity for a recommender/campaign for a configurable amount of time. The alarms will be set up to alert you via email through an SNS topic in each region where resources are monitored. Once the alarms are set up, you can alternatively link them to any operations and messaging tools you already use (e.g., Slack or PagerDuty). 
33 | 34 | ![Personalize Monitor CloudWatch Alarms](https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/master/images/personalize-monitor-cloudwatch-alarms.png) 35 | 36 | For more details on the CloudWatch alarms created by this application, see the [personalize_monitor](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/personalize_monitor_function/) function page. 37 | 38 | ### CloudWatch Metrics 39 | 40 | To support the CloudWatch dashboard and alarms described above, a few new custom [CloudWatch metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) are added for the monitored recommenders and campaigns. These metrics are populated by the [personalize_monitor](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/personalize_monitor_function/) Lambda function that is set up to run every 5 minutes in your account. You can find these metrics in CloudWatch under Metrics in the "PersonalizeMonitor" namespace. 41 | 42 | ![Personalize Monitor CloudWatch Metrics](https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/master/images/personalize-monitor-cloudwatch-metrics.png) 43 | 44 | For more details on the custom metrics created by this application, see the [personalize_monitor](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/personalize_monitor_function/) function page. 45 | 46 | ### Cost optimization (optional) 47 | 48 | This application can be optionally configured to automatically perform cost optimization actions for your Amazon Personalize recommenders and campaigns. 49 | 50 | #### Idle recommenders/campaigns 51 | Idle recommenders/campaigns are those that have been provisioned but are not receiving any `GetRecommendations`/`GetPersonalizedRanking` calls. 
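The nine-out-of-twelve rule described above can be illustrated with a small Python sketch. This is not the application's code and not how CloudWatch itself is implemented: the helper name `is_in_alarm` is hypothetical, it assumes exactly one utilization datapoint per 5 minute period, and it ignores missing-data handling.

```python
from typing import Sequence


def is_in_alarm(
    utilization: Sequence[float],
    threshold: float,
    datapoints_to_alarm: int = 9,
    evaluation_periods: int = 12,
) -> bool:
    """Return True when enough recent datapoints fall below the threshold.

    Hypothetical helper: only the most recent `evaluation_periods` values
    are considered, and a value breaches when it is below `threshold`.
    """
    window = utilization[-evaluation_periods:]
    breaching = sum(1 for value in window if value < threshold)
    return breaching >= datapoints_to_alarm


# Twelve 5 minute datapoints (one hour) of utilization, in percent.
last_hour = [40, 55, 120, 30, 20, 10, 150, 45, 60, 35, 25, 15]
print(is_in_alarm(last_hour, threshold=100))
```

In this sample hour, ten of the twelve datapoints are below the 100% threshold, so the sketch reports an alarm state. With only eight breaching datapoints it would not.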
Since costs are incurred while a recommender/campaign is active regardless of whether it receives any requests, detecting and eliminating these idle recommenders/campaigns can be an important cost optimization activity. This can be particularly useful in non-production AWS accounts such as development and testing where you are more likely to have abandoned recommenders/campaigns. 52 | 53 | Note that this is where an important difference between recommenders and campaigns comes into play. Recommenders can be started and stopped to provision and de-provision the resources needed for real-time inference. When a recommender is stopped, the real-time inference resources are deleted (which pauses ongoing recommender charges) but the underlying model artifacts are preserved. This allows you to later start the recommender without having to train the model again. Campaigns, on the other hand, represent only the resources needed for real-time inference for a solution version. Therefore, you must delete a campaign to release the real-time resources and pause campaign charges. Since the solution version is not being deleted, model artifacts are preserved similar to when a recommender is stopped. 54 | 55 | See the `AutoDeleteOrStopIdleResources` and `IdleThresholdHours` deployment parameters in the installation instructions below and the [personalize_monitor](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/personalize_monitor_function#automatically-deleting-idle-campaigns-optional) function for details. 56 | 57 | #### Over-provisioned recommenders/campaigns 58 | 59 | Properly provisioning recommenders and campaigns, as described earlier, is also an important cost optimization activity. This application can be configured to automatically reduce a recommender's `minRecommendationRequestsPerSecond` or a campaign's `minProvisionedTPS` based on actual request volume. 
This will optimize recommender/campaign utilization when request volume is lower while relying on Personalize to auto-scale based on actual activity. See the `AutoAdjustMinTPS` deployment parameter below and the [personalize_monitor](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/personalize_monitor_function#automatically-adjusting-campaign-minprovisionedtps-optional) function for details. 60 | 61 | ### Architecture 62 | 63 | The following diagram depicts how the Lambda functions in this application work together using an event-driven approach built on [Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/latest/userguide/what-is-amazon-eventbridge.html). The [personalize_monitor](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/personalize_monitor_function/) function is invoked every five minutes to generate CloudWatch metric data based on the monitored recommenders/campaigns and create alarms (if configured). It also generates events which are published to EventBridge that trigger activities such as optimizing `minRecommendationRequestsPerSecond`/`minProvisionedTPS`, stopping idle recommenders, deleting idle campaigns, updating the Personalize Monitor CloudWatch dashboard, and sending notifications. This approach allows you to more easily integrate these functions into your own operations by sending your own events, say, to trigger the dashboard to be rebuilt after you create a campaign or register your own targets to events generated by this application. 64 | 65 | ![Personalize Monitor Architecture](https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/master/images/personalize-monitor-architecture.png) 66 | 67 | See the readme pages for each function for details on the events that they produce and consume. 68 | 69 | ## Installing the application 70 | 71 | ***IMPORTANT NOTE:** Deploying this application in your AWS account will create and consume AWS resources, which will cost money. 
For example, charges accrue for the CloudWatch dashboard, the Lambda function that runs every 5 minutes to collect additional monitoring metrics, CloudWatch alarms, logging, and so on. Therefore, if after installing this application you choose not to use it as part of your monitoring strategy, be sure to follow the Uninstall instructions in the next section to avoid ongoing charges and to clean up all data.* 72 | 73 | | Parameter | Description | Default | 74 | | --- | --- | --- | 75 | | CampaignARNs | Comma separated list of Personalize campaign ARNs to monitor or `all` to monitor all active campaigns. It is recommended to use `all` so that any new campaigns that are added after deployment will be automatically detected, monitored, and have alarms created (optional) | `all` | 76 | | RecommenderARNs | Comma separated list of Personalize recommender ARNs to monitor or `all` to monitor all active recommenders. It is recommended to use `all` so that any new recommenders that are added after deployment will be automatically detected, monitored, and have alarms created (optional) | `all` | 77 | | Regions | Comma separated list of AWS regions to monitor recommenders/campaigns. Only applicable when `all` is used for `CampaignARNs` or `RecommenderARNs`. Leaving this value blank will default to the region where this application is deployed (i.e. `AWS Region` parameter above). | | 78 | | AutoCreateUtilizationAlarms | Whether to automatically create a utilization CloudWatch alarm for each monitored recommender or campaign. | `Yes` | 79 | | UtilizationThresholdAlarmLowerBound | Minimum threshold value (in percent) to enter alarm state for recommender/campaign utilization. This value is only relevant if `AutoCreateUtilizationAlarms` is `Yes`. 
| `100` | 80 | | AutoAdjustMinTPS | Whether to automatically compare recommender/campaign request activity against the configured `minRecommendationRequestsPerSecond`/`minProvisionedTPS` to determine if `minRecommendationRequestsPerSecond`/`minProvisionedTPS` can be reduced to optimize utilization. | `Yes` | 81 | | AutoCreateIdleAlarms | Whether to automatically create an idle detection CloudWatch alarm for each monitored recommender/campaign. | `Yes` | 82 | | IdleThresholdHours | Number of hours that a recommender/campaign must be idle (i.e. no requests) before it is automatically stopped (recommender) or deleted (campaign). `AutoDeleteOrStopIdleResources` must be `Yes` for idle recommender stop or campaign deletion to occur. | `24` | 83 | | AutoDeleteOrStopIdleResources | Whether to automatically stop idle recommenders or delete idle campaigns. An idle recommender/campaign is one that has not had any requests in `IdleThresholdHours` hours. | `No` | 84 | | NotificationEndpoint | Email address to receive alarm and OK notifications, recommender stop/update, campaign delete/update events (optional). An [SNS](https://aws.amazon.com/sns/) topic is created in each region where resources are monitored and this email address will be added as a subscriber to the topic(s). You will receive a confirmation email for the SNS topic subscription in each region so be sure to click the confirmation link in that email to ensure you receive notifications. | | 85 | 86 | ## Uninstalling the application 87 | 88 | To remove the resources created by this application in your AWS account, be sure to uninstall the application. 89 | 90 | ## FAQs 91 | 92 | ***Q: Can I use this application to determine my accumulated inference charges during the month?*** 93 | 94 | ***A:*** No! 
Although the `averageRPS`/`averageTPS` and `minRecommendationRequestsPerSecond`/`minProvisionedTPS` custom metrics generated by this application may be used to calculate an approximation of your accumulated inference charges, they should not be used as a substitute or proxy for actual Personalize inference costs. Always consult your AWS Billing Dashboard for actual service charges. 95 | 96 | ***Q: What is an ideal recommender/campaign utilization percentage? Is it okay if my recommender/campaign utilization is over 100%?*** 97 | 98 | ***A:*** The recommender/campaign utilization metric is a measure of your actual usage compared against the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` for the recommender/campaign. Any utilization value >= 100% is ideal since that means you are not over-provisioning, and therefore not over-paying, for resources. You're letting Personalize handle the scaling in/out of the recommender/campaign. Anytime your utilization is below 100%, more resources are provisioned than are needed to satisfy the volume of requests at that time. 99 | 100 | ***Q: How can I tell if Personalize is scaling out fast enough?*** 101 | 102 | ***A:*** Compare the "Actual vs Provisioned RPS/TPS" graph to the "Recommender/Campaign Latency" graph on the Personalize Monitor CloudWatch dashboard. When your Actual RPS/TPS increases/spikes, does the latency for the same recommender/campaign at the same time stay consistent? If so, this tells you that Personalize is maintaining response time as request volume increases and therefore scaling fast enough to meet demand. However, if latency increases significantly and to an unacceptable level for your application, this is an indication that Personalize may not be scaling fast enough to meet your traffic patterns. See the answer to the following question for some options. 103 | 104 | ***Q: My workload is very spiky and Personalize is not scaling fast enough. 
What can I do?*** 105 | 106 | ***A:*** First, be sure to confirm that it is Personalize that is not scaling fast enough by reviewing the answer above. If the spikes are predictable or cyclical, you can pre-warm capacity in your recommender/campaign ahead of time by adjusting the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` using the [UpdateRecommender](https://docs.aws.amazon.com/personalize/latest/dg/API_UpdateRecommender.html) or [UpdateCampaign](https://docs.aws.amazon.com/personalize/latest/dg/API_UpdateCampaign.html) API and then dropping it back down after the traffic subsides. For example, increase capacity 30 minutes before a flash sale or marketing campaign is launched that brings a temporary surge in traffic. This can be done manually using the AWS console or automated by using [CloudWatch events](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/WhatIsCloudWatchEvents.html) based on a schedule or triggered based on an event in your application. The [personalize_update_tps](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/personalize_update_campaign_tps_function/) function that is deployed with this application can be used as the target for CloudWatch events or you can publish an `UpdatePersonalizeRecommenderMinRecommendationRPS` or `UpdatePersonalizeCampaignMinProvisionedTPS` event to EventBridge. If spikes in your workload are not predictable or known ahead of time, determining the optimal `minRecommendationRequestsPerSecond`/`minProvisionedTPS` to balance consistent latency vs cost is the best option. The metrics and dashboard graphs in this application can help you determine this value. 107 | 108 | ***Q: After deploying this application in my AWS account, I created some new Personalize recommenders or campaigns that I also want to monitor. How can I add them to be monitored and have them appear on my dashboard? 
Also, what about monitored recommenders or campaigns that I delete?*** 109 | 110 | ***A:*** If you specified `all` for the `RecommenderARNs` or `CampaignARNs` deployment parameter (see installation instructions above), any new recommenders/campaigns you create will be automatically monitored and alarms created (if `AutoCreateUtilizationAlarms` or `AutoCreateIdleAlarms` was set to `Yes`) when the recommenders/campaigns become active. Likewise, any recommenders/campaigns that are deleted will no longer be monitored. If you want this application to monitor recommenders/campaigns across multiple regions, be sure to specify the region names in the `Regions` deployment parameter. Note that this only applies when `RecommenderARNs` or `CampaignARNs` is set to `all`. The CloudWatch dashboard will be automatically rebuilt every hour to add new recommenders and campaigns and drop deleted recommenders and campaigns. You can also trigger the dashboard to be rebuilt by publishing a `BuildPersonalizeMonitorDashboard` event to the default EventBridge event bus (see [dashboard_mgmt_function](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/dashboard_mgmt_function/)). 111 | 112 | ## Reporting issues 113 | 114 | If you encounter a bug, please create a new issue with as much detail as possible and steps for reproducing the bug. Similarly, if you have an idea for an improvement, please add an issue as well. Pull requests are also welcome! See the [Contributing Guidelines](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/CONTRIBUTING.md) for more details. 115 | 116 | ## License summary 117 | 118 | This sample code is made available under a modified MIT license. See the LICENSE file. 
119 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Amazon Personalize Monitor 2 | 3 | 4 | * [Why is this important?](#Whyisthisimportant) 5 | * [Features](#Features) 6 | * [CloudWatch dashboard](#CloudWatchdashboard) 7 | * [CloudWatch alarms](#CloudWatchalarms) 8 | * [CloudWatch metrics](#CloudWatchmetrics) 9 | * [Cost optimization (optional)](#Costoptimizationoptional) 10 | * [Idle campaigns](#Idlecampaigns) 11 | * [Over-provisioned campaigns](#Over-provisionedcampaigns) 12 | * [Architecture](#Architecture) 13 | * [Installing the application](#Installingtheapplication) 14 | * [Option 1 - Install from Serverless Application Repository](#Option1-InstallfromServerlessApplicationRepository) 15 | * [Option 2 - Install using Serverless Application Model](#Option2-InstallusingServerlessApplicationModel) 16 | * [Application settings/parameters](#Applicationsettingsparameters) 17 | * [Uninstalling the application](#Uninstallingtheapplication) 18 | * [FAQs](#FAQs) 19 | * [Reporting issues](#Reportingissues) 20 | * [License summary](#Licensesummary) 21 | 22 | 26 | 27 | 28 | This project contains the source code and supporting files for deploying a serverless application that adds monitoring, alerting, and optimization capabilities for [Amazon Personalize](https://aws.amazon.com/personalize/), an AI service from AWS that allows you to create custom ML recommenders based on your data. 
Highlights include: 29 | 30 | - Generation of additional [CloudWatch metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) to track the average RPS and `minRecommendationRequestsPerSecond` for [recommenders](https://docs.aws.amazon.com/personalize/latest/dg/creating-recommenders.html), average TPS and `minProvisionedTPS` for [campaigns](https://docs.aws.amazon.com/personalize/latest/dg/campaigns.html), and utilization of recommenders and campaigns over time. 31 | - [CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) to alert you via SNS/email when recommender or campaign utilization drops below a configurable threshold or has been idle for a configurable length of time (optional). 32 | - [CloudWatch dashboard](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html) populated with graph widgets for average (actual) vs provisioned TPS/RPS, recommender and campaign utilization, recommender and campaign latency, and the number of recommenders and campaigns being monitored. 33 | - Capable of monitoring campaigns and recommenders across multiple regions in the same AWS account. 34 | - Automatically [stop recommenders](https://docs.aws.amazon.com/personalize/latest/dg/stopping-starting-recommender.html) and delete campaigns that have been idle more than a configurable number of hours (optional). 35 | - Automatically reduce the `minRecommendationRequestsPerSecond` for over-provisioned recommenders and `minProvisionedTPS` for over-provisioned campaigns to optimize cost (optional). 36 | 37 | ## Why is this important? 38 | 39 | Before you can retrieve real-time recommendations from Amazon Personalize, you must create a [recommender](https://docs.aws.amazon.com/personalize/latest/dg/creating-recommenders.html) or [campaign](https://docs.aws.amazon.com/personalize/latest/dg/campaigns.html). 
Often, multiple recommenders and/or campaigns are needed to provide recommendations targeting different use cases for an application, such as user-personalization, related items, and personalized ranking. Recommenders and campaigns represent resources that are auto-scaled by Personalize to meet request demand from your application. This is typically how Personalize is integrated into your applications. When an application needs to display personalized recommendations to a user, a [GetRecommendations](https://docs.aws.amazon.com/personalize/latest/dg/getting-real-time-recommendations.html#recommendations) or [GetPersonalizedRanking](https://docs.aws.amazon.com/personalize/latest/dg/getting-real-time-recommendations.html#rankings) API call is made to a recommender or campaign to retrieve recommendations. Just as monitoring your own application components is important, monitoring your Personalize recommenders and campaigns is considered a best practice. This application is designed to help you do just that. 40 | 41 | When you provision a recommender using the [CreateRecommender](https://docs.aws.amazon.com/personalize/latest/dg/API_CreateRecommender.html) API or a campaign using the [CreateCampaign](https://docs.aws.amazon.com/personalize/latest/dg/API_CreateCampaign.html) API, you can optionally specify a value for `minRecommendationRequestsPerSecond` and `minProvisionedTPS`, respectively. This value specifies the requested _minimum_ requests/transactions (calls) per second that Amazon Personalize will support for that recommender or campaign. As your actual request volume to a recommender or campaign approaches its `minRecommendationRequestsPerSecond` or `minProvisionedTPS`, Personalize will automatically provision additional resources to support your request volume. Then, when request volume drops, Personalize will automatically scale back down **no lower** than `minRecommendationRequestsPerSecond` or `minProvisionedTPS`.
**Since you are billed based on the higher of actual TPS and `minRecommendationRequestsPerSecond`/`minProvisionedTPS`, it is therefore important to not over-provision your recommenders or campaigns to optimize cost.** This also means that leaving a recommender or campaign idle (active but no longer in-use) will result in unnecessary charges. This application gives you the tools to visualize your recommender and campaign utilization, to be notified when there is an opportunity to tune your recommender or campaign provisioning, and even take action to reduce and eliminate over-provisioning. 42 | 43 | > General best practice is to set `minRecommendationRequestsPerSecond` and `minProvisionedTPS` to `1`, or your low watermark for recommendations requests, and let Personalize auto-scale recommender or campaign resources to meet actual demand. 44 | 45 | See the Amazon Personalize [pricing page](https://aws.amazon.com/personalize/pricing/) for full details on costs. 46 | 47 | ## Features 48 | 49 | ### CloudWatch dashboard 50 | 51 | When you deploy this project, a [CloudWatch dashboard](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html) is built with widgets for Actual vs Provisioned TPS/RPS, recommender/campaign utilization, and recommender/campaign latency for the recommenders and campaigns you wish to monitor. The dashboard gives you critical visual information to assess how your recommenders and campaigns are performing and being utilized. The data in these graphs can help you properly tune your recommender's `minRecommendationRequestsPerSecond` and campaign's `minProvisionedTPS`. 52 | 53 | ![Personalize Monitor CloudWatch Dashboard](./images/personalize-monitor-cloudwatch-dashboard.png) 54 | 55 | For more details on the CloudWatch dashboard created and maintained by this application, see the [dashboard_mgmt](./src/dashboard_mgmt_function/) function page. 
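
To make the relationship between the dashboard's utilization graph and the billing rule described earlier more concrete, here is a minimal sketch. The function names are hypothetical, and the utilization formula (actual average TPS divided by the provisioned minimum) is an assumption based on how this README describes the metric, not a copy of the application's code.

```python
def utilization_percent(average_tps: float, min_provisioned_tps: float) -> float:
    """Actual usage as a percentage of the provisioned minimum (assumed formula)."""
    if min_provisioned_tps <= 0:
        raise ValueError("min_provisioned_tps must be greater than zero")
    return 100.0 * average_tps / min_provisioned_tps


def billable_tps(average_tps: float, min_provisioned_tps: float) -> float:
    """Billing is based on the higher of actual TPS and the provisioned minimum."""
    return max(average_tps, min_provisioned_tps)
```

For instance, a campaign averaging 2 TPS against a `minProvisionedTPS` of 10 would show 20% utilization under this formula while still being billed as if it served 10 TPS, which is the kind of gap the dashboard is meant to surface.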
56 | 57 | ### CloudWatch alarms 58 | 59 | At deployment time, you can optionally have this application automatically create [CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) that will alert you when a monitored recommender's or campaign's utilization drops below a threshold you define for nine out of twelve evaluation periods. Since each evaluation period is 5 minutes, nine of the twelve 5-minute evaluations over a 1-hour span must be below the threshold for the alarm to enter the alarm state. The same rule applies to the transition from alarm to OK status. Similarly, the idle recommender/campaign alarm will alert you when there has been no request activity for a recommender/campaign for a configurable amount of time. The alarms will be set up to alert you via email through an SNS topic in each region where resources are monitored. Once the alarms are set up, you can alternatively link them to any operations and messaging tools you already use (e.g., Slack, PagerDuty). 60 | 61 | ![Personalize Monitor CloudWatch Alarms](./images/personalize-monitor-cloudwatch-alarms.png) 62 | 63 | For more details on the CloudWatch alarms created by this application, see the [personalize_monitor](./src/personalize_monitor_function/) function page. 64 | 65 | ### CloudWatch metrics 66 | 67 | To support the CloudWatch dashboard and alarms described above, a few new custom [CloudWatch metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) are added for the monitored recommenders and campaigns. These metrics are populated by the [personalize_monitor](./src/personalize_monitor_function/) Lambda function that is set up to run every 5 minutes in your account. You can find these metrics in CloudWatch under Metrics in the "PersonalizeMonitor" namespace.
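
If you would rather inspect the custom metrics from code than from the console, the following sketch lists the metric names found in the "PersonalizeMonitor" namespace. The helper name is hypothetical; it only assumes a boto3 CloudWatch client (or any object with the same `get_paginator("list_metrics")` interface) is passed in.

```python
def list_monitor_metric_names(cloudwatch, namespace="PersonalizeMonitor"):
    """Return the sorted, de-duplicated metric names found in a namespace."""
    names = set()
    # list_metrics is paginated, so walk every page of results.
    paginator = cloudwatch.get_paginator("list_metrics")
    for page in paginator.paginate(Namespace=namespace):
        for metric in page.get("Metrics", []):
            names.add(metric["MetricName"])
    return sorted(names)


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   print(list_monitor_metric_names(boto3.client("cloudwatch")))
```

Passing the client in as an argument keeps the helper easy to exercise with a stub and lets you point it at any monitored region.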
68 | 69 | ![Personalize Monitor CloudWatch Metrics](./images/personalize-monitor-cloudwatch-metrics.png) 70 | 71 | For more details on the custom metrics created by this application, see the [personalize_monitor](./src/personalize_monitor_function/) function page. 72 | 73 | ### Cost optimization (optional) 74 | 75 | This application can be optionally configured to automatically perform cost optimization actions for your Amazon Personalize recommenders and campaigns. 76 | 77 | #### Idle recommenders/campaigns 78 | Idle recommenders/campaigns are those that have been provisioned but are not receiving any `GetRecommendations`/`GetPersonalizedRanking` calls. Since costs are incurred while a recommender/campaign is active regardless of whether it receives any requests, detecting and eliminating these idle recommenders/campaigns can be an important cost optimization activity. This can be particularly useful in non-production AWS accounts such as development and testing where you are more likely to have abandoned recommenders/campaigns. 79 | 80 | Note that this is where an important difference between recommenders and campaigns comes into play. Recommenders can be started and stopped to provision and de-provision the resources needed for real-time inference. When a recommender is stopped, the real-time inference resources are deleted (which pauses ongoing recommender charges) but the underlying model artifacts are preserved. This allows you to later start the recommender without having to train the model again. Campaigns, on the other hand, represent only the resources needed for real-time inference for a solution version. Therefore, you must delete a campaign to release the real-time resources and pause campaign charges. Since the solution version is not being deleted, model artifacts are preserved similar to when a recommender is stopped. 
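
The difference between stopping a recommender and deleting a campaign can be sketched as a small dispatch on the resource type embedded in the ARN. This is an illustrative helper with a hypothetical name, not the application's own implementation; it assumes a boto3 Personalize client is passed in and uses its `stop_recommender` and `delete_campaign` operations.

```python
def release_idle_resource(personalize, arn: str) -> str:
    """Stop an idle recommender or delete an idle campaign.

    Returns the action taken ("stopped" or "deleted").
    """
    # ARN layout: arn:partition:service:region:account:resource-type/name
    parts = arn.split(":", 5)
    if len(parts) != 6:
        raise ValueError(f"Not a valid ARN: {arn}")
    resource_type = parts[5].split("/", 1)[0]

    if resource_type == "recommender":
        # Stopping releases inference resources but keeps the model artifacts.
        personalize.stop_recommender(recommenderArn=arn)
        return "stopped"
    if resource_type == "campaign":
        # Campaigns cannot be stopped; deleting leaves the solution version intact.
        personalize.delete_campaign(campaignArn=arn)
        return "deleted"
    raise ValueError(f"Unsupported resource type: {resource_type}")
```

A stopped recommender can later be restarted without retraining, while a deleted campaign has to be recreated from its solution version.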
81 | 82 | See the `AutoDeleteOrStopIdleResources` and `IdleThresholdHours` deployment parameters in the installation instructions below and the [personalize_monitor](./src/personalize_monitor_function#automatically-deleting-idle-campaigns-optional) function for details. 83 | 84 | #### Over-provisioned recommenders/campaigns 85 | 86 | Properly provisioning recommenders and campaigns, as described earlier, is also an important cost optimization activity. This application can be configured to automatically reduce a recommender's `minRecommendationRequestsPerSecond` or a campaign's `minProvisionedTPS` based on actual request volume. This will optimize recommender/campaign utilization when request volume is lower while relying on Personalize to auto-scale based on actual activity. See the `AutoAdjustMinTPS` deployment parameter below and the [personalize_monitor](./src/personalize_monitor_function#automatically-adjusting-campaign-minprovisionedtps-optional) function for details. 87 | 88 | ## Architecture 89 | 90 | The following diagram depicts how the Lambda functions in this application work together using an event-driven approach built on [Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/latest/userguide/what-is-amazon-eventbridge.html). The [personalize_monitor](./src/personalize_monitor_function/) function is invoked every five minutes to generate CloudWatch metric data based on the monitored recommenders/campaigns and create alarms (if configured). It also generates events which are published to EventBridge that trigger activities such as optimizing `minRecommendationRequestsPerSecond`/`minProvisionedTPS`, stopping idle recommenders, deleting idle campaigns, updating the Personalize Monitor CloudWatch dashboard, and sending notifications. 
This approach allows you to more easily integrate these functions into your own operations by sending your own events, say, to trigger the dashboard to be rebuilt after you create a campaign or register your own targets to events generated by this application. 91 | 92 | ![Personalize Monitor Architecture](./images/personalize-monitor-architecture.png) 93 | 94 | See the readme pages for each function for details on the events that they produce and consume. 95 | 96 | ## Installing the application 97 | 98 | ***IMPORTANT NOTE:** Deploying this application in your AWS account will create and consume AWS resources, which will cost money. For example, the CloudWatch dashboard, the Lambda function that collects additional monitoring metrics is run every 5 minutes, CloudWatch alarms, logging, and so on. Therefore, if after installing this application you choose not to use it as part of your monitoring strategy, be sure to follow the Uninstall instructions below to clean up all resources and avoid ongoing charges.* 99 | 100 | ### Option 1 - Install from Serverless Application Repository 101 | 102 | The easiest way to deploy this application is from the [Serverless Application Repository](https://aws.amazon.com/serverless/serverlessrepo/) (SAR). 103 | 104 | 1. Within the AWS account where you wish to deploy the application, browse to the [application's page](https://serverlessrepo.aws.amazon.com/applications/arn:aws:serverlessrepo:us-east-1:316031960777:applications~Amazon-Personalize-Monitor) in the Serverless Application Repository and click **"Deploy"**. 105 | 2. Enter/update values in the **"Application settings"** panel (described below) and click **"Deploy"** again. 106 | 107 | ### Option 2 - Install using Serverless Application Model 108 | 109 | If you'd rather install the application manually, you can use the AWS [Serverless Application Model](https://aws.amazon.com/serverless/sam/) (SAM) CLI to build and deploy the application into your AWS account. 
110 | 111 | To use the SAM CLI, you need the following tools. 112 | 113 | * SAM CLI - [Install the SAM CLI](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html) 114 | * [Python 3 installed](https://www.python.org/downloads/) 115 | * Docker - [Install Docker community edition](https://hub.docker.com/search/?type=edition&offering=community) 116 | 117 | Then ensure you are logged in to `public.ecr.aws` in Docker so SAM can download the Docker build images by running the following command in your shell. 118 | 119 | ```bash 120 | aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws 121 | ``` 122 | 123 | To build and deploy the application for the first time, run the following in your shell: 124 | 125 | ```bash 126 | sam build --use-container --cached 127 | sam deploy --guided 128 | ``` 129 | 130 | The first command will build the source of the application. The second command will package and deploy the application to your AWS account with a series of prompts. The following section describes the supported application parameters. 131 | 132 | ### Application settings/parameters 133 | 134 | Whether you install this application from SAR or SAM, the following parameters can be used to control how the application monitors your Personalize deployments. 135 | 136 | | Prompt/Parameter | Description | Default | 137 | | --- | --- | --- | 138 | | Stack Name | The name of the stack to deploy to CloudFormation. This should be unique to your account and region. | `personalize-monitor` | 139 | | AWS Region | The AWS region you want to deploy this application to. Note that the CloudWatch metrics Lambda function in this application will still be able to monitor campaigns across multiple regions; you will be prompted for the region(s) to monitor below. 
| Your current region | 140 | | Parameter CampaignARNs | Comma separated list of Personalize campaign ARNs to monitor or `all` to monitor all active campaigns. It is recommended to use `all` so that any new campaigns that are added after deployment will be automatically detected, monitored, and have alarms created (optional) | `all` | 141 | | Parameter RecommenderARNs | Comma separated list of Personalize recommender ARNs to monitor or `all` to monitor all active recommenders. It is recommended to use `all` so that any new recommenders that are added after deployment will be automatically detected, monitored, and have alarms created (optional) | `all` | 142 | | Parameter Regions | Comma separated list of AWS regions to monitor recommenders/campaigns. Only applicable when `all` is used for `CampaignARNs` or `RecommenderARNs`. Leaving this value blank will default to the region where this application is deployed (i.e. `AWS Region` parameter above). | | 143 | | Parameter AutoCreateUtilizationAlarms | Whether to automatically create a utilization CloudWatch alarm for each monitored recommender or campaign. | `Yes` | 144 | | Parameter UtilizationThresholdAlarmLowerBound | Minimum threshold value (in percent) to enter alarm state for recommender/campaign utilization. This value is only relevant if `AutoCreateUtilizationAlarms` is `Yes`. | `100` | 145 | | Parameter AutoAdjustMinTPS | Whether to automatically compare recommender/campaign request activity against the configured `minRecommendationRequestsPerSecond`/`minProvisionedTPS` to determine if `minRecommendationRequestsPerSecond`/`minProvisionedTPS` can be reduced to optimize utilization. | `Yes` | 146 | | Parameter AutoCreateIdleAlarms | Whether to automatically create an idle detection CloudWatch alarm for each monitored recommender/campaign. | `Yes` | 147 | | Parameter IdleThresholdHours | Number of hours that a recommender/campaign must be idle (i.e.
no requests) before it is automatically stopped (recommender) or deleted (campaign). `AutoDeleteOrStopIdleResources` must be `Yes` for idle recommender stop or campaign deletion to occur. | `24` | 148 | | Parameter AutoDeleteOrStopIdleResources | Whether to automatically stop idle recommenders or delete idle campaigns. An idle recommender/campaign is one that has not had any requests in `IdleThresholdHours` hours. | `No` | 149 | | Parameter NotificationEndpoint | Email address to receive alarm and ok notifications, recommender stop/update, campaign delete/update events (optional). An [SNS](https://aws.amazon.com/sns/) topic is created in each region where resources are monitored and this email address will be added as a subscriber to the topic(s). You will receive a confirmation email for the SNS topic subscription in each region so be sure to click the confirmation link in that email to ensure you receive notifications. | | 150 | | Confirm changes before deploy | If set to yes, any CloudFormation change sets will be shown to you before execution for manual review. If set to no, the AWS SAM CLI will automatically deploy application changes. | | 151 | | Allow SAM CLI IAM role creation | Since this application creates IAM roles to allow the Lambda functions to access AWS services, this setting must be `Yes`. | | 152 | | Save arguments to samconfig.toml | If set to yes, your choices will be saved to a configuration file inside the application, so that in the future you can just re-run `sam deploy` without parameters to deploy changes to your application. | | 153 | 154 | ## Uninstalling the application 155 | 156 | If you installed the application from the Serverless Application Repository, you can delete the application from the Lambda console in your AWS account (under Applications). 157 | 158 | Alternatively, if you installed the application using SAM, you can delete the application using the AWS CLI. 
Assuming you used the default application name for the stack name (`personalize-monitor`), you can run the following: 159 | 160 | ```bash 161 | aws cloudformation delete-stack --stack-name personalize-monitor 162 | ``` 163 | 164 | You can also delete the application stack in CloudFormation in the AWS console. 165 | 166 | ## FAQs 167 | 168 | ***Q: Can I use this application to determine my accumulated inference charges during the month?*** 169 | 170 | ***A:*** No! Although the `averageRPS`/`averageTPS` and `minRecommendationRequestsPerSecond`/`minProvisionedTPS` custom metrics generated by this application may be used to calculate an approximation of your accumulated inference charges, they should not be used as a substitute or proxy for actual Personalize inference costs. Always consult your AWS Billing Dashboard for actual service charges. 171 | 172 | ***Q: What is an ideal recommender/campaign utilization percentage? Is it okay if my recommender/campaign utilization is over 100%?*** 173 | 174 | ***A:*** The recommender/campaign utilization metric is a measure of your actual usage compared against the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` for the recommender/campaign. Any utilization value >= 100% is ideal since that means you are not over-provisioning, and therefore not over-paying, for resources. You're letting Personalize handle the scaling in/out of the recommender/campaign. Anytime your utilization is below 100%, more resources are provisioned than are needed to satisfy the volume of requests at that time. 175 | 176 | ***Q: How can I tell if Personalize is scaling out fast enough?*** 177 | 178 | ***A:*** Compare the "Actual vs Provisioned RPS/TPS" graph to the "Recommender/Campaign Latency" graph on the Personalize Monitor CloudWatch dashboard. When your Actual RPS/TPS increases/spikes, does the latency for the same recommender/campaign at the same time stay consistent? 
If so, this tells you that Personalize is maintaining response time as request volume increases and therefore scaling fast enough to meet demand. However, if latency increases significantly and to an unacceptable level for your application, this is an indication that Personalize may not be scaling fast enough to meet your traffic patterns. See the answer to the following question for some options. 179 | 180 | ***Q: My workload is very spiky and Personalize is not scaling fast enough. What can I do?*** 181 | 182 | ***A:*** First, be sure to confirm that it is Personalize that is not scaling fast enough by reviewing the answer above. If the spikes are predictable or cyclical, you can pre-warm capacity in your recommender/campaign ahead of time by adjusting the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` using the [UpdateRecommender](https://docs.aws.amazon.com/personalize/latest/dg/API_UpdateRecommender.html) or [UpdateCampaign](https://docs.aws.amazon.com/personalize/latest/dg/API_UpdateCampaign.html) API and then dropping it back down after the traffic subsides. For example, increase capacity 30 minutes before a flash sale or marketing campaign is launched that brings a temporary surge in traffic. This can be done manually using the AWS console or automated by using [CloudWatch events](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/WhatIsCloudWatchEvents.html) based on a schedule or triggered based on an event in your application. The [personalize_update_tps](./src/personalize_update_tps_function/) function that is deployed with this application can be used as the target for CloudWatch events or you can publish an `UpdatePersonalizeRecommenderMinRecommendationRPS` or `UpdatePersonalizeCampaignMinProvisionedTPS` event to EventBridge. If spikes in your workload are not predictable or known ahead of time, determining the optimal `minRecommendationRequestsPerSecond`/`minProvisionedTPS` to balance consistent latency vs cost is the best option. 
The metrics and dashboard graphs in this application can help you determine this value. 183 | 184 | ***Q: After deploying this application in my AWS account, I created some new Personalize recommenders or campaigns that I also want to monitor. How can I add them to be monitored and have them appear on my dashboard? Also, what about monitored recommenders or campaigns that I delete?*** 185 | 186 | ***A:*** If you specified `all` for the `RecommenderARNs` or `CampaignARNs` deployment parameter (see installation instructions above), any new recommenders/campaigns you create will be automatically monitored and alarms created (if `AutoCreateUtilizationAlarms` and/or `AutoCreateIdleAlarms` was set to `Yes`) when the recommenders/campaigns become active. Likewise, any recommenders/campaigns that are deleted will no longer be monitored. If you want this application to monitor recommenders/campaigns across multiple regions, be sure to specify the region names in the `Regions` deployment parameter. Note that this only applies when `RecommenderARNs` or `CampaignARNs` is set to `all`. The CloudWatch dashboard will be automatically rebuilt every hour to add new recommenders and campaigns and drop deleted recommenders and campaigns. You can also trigger the dashboard to be rebuilt by publishing a `BuildPersonalizeMonitorDashboard` event to the default EventBridge event bus (see [dashboard_mgmt_function](./src/dashboard_mgmt_function/)). 187 | 188 | If you want to change your deployment parameters that control what recommenders/campaigns are monitored, redeploy the application using the `--guided` parameter and follow the prompts. 189 | 190 | **IMPORTANT: Redeploying this application will fully rebuild and replace your Personalize Monitor dashboard so any changes you made manually to the dashboard will be lost.** 191 | 192 | ## Reporting issues 193 | 194 | If you encounter a bug, please create a new issue with as much detail as possible and steps for reproducing the bug.
Similarly, if you have an idea for an improvement, please add an issue as well. Pull requests are also welcome! See the [Contributing Guidelines](./CONTRIBUTING.md) for more details. 195 | 196 | ## License summary 197 | 198 | This sample code is made available under a modified MIT license. See the LICENSE file. 199 | -------------------------------------------------------------------------------- /images/personalize-monitor-architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/images/personalize-monitor-architecture.png -------------------------------------------------------------------------------- /images/personalize-monitor-cloudwatch-alarms.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/images/personalize-monitor-cloudwatch-alarms.png -------------------------------------------------------------------------------- /images/personalize-monitor-cloudwatch-dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/images/personalize-monitor-cloudwatch-dashboard.png -------------------------------------------------------------------------------- /images/personalize-monitor-cloudwatch-metrics.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/images/personalize-monitor-cloudwatch-metrics.png -------------------------------------------------------------------------------- /samconfig.toml: -------------------------------------------------------------------------------- 
1 | version = 0.1 2 | [default] 3 | [default.deploy] 4 | [default.deploy.parameters] 5 | stack_name = "personalize-monitor" 6 | s3_prefix = "personalize-monitor" 7 | parameter_overrides = "CampaignARNs=\"all\" RecommenderARNs=\"all\" AutoCreateUtilizationAlarms=\"Yes\" UtilizationThresholdAlarmLowerBound=\"100\" AutoAdjustMinTPS=\"Yes\" AutoCreateIdleAlarms=\"Yes\" IdleThresholdHours=\"24\" AutoDeleteOrStopIdleResources=\"No\"" 8 | capabilities = "CAPABILITY_IAM" 9 | resolve_s3 = true 10 | image_repositories = [] 11 | -------------------------------------------------------------------------------- /sar-publish.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Utility script to deploy application to the Serverless Application Repository. 4 | 5 | set -e 6 | 7 | # Bucket must have policy to allow SAR access. 8 | # See https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-template-publishing-applications.html 9 | BUCKET=$1 10 | REGION=$2 11 | 12 | if [ "$BUCKET" == "" ] || [ "$REGION" == "" ]; then 13 | echo "Usage: $0 BUCKET REGION" 14 | echo " where BUCKET is the S3 bucket to deploy packaged resources for SAR and REGION is the AWS region where to publish the application" 15 | exit 1 16 | fi 17 | 18 | echo "Building application" 19 | sam build --use-container --cached 20 | 21 | cd .aws-sam/build 22 | echo "Packaging application" 23 | sam package --template-file template.yaml --output-template-file packaged.yaml --s3-bucket $BUCKET 24 | echo "Publishing application to the SAR" 25 | sam publish --template packaged.yaml --region $REGION 26 | cd - -------------------------------------------------------------------------------- /src/cleanup_resources_function/README.md: -------------------------------------------------------------------------------- 1 | # Amazon Personalize Monitor - Cleanup Function 2 | 3 | This Lambda function is called as a CloudFormation custom resource 
when the application is deleted/uninstalled so that resources created dynamically by the application, such as CloudWatch alarms and SNS topics, are also deleted. -------------------------------------------------------------------------------- /src/cleanup_resources_function/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/src/cleanup_resources_function/__init__.py -------------------------------------------------------------------------------- /src/cleanup_resources_function/cleanup_resources.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | """Cleans up resources created by this application outside of CloudFormation 5 | 6 | This function is called as a CloudFormation custom resource. 
7 | """ 8 | 9 | import boto3 10 | 11 | from crhelper import CfnResource 12 | from aws_lambda_powertools import Logger 13 | 14 | from common import ( 15 | PROJECT_NAME, 16 | ALARM_NAME_PREFIX, 17 | SNS_TOPIC_NAME, 18 | NOTIFICATIONS_RULE, 19 | NOTIFICATIONS_RULE_TARGET_ID, 20 | extract_region, 21 | get_client, 22 | determine_campaign_arns, 23 | determine_recommender_arns 24 | ) 25 | 26 | logger = Logger() 27 | helper = CfnResource() 28 | 29 | sts = boto3.client('sts') 30 | account_id = sts.get_caller_identity()['Account'] 31 | 32 | @helper.delete 33 | def delete_resources(event, _): 34 | campaign_arns = determine_campaign_arns(event.get('ResourceProperties')) 35 | recommender_arns = determine_recommender_arns(event.get('ResourceProperties')) 36 | 37 | logger.debug('Campaigns to check for resources to delete: %s', campaign_arns) 38 | logger.debug('Recommenders to check for resources to delete: %s', recommender_arns) 39 | 40 | regions = set() 41 | 42 | for campaign_arn in campaign_arns: 43 | regions.add(extract_region(campaign_arn)) 44 | 45 | for recommender_arn in recommender_arns: 46 | regions.add(extract_region(recommender_arn)) 47 | 48 | logger.debug('Regions to check for resources to delete: %s', regions) 49 | 50 | alarms_deleted = 0 51 | 52 | for region in regions: 53 | cw = get_client(service_name = 'cloudwatch', region_name = region) 54 | 55 | alarm_names_to_delete = set() 56 | 57 | alarms_paginator = cw.get_paginator('describe_alarms') 58 | for alarms_page in alarms_paginator.paginate(AlarmNamePrefix = ALARM_NAME_PREFIX, AlarmTypes=['MetricAlarm']): 59 | for alarm in alarms_page['MetricAlarms']: 60 | tags_response = cw.list_tags_for_resource(ResourceARN = alarm['AlarmArn']) 61 | 62 | for tag in tags_response['Tags']: 63 | if tag['Key'] == 'CreatedBy' and tag['Value'] == PROJECT_NAME: 64 | alarm_names_to_delete.add(alarm['AlarmName']) 65 | break 66 | 67 | if alarm_names_to_delete: 68 | # FUTURE: max check of 100 69 | logger.info('Deleting CloudWatch alarms in 
%s for campaigns %s and recommenders %s: %s', region, campaign_arns, recommender_arns, alarm_names_to_delete) 70 | cw.delete_alarms(AlarmNames=list(alarm_names_to_delete)) 71 | alarms_deleted += len(alarm_names_to_delete) 72 | 73 | events = get_client(service_name = 'events', region_name = region) 74 | try: 75 | logger.info('Removing targets from EventBridge notification rule %s for region %s', NOTIFICATIONS_RULE, region) 76 | events.remove_targets( 77 | Rule = NOTIFICATIONS_RULE, 78 | Ids = [ NOTIFICATIONS_RULE_TARGET_ID ] 79 | ) 80 | except events.exceptions.ResourceNotFoundException: 81 | logger.warning('EventBridge notification rule targets not found') 82 | 83 | try: 84 | logger.info('Deleting EventBridge notification rule %s for region %s', NOTIFICATIONS_RULE, region) 85 | events.delete_rule(Name = NOTIFICATIONS_RULE) 86 | except events.exceptions.ResourceNotFoundException: 87 | logger.warning('EventBridge notification rule %s does not exist', NOTIFICATIONS_RULE) 88 | 89 | sns = get_client(service_name = 'sns', region_name = region) 90 | topic_arn = f'arn:aws:sns:{region}:{account_id}:{SNS_TOPIC_NAME}' 91 | logger.info('Deleting SNS topic %s', topic_arn) 92 | # This API is idempotent so will not fail if topic does not exist 93 | sns.delete_topic(TopicArn = topic_arn) 94 | 95 | logger.info('Deleted %d alarms', alarms_deleted) 96 | 97 | @logger.inject_lambda_context(log_event=True) 98 | def lambda_handler(event, context): 99 | helper(event, context) -------------------------------------------------------------------------------- /src/cleanup_resources_function/requirements.txt: -------------------------------------------------------------------------------- 1 | # Note: AWS Lambda Power Tools dependency is satisfied by Lambda layer at runtime (part of deployment).
2 | crhelper==2.0.6 3 | -------------------------------------------------------------------------------- /src/dashboard_mgmt_function/README.md: -------------------------------------------------------------------------------- 1 | # Amazon Personalize Monitor - CloudWatch Dashboard Create/Update/Delete Function 2 | 3 | The [dashboard_mgmt.py](./dashboard_mgmt.py) Lambda function is responsible for creating, updating/refreshing, and deleting the [CloudWatch dashboard](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html) for this application. It is called in the following contexts: 4 | 5 | - As part of the CloudFormation deployment process for this application as a [custom resource](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-custom-resources.html) (create, update, delete). 6 | - In response to the `BuildPersonalizeMonitorDashboard` CloudWatch event being handled. This event is published to the default [Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/latest/userguide/what-is-amazon-eventbridge.html) event bus when a monitored campaign is automatically deleted so that the dashboard can be rebuilt. An EventBridge rule is used to trigger this function to be invoked when the event is received. 7 | - At the top of every hour, triggered by a scheduled CloudWatch event. This ensures that any campaigns that are created or deleted (outside of this application) that meet the monitoring criteria are added to the dashboard. 8 | 9 | The dashboard will include line graph widgets for actual vs provisioned TPS, recommender/campaign utilization, and recommender/campaign latency for the Personalize recommenders/campaigns you wish to monitor. Here is an example of a dashboard. 
10 | 11 | ![Personalize Monitor CloudWatch Dashboard](../../images/personalize-monitor-cloudwatch-dashboard.png) 12 | 13 | ## How it works 14 | 15 | The EventBridge event structure that triggers this function looks something like this: 16 | 17 | ```javascript 18 | { 19 | "source": "personalize.monitor", 20 | "detail-type": "BuildPersonalizeMonitorDashboard", 21 | "resources": [ CAMPAIGN_OR_RECOMMENDER_ARN_THAT_TRIGGERED ], 22 | "detail": { 23 | "Reason": DESCRIPTIVE_REASON_FOR_UPDATE 24 | } 25 | } 26 | ``` 27 | 28 | This function can also be invoked directly as part of your own operational process. The `Reason` is optional and is used only for logging. 29 | 30 | ```javascript 31 | { 32 | "Reason": DESCRIPTIVE_REASON_FOR_UPDATE 33 | } 34 | ``` 35 | 36 | ### Create/Update 37 | 38 | When called as part of this application's create or update deployment process or as a result of the `BuildPersonalizeMonitorDashboard` event, the function first determines which Personalize recommenders/campaigns should be monitored based on the CloudFormation template parameters you specified when you [installed](../../README.md#installing-the-application) the application. The monitored recommenders/campaigns are grouped by [dataset group](https://docs.aws.amazon.com/personalize/latest/dg/data-prep-ds-group.html) and placed in a dictionary that is passed to the Python [chevron](https://github.com/noahmorrison/chevron) library to render the [dashboard template](./dashboard-template.mustache) file. The template uses the [mustache templating language](http://mustache.github.io/) to build the widgets. 39 | 40 | Once the template is rendered as dashboard source (JSON), the dashboard source is used to create or update the CloudWatch dashboard by calling the [PutDashboard API](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutDashboard.html).
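The grouping step described above can be sketched roughly as follows. This is a simplified illustration and not the function's actual code: the input keys `dataset_group` and `region` and the helper name `build_template_data` are assumptions made for the example, and the real function resolves the dataset group through the Personalize Describe APIs. The `last_resource` flag mirrors what the template relies on to omit the trailing comma after the final metric in each widget.

```python
from collections import defaultdict

def build_template_data(resources):
    """Group monitored resources by dataset group for template rendering.

    `resources` is a list of dicts with the hypothetical keys 'name',
    'dataset_group', and 'region' (illustrative only).
    """
    by_group = defaultdict(list)
    for resource in resources:
        key = (resource['region'], resource['dataset_group'])
        by_group[key].append({'name': resource['name']})

    dataset_groups = []
    # Sort by region, then dataset group name, for a stable widget order
    for (region, group_name), members in sorted(by_group.items()):
        # Flag the final entry so the template can skip the trailing comma
        members[-1]['last_resource'] = True
        dataset_groups.append({
            'name': group_name,
            'region': region,
            'inference_resources': members
        })

    return {'namespace': 'PersonalizeMonitor', 'dataset_groups': dataset_groups}

template_data = build_template_data([
    {'name': 'campaign-a', 'dataset_group': 'retail', 'region': 'us-east-1'},
    {'name': 'recommender-b', 'dataset_group': 'retail', 'region': 'us-east-1'},
    {'name': 'campaign-c', 'dataset_group': 'media', 'region': 'us-west-2'},
])
```

A dictionary shaped like `template_data` is what would then be handed to `chevron.render` together with the template file, and the rendered JSON string passed as the `DashboardBody` of the PutDashboard call.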
41 | 42 | To change which recommenders/campaigns are monitored, either re-deploy this application, which overwrites the current dashboard with your recommender/campaign changes, or wait for the dashboard to update itself automatically (subject to monitoring configuration). **This also means that any manual changes you make to the Personalize Monitor dashboard will be lost.** If you want to add your own widgets to the dashboard or change the existing widgets, you can fork this repository, change the [dashboard-template.mustache](./dashboard-template.mustache) template file, and deploy into your AWS account. 43 | 44 | ### Delete 45 | 46 | When the CloudFormation stack is deleted for this application, this function will delete the dashboard. 47 | 48 | ## Calling from your own code 49 | 50 | You can trigger the CloudWatch dashboard to be rebuilt by publishing an event with the `BuildPersonalizeMonitorDashboard` detail-type from your own code. Here is an example in Python. 51 | 52 | ```python 53 | import boto3 54 | import json 55 | 56 | event_bridge = boto3.client('events') 57 | 58 | event_bridge.put_events( 59 | Entries=[ 60 | { 61 | 'Source': 'personalize.monitor', 62 | 'DetailType': 'BuildPersonalizeMonitorDashboard', 63 | 'Detail': json.dumps({ 64 | 'Reason': 'Rebuild the dashboard because I said so' 65 | }) 66 | } 67 | ] 68 | ) 69 | ``` -------------------------------------------------------------------------------- /src/dashboard_mgmt_function/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/src/dashboard_mgmt_function/__init__.py -------------------------------------------------------------------------------- /src/dashboard_mgmt_function/dashboard-template.mustache: -------------------------------------------------------------------------------- 1 | { 2 | "widgets": [{ 3 | "type": "metric", 4 |
"width": 4, 5 | "height": 4, 6 | "properties": { 7 | "metrics": [ 8 | ["{{namespace}}", "monitoredResourceCount"] 9 | ], 10 | "view": "singleValue", 11 | "region": "{{current_region}}", 12 | "title": "Resources Monitored", 13 | "stat": "Average", 14 | "period": 300 15 | } 16 | }, 17 | { 18 | "type": "text", 19 | "width": 20, 20 | "height": 4, 21 | "properties": { 22 | "markdown": "\n## Amazon Personalize Monitor Dashboard\n*This dashboard and its widgets are automatically managed by the [Personalize Monitor](https://github.com/aws-samples/amazon-personalize-monitor/) application. This is an open-source project. Please submit bugs/fixes/ideas [here](https://github.com/aws-samples/personalization-apis/issues).*\n\nFor best practices on integrating with and operating [Amazon Personalize](https://aws.amazon.com/personalize/), please see our [Cheat Sheet](https://github.com/aws-samples/amazon-personalize-samples/blob/master/PersonalizeCheatSheet2.0.md).\n\nAmazon Personalize resources: [Service Documentation](https://docs.aws.amazon.com/personalize/latest/dg/what-is-personalize.html) | [Personalize Blog](https://aws.amazon.com/blogs/machine-learning/category/artificial-intelligence/amazon-personalize/) | [Samples on GitHub](https://github.com/aws-samples/amazon-personalize-samples)\n" 23 | } 24 | } 25 | {{#dataset_groups}} 26 | ,{ 27 | "type": "text", 28 | "width": 24, 29 | "height": 1, 30 | "properties": { 31 | "markdown": "\n### Dataset Group: **{{name}}** ({{region}}) | [Manage](https://console.aws.amazon.com/personalize/home?region={{region}}#arn:aws:personalize:{{region}}:{{account_id}}:dataset-group${{name}}/setup)\n" 32 | } 33 | }, 34 | { 35 | "type": "metric", 36 | "width": 8, 37 | "height": 8, 38 | "properties": { 39 | "metrics": [ 40 | {{#inference_resources}} 41 | ["{{namespace}}", "{{resource_min_tps_name}}", "{{resource_arn_name}}", "{{inference_arn}}", { 42 | "label": "{{name}} {{resource_min_tps_name}}" 43 | }], 44 | ["{{namespace}}", 
"{{resource_avg_tps_name}}", "{{resource_arn_name}}", "{{inference_arn}}", { 45 | "label": "{{name}} {{resource_avg_tps_name}}" 46 | }]{{^last_resource}}, {{/last_resource}} 47 | {{/inference_resources}} 48 | ], 49 | "region": "{{region}}", 50 | "view": "timeSeries", 51 | "stacked": false, 52 | "stat": "Average", 53 | "period": 300, 54 | "title": "Actual vs Provisioned TPS/RPS", 55 | "yAxis": { 56 | "left": { 57 | "label": "TPS/RPS", 58 | "min": 0, 59 | "showUnits": false 60 | }, 61 | "right": { 62 | "showUnits": true, 63 | "label": "" 64 | } 65 | }, 66 | "annotations": { 67 | "horizontal": [{ 68 | "label": "Lowest TPS/RPS Allowed", 69 | "value": 1 70 | }] 71 | } 72 | } 73 | }, 74 | { 75 | "type": "metric", 76 | "width": 8, 77 | "height": 8, 78 | "properties": { 79 | "view": "timeSeries", 80 | "stacked": false, 81 | "metrics": [ 82 | {{#inference_resources}} 83 | ["{{namespace}}", "{{resource_utilization_name}}", "{{resource_arn_name}}", "{{inference_arn}}", { 84 | "label": "{{name}} {{resource_utilization_name}}" 85 | }]{{^last_resource}}, {{/last_resource}} 86 | {{/inference_resources}} 87 | ], 88 | "region": "{{region}}", 89 | "title": "Campaign/Recommender Utilization" 90 | } 91 | }, 92 | { 93 | "type": "metric", 94 | "width": 8, 95 | "height": 8, 96 | "properties": { 97 | "view": "timeSeries", 98 | "stacked": false, 99 | "metrics": [ 100 | {{#inference_resources}} 101 | ["AWS/Personalize", "{{latency_metric_name}}", "{{resource_arn_name}}", "{{inference_arn}}", { 102 | "label": "{{name}} {{latency_metric_name}}" 103 | }]{{^last_resource}}, {{/last_resource}} 104 | {{/inference_resources}} 105 | ], 106 | "region": "{{region}}", 107 | "title": "Campaign/Recommender Latency" 108 | } 109 | } 110 | {{/dataset_groups}} 111 | ] 112 | } -------------------------------------------------------------------------------- /src/dashboard_mgmt_function/dashboard_mgmt.py: -------------------------------------------------------------------------------- 1 | # Copyright 
Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | """Manages create/update/delete of the Personalize Monitor CloudWatch dashboard 5 | 6 | This function is called in two ways: 7 | 8 | 1. From CloudFormation when the application is deployed, updated, or deleted in an AWS 9 | account. When the resource is created, this function will create the Personalize 10 | Monitor Dashboard in CloudWatch populated with widgets for monitoring Personalize 11 | resources configured as deployment parameters. 12 | 13 | When this resource is updated (i.e. redeployed), the dashboard will be rebuilt and 14 | updated/replaced. 15 | 16 | When this resource is deleted, this function will delete the CloudWatch Dashboard. 17 | 18 | 2. As the target of an EventBridge rule that signals that the dashboard should be 19 | rebuilt as a result of an event occurring. The event could be after a campaign has 20 | been deleted and therefore a good point to rebuild the dashboard. It could also 21 | be set up to periodically rebuild the dashboard on a schedule so it picks up new 22 | campaigns too. 23 | 24 | See the layer_dashboard Lambda Layer for details on how the dashboard is built. 25 | """ 26 | 27 | import json 28 | import os 29 | import boto3 30 | import chevron 31 | 32 | from crhelper import CfnResource 33 | from aws_lambda_powertools import Logger 34 | from common import ( 35 | extract_region, 36 | extract_account_id, 37 | get_client, 38 | get_configured_active_campaigns, 39 | get_configured_active_recommenders 40 | ) 41 | 42 | logger = Logger() 43 | helper = CfnResource() 44 | 45 | cloudwatch = boto3.client('cloudwatch') 46 | 47 | DASHBOARD_NAME = 'Amazon-Personalize-Monitor' 48 | 49 | def build_dashboard(event): 50 | # Will hold the data used to render the template.
51 | template_data = {} 52 | 53 | template_data['namespace'] = 'PersonalizeMonitor' 54 | template_data['current_region'] = os.environ['AWS_REGION'] 55 | 56 | logger.debug('Loading active campaigns and recommenders') 57 | 58 | campaigns = get_configured_active_campaigns(event) 59 | template_data['active_campaign_count'] = len(campaigns) 60 | 61 | recommenders = get_configured_active_recommenders(event) 62 | template_data['active_recommender_count'] = len(recommenders) 63 | 64 | # Group campaigns/recommenders by dataset group so we can create DSG specific widgets in rows 65 | resources_by_dsg_arn = {} 66 | # Holds DSG info so we only have describe once per DSG 67 | dsgs_by_arn = {} 68 | 69 | for campaign in campaigns: 70 | logger.info('Campaign %s will be added to the dashboard', campaign['campaignArn']) 71 | 72 | campaign_region = extract_region(campaign['campaignArn']) 73 | 74 | personalize = get_client('personalize', campaign_region) 75 | 76 | response = personalize.describe_solution_version(solutionVersionArn = campaign['solutionVersionArn']) 77 | 78 | dsg_arn = response['solutionVersion']['datasetGroupArn'] 79 | recipe_arn = response['solutionVersion']['recipeArn'] 80 | 81 | dsg = dsgs_by_arn.get(dsg_arn) 82 | if not dsg: 83 | response = personalize.describe_dataset_group(datasetGroupArn = dsg_arn) 84 | dsg = response['datasetGroup'] 85 | dsgs_by_arn[dsg_arn] = dsg 86 | 87 | inference_resource_datas = resources_by_dsg_arn.get(dsg_arn) 88 | if not inference_resource_datas: 89 | inference_resource_datas = [] 90 | resources_by_dsg_arn[dsg_arn] = inference_resource_datas 91 | 92 | campaign_data = { 93 | 'name': campaign['name'], 94 | 'resource_arn_name': 'CampaignArn', 95 | 'resource_min_tps_name': 'minProvisionedTPS', 96 | 'resource_avg_tps_name': 'averageTPS', 97 | 'resource_utilization_name': 'campaignUtilization', 98 | 'inference_arn': campaign['campaignArn'], 99 | 'region': campaign_region 100 | } 101 | 102 | if recipe_arn == 
'arn:aws:personalize:::recipe/aws-personalized-ranking': 103 | campaign_data['latency_metric_name'] = 'GetPersonalizedRankingLatency' 104 | else: 105 | campaign_data['latency_metric_name'] = 'GetRecommendationsLatency' 106 | 107 | inference_resource_datas.append(campaign_data) 108 | 109 | for recommender in recommenders: 110 | logger.info('Recommender %s will be added to the dashboard', recommender['recommenderArn']) 111 | 112 | recommender_region = extract_region(recommender['recommenderArn']) 113 | personalize = get_client('personalize', recommender_region) # client for the recommender's own region 114 | dsg_arn = recommender['datasetGroupArn'] 115 | 116 | dsg = dsgs_by_arn.get(dsg_arn) 117 | if not dsg: 118 | response = personalize.describe_dataset_group(datasetGroupArn = dsg_arn) 119 | dsg = response['datasetGroup'] 120 | dsgs_by_arn[dsg_arn] = dsg 121 | 122 | inference_resource_datas = resources_by_dsg_arn.get(dsg_arn) 123 | if not inference_resource_datas: 124 | inference_resource_datas = [] 125 | resources_by_dsg_arn[dsg_arn] = inference_resource_datas 126 | 127 | recommender_data = { 128 | 'name': recommender['name'], 129 | 'resource_arn_name': 'RecommenderArn', 130 | 'resource_min_tps_name': 'minRecommendationRequestsPerSecond', 131 | 'resource_avg_tps_name': 'averageRPS', 132 | 'resource_utilization_name': 'recommenderUtilization', 133 | 'latency_metric_name': 'GetRecommendationsLatency', 134 | 'inference_arn': recommender['recommenderArn'], 135 | 'region': recommender_region 136 | } 137 | 138 | inference_resource_datas.append(recommender_data) 139 | 140 | dsgs_for_template = [] 141 | 142 | for dsg_arn, inference_resource_datas in resources_by_dsg_arn.items(): 143 | dsg = dsgs_by_arn[dsg_arn] 144 | 145 | # Minor hack to know when we're on the last item in list when iterating in template.
146 | inference_resource_datas[len(inference_resource_datas) - 1]['last_resource'] = True 147 | 148 | dsgs_for_template.append({ 149 | 'name': dsg['name'], 150 | 'region': extract_region(dsg_arn), 151 | 'account_id': extract_account_id(dsg_arn), 152 | 'inference_resources': inference_resource_datas 153 | }) 154 | 155 | template_data['dataset_groups'] = sorted(dsgs_for_template, key = lambda dsg: dsg['region'] + dsg['name']) 156 | 157 | # Render template and use as dashboard body. 158 | with open('dashboard-template.mustache', 'r') as f: 159 | dashboard = chevron.render(f, template_data) 160 | 161 | logger.debug(json.dumps(dashboard, indent = 2, default = str)) 162 | 163 | logger.info('Adding/updating dashboard') 164 | 165 | cloudwatch.put_dashboard( 166 | DashboardName = DASHBOARD_NAME, 167 | DashboardBody = dashboard 168 | ) 169 | 170 | def delete_dashboard(): 171 | logger.info('Deleting dashboard') 172 | 173 | cloudwatch.delete_dashboards( 174 | DashboardNames = [ DASHBOARD_NAME ] 175 | ) 176 | 177 | @helper.create 178 | @helper.update 179 | def create_or_update_resource(event, _): 180 | build_dashboard(event) 181 | 182 | @helper.delete 183 | def delete_resource(event, _): 184 | delete_dashboard() 185 | 186 | @logger.inject_lambda_context(log_event=True) 187 | def lambda_handler(event, context): 188 | # If the event has a RequestType, we're being called by CFN as custom resource 189 | if event.get('RequestType'): 190 | logger.info('Called via CloudFormation as a custom resource; letting CfnResource route request') 191 | helper(event, context) 192 | else: 193 | logger.info('Called via Invoke; assuming caller wants to build dashboard') 194 | 195 | if event.get('detail'): 196 | reason = event['detail'].get('Reason') 197 | else: 198 | reason = event.get('Reason') 199 | 200 | if reason: 201 | logger.info('Reason for dashboard build: %s', reason) 202 | 203 | build_dashboard(event) -------------------------------------------------------------------------------- 
/src/dashboard_mgmt_function/requirements.txt: -------------------------------------------------------------------------------- 1 | # Note: AWS Lambda Power Tools dependency is satisfied by Lambda layer at runtime (part of deployment). 2 | chevron==0.13.1 3 | crhelper==2.0.6 4 | -------------------------------------------------------------------------------- /src/layer/README.md: -------------------------------------------------------------------------------- 1 | # Amazon Personalize Monitor - Common Lambda Layer 2 | 3 | This [Lambda Layer](https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html) includes dependencies shared across all/most functions in this application. In addition, the [common.py](./common.py) file includes utility functions that are also shared across the Lambda functions in this application. -------------------------------------------------------------------------------- /src/layer/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/src/layer/__init__.py -------------------------------------------------------------------------------- /src/layer/common.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0 3 | 4 | """ 5 | Lambda layer functions shared across Lambda functions in this application 6 | """ 7 | 8 | import boto3 9 | import os 10 | import json 11 | import logging 12 | import random 13 | from typing import Dict, List 14 | 15 | from botocore.exceptions import ClientError 16 | from aws_lambda_powertools import Logger 17 | from expiring_dict import ExpiringDict 18 | 19 | logger = Logger(child=True) 20 | 21 | _clients_by_region = {} 22 | # Since the DescribeCampaign and DescribeRecommender APIs easily throttle, 23 | # use a cache to help smooth out periods where we get throttled. 24 | _resource_cache = ExpiringDict(max_age_seconds = 22 * 60) 25 | 26 | PROJECT_NAME = 'PersonalizeMonitor' 27 | ALARM_NAME_PREFIX = PROJECT_NAME + '-' 28 | SNS_TOPIC_NAME = 'PersonalizeMonitorNotifications' 29 | NOTIFICATIONS_RULE = 'PersonalizeMonitor-NotificationsRule' 30 | NOTIFICATIONS_RULE_TARGET_ID = 'PersonalizeMonitorNotificationsId' 31 | 32 | def put_event(detail_type, detail, resources = []): 33 | event_bridge = get_client('events') 34 | 35 | logger.info({ 36 | 'detail_type': detail_type, 37 | 'detail': detail, 38 | 'resources': resources 39 | }) 40 | 41 | event_bridge.put_events( 42 | Entries=[ 43 | { 44 | 'Source': 'personalize.monitor', 45 | 'Resources': resources, 46 | 'DetailType': detail_type, 47 | 'Detail': detail 48 | } 49 | ] 50 | ) 51 | 52 | def extract_region(arn: str) -> str: 53 | ''' Extracts region from an AWS ARN ''' 54 | region = None 55 | elements = arn.split(':') 56 | if len(elements) > 3: 57 | region = elements[3] 58 | 59 | return region 60 | 61 | def extract_resource_type(arn: str) -> str: 62 | ''' Extracts resource type from an AWS ARN ''' 63 | resource = None 64 | elements = arn.split(':') 65 | if len(elements) > 5: 66 | resource = elements[5].split('/')[0] 67 | 68 | return resource 69 | 70 | def is_campaign(arn: str) -> bool: 71 | return extract_resource_type(arn) == 'campaign' 72 | 73 | def is_recommender(arn: str) 
-> bool: 74 | return extract_resource_type(arn) == 'recommender' 75 | 76 | def extract_account_id(arn: str) -> str: 77 | ''' Extracts account ID from an AWS ARN ''' 78 | account_id = None 79 | elements = arn.split(':') 80 | if len(elements) > 4: 81 | account_id = elements[4] 82 | 83 | return account_id 84 | 85 | def get_client(service_name: str, region_name: str = None): 86 | if not region_name: 87 | region_name = os.environ['AWS_REGION'] 88 | 89 | ''' Returns boto3 client for a service and region ''' 90 | clients_by_service = _clients_by_region.get(region_name) 91 | 92 | if not clients_by_service: 93 | clients_by_service = {} 94 | _clients_by_region[region_name] = clients_by_service 95 | 96 | client = clients_by_service.get(service_name) 97 | 98 | if not client: 99 | client = boto3.client(service_name = service_name, region_name = region_name) 100 | clients_by_service[service_name] = client 101 | 102 | return client 103 | 104 | def determine_regions(event: Dict) -> List[str]: 105 | ''' Determines regions from function event or environment ''' 106 | # Check event first (list of region names) 107 | regions = None 108 | if event: 109 | regions = event.get('Regions') 110 | 111 | if not regions: 112 | # Check environment variable next for list of region names as CSV 113 | regions = os.environ.get('Regions') 114 | 115 | if not regions: 116 | # Lastly, use current region from environment. 
117 | regions = os.environ['AWS_REGION'] 118 | 119 | if regions and isinstance(regions, str): 120 | regions = [exp.strip(' ') for exp in regions.split(',')] 121 | 122 | return regions 123 | 124 | def _determine_arns(event: Dict, arn_param_name: str, arn_list_type: str) -> List[str]: 125 | ''' Determines Personalize campaign ARNs based on function event or environment ''' 126 | 127 | # Check event first (list of ARNs) 128 | arns_spec = None 129 | if event: 130 | arns_spec = event.get(arn_param_name) 131 | 132 | if not arns_spec: 133 | # Check environment variable next for list of ARNs as CSV 134 | arns_spec = os.environ.get(arn_param_name) 135 | 136 | if not arns_spec: 137 | raise Exception(f'"{arn_param_name}" expression required in event or environment') 138 | 139 | if isinstance(arns_spec, str): 140 | arns_spec = [exp.strip(' ') for exp in arns_spec.split(',')] 141 | 142 | logger.debug('%s expression: %s', arn_param_name, arns_spec) 143 | 144 | # Look for magic value of "all" to mean all active campaigns/recommenders in configured region(s) 145 | if len(arns_spec) == 1 and arns_spec[0].lower() == 'all': 146 | logger.debug('Retrieving all active ARNs') 147 | arns = [] 148 | 149 | # Determine regions we need to consider 150 | regions = determine_regions(event) 151 | logger.debug('Regions to scan for active resources: %s', regions) 152 | 153 | for region in regions: 154 | personalize = get_client(service_name = 'personalize', region_name = region) 155 | 156 | arns_for_region = 0 157 | 158 | resources_paginator = personalize.get_paginator(arn_list_type) 159 | for resources_page in resources_paginator.paginate(): 160 | if resources_page.get('campaigns'): 161 | for resource in resources_page['campaigns']: 162 | arns.append(resource['campaignArn']) 163 | arns_for_region += 1 164 | elif resources_page.get('recommenders'): 165 | for resource in resources_page['recommenders']: 166 | arns.append(resource['recommenderArn']) 167 | arns_for_region += 1 168 | 169 | 
logger.debug('Region %s has %d resources', region, arns_for_region) 170 | else: 171 | arns = arns_spec 172 | 173 | return arns 174 | 175 | def determine_campaign_arns(event: Dict) -> List[str]: 176 | ''' Determines Personalize campaign ARNs based on function event or environment ''' 177 | return _determine_arns(event, 'CampaignARNs', 'list_campaigns') 178 | 179 | def determine_recommender_arns(event: Dict) -> List[str]: 180 | ''' Determines Personalize recommender ARNs based on function event or environment ''' 181 | return _determine_arns(event, 'RecommenderARNs', 'list_recommenders') 182 | 183 | def get_configured_active_campaigns(event: Dict) -> List[Dict]: 184 | ''' Returns list of active campaigns as configured by function event and/or environment ''' 185 | campaign_arns = determine_campaign_arns(event) 186 | 187 | # Shuffle the list of arns so we don't try to describe campaigns in the same order each 188 | # time and potentially use cached campaign details for the same campaigns further down 189 | # the list due to rare but possible API throttling. 190 | random.shuffle(campaign_arns) 191 | 192 | campaigns = [] 193 | 194 | for campaign_arn in campaign_arns: 195 | campaign_region = extract_region(campaign_arn) 196 | personalize = get_client(service_name = 'personalize', region_name = campaign_region) 197 | campaign = None 198 | 199 | try: 200 | # Always try the DescribeCampaign API directly first. 201 | campaign = personalize.describe_campaign(campaignArn = campaign_arn)['campaign'] 202 | if logger.isEnabledFor(logging.DEBUG): 203 | logger.debug('Campaign: %s', json.dumps(campaign, indent = 2, default = str)) 204 | _resource_cache[campaign_arn] = campaign 205 | except ClientError as e: 206 | error_code = e.response['Error']['Code'] 207 | if error_code == 'ThrottlingException': 208 | logger.error('ThrottlingException trapped when calling DescribeCampaign API for %s', campaign_arn) 209 | 210 | # Fallback to see if we have a cached Campaign to use instead. 
211 | campaign = _resource_cache.get(campaign_arn) 212 | if campaign: 213 | logger.warning('Using cached campaign object for %s', campaign_arn) 214 | else: 215 | logger.warning('Campaign %s NOT found in cache; skipping this time', campaign_arn) 216 | elif error_code == 'ResourceNotFoundException': 217 | # Campaign has been deleted; log and skip. 218 | logger.error('Campaign %s no longer exists; skipping', campaign_arn) 219 | else: 220 | raise e 221 | 222 | if campaign: 223 | if campaign['status'] == 'ACTIVE': 224 | latest_status = None 225 | if campaign.get('latestCampaignUpdate'): 226 | latest_status = campaign['latestCampaignUpdate']['status'] 227 | 228 | if not latest_status or (latest_status != 'DELETE PENDING' and latest_status != 'DELETE IN_PROGRESS'): 229 | campaigns.append(campaign) 230 | else: 231 | logger.info('Campaign %s latestCampaignUpdate.status is %s and cannot be monitored in this state; skipping', campaign_arn, latest_status) 232 | else: 233 | logger.info('Campaign %s status is %s and cannot be monitored in this state; skipping', campaign_arn, campaign['status']) 234 | 235 | return campaigns 236 | 237 | def get_configured_active_recommenders(event: Dict) -> List[Dict]: 238 | ''' Returns list of active recommenders as configured by function event and/or environment ''' 239 | recommender_arns = determine_recommender_arns(event) 240 | 241 | # Shuffle the list of arns so we don't try to describe recommenders in the same order each 242 | # time and potentially use cached recommender details for the same recommenders further down 243 | # the list due to rare but possible API throttling. 244 | random.shuffle(recommender_arns) 245 | 246 | recommenders = [] 247 | 248 | for recommender_arn in recommender_arns: 249 | region = extract_region(recommender_arn) 250 | personalize = get_client(service_name = 'personalize', region_name = region) 251 | recommender = None 252 | 253 | try: 254 | # Always try the DescribeRecommender API directly first.
255 | recommender = personalize.describe_recommender(recommenderArn = recommender_arn)['recommender'] 256 | if logger.isEnabledFor(logging.DEBUG): 257 | logger.debug('Recommender: %s', json.dumps(recommender, indent = 2, default = str)) 258 | _resource_cache[recommender_arn] = recommender 259 | except ClientError as e: 260 | error_code = e.response['Error']['Code'] 261 | if error_code == 'ThrottlingException': 262 | logger.error('ThrottlingException trapped when calling DescribeRecommender API for %s', recommender_arn) 263 | 264 | # Fallback to see if we have a cached Recommender to use instead. 265 | recommender = _resource_cache.get(recommender_arn) 266 | if recommender: 267 | logger.warning('Using cached recommender object for %s', recommender_arn) 268 | else: 269 | logger.warning('Recommender %s NOT found in cache; skipping this time', recommender_arn) 270 | elif error_code == 'ResourceNotFoundException': 271 | # Recommender has been deleted; log and skip. 272 | logger.error('Recommender %s no longer exists; skipping', recommender_arn) 273 | else: 274 | raise e 275 | 276 | if recommender: 277 | if recommender['status'] == 'ACTIVE': 278 | latest_status = None 279 | if recommender.get('latestRecommenderUpdate'): 280 | latest_status = recommender['latestRecommenderUpdate']['status'] 281 | 282 | if not latest_status or (latest_status != 'DELETE PENDING' and latest_status != 'DELETE IN_PROGRESS'): 283 | recommenders.append(recommender) 284 | else: 285 | logger.info('Recommender %s latestRecommenderUpdate.status is %s and cannot be monitored in this state; skipping', recommender_arn, latest_status) 286 | else: 287 | logger.info('Recommender %s status is %s and cannot be monitored in this state; skipping', recommender_arn, recommender['status']) 288 | 289 | return recommenders -------------------------------------------------------------------------------- /src/layer/requirements.txt: -------------------------------------------------------------------------------- 1 |
# Runtime requirements: 2 | # Note: the following dependency must be provided at runtime as Lambda layer: 3 | # - AWS Lambda Power Tools as a Lambda layer. 4 | # Explicitly bring in a more recent boto3 to get latest API defs for Personalize that include recommender support. 5 | boto3==1.26.104 6 | expiring-dict==1.1.0 -------------------------------------------------------------------------------- /src/personalize_delete_campaign_function/README.md: -------------------------------------------------------------------------------- 1 | # Amazon Personalize Monitor - Delete Campaign Function 2 | 3 | This Lambda function deletes a Personalize campaign. It is called as the target of an EventBridge rule that matches events with the `DeletePersonalizeCampaign` detail-type. The [personalize-monitor](../personalize_monitor_function/) function publishes this event when the `AutoDeleteOrStopIdleResources` deployment parameter is `Yes` AND a monitored campaign has been idle more than `IdleThresholdHours` hours. Therefore, an idle campaign is one that has not had any `GetRecommendations` or `GetPersonalizedRanking` calls in the last `IdleThresholdHours` hours. 4 | 5 | This function will also delete any CloudWatch alarms that were dynamically created by this application for the deleted campaign. Alarms can be created for idle campaigns and low utilization campaigns via the `AutoCreateIdleCampaignAlarms` and `AutoCreateCampaignUtilizationAlarms` deployment parameters. 6 | 7 | > Note that Personalize recommenders are stopped and not deleted by this application so that the underlying model artifacts are retained. See the [personalize_stop_recommender](../personalize_stop_recommender_function/) function for details. 
8 | 9 | ## How it works 10 | 11 | The EventBridge event structure that triggers this function looks something like this: 12 | 13 | ```javascript 14 | { 15 | "source": "personalize.monitor", 16 | "detail-type": "DeletePersonalizeCampaign", 17 | "resources": [ CAMPAIGN_ARN_TO_DELETE ], 18 | "detail": { 19 | "ARN": CAMPAIGN_ARN_TO_DELETE, 20 | "Utilization": CURRENT_UTILIZATION, 21 | "AgeHours": CAMPAIGN_AGE_IN_HOURS, 22 | "IdleThresholdHours": CAMPAIGN_IDLE_HOURS, 23 | "TotalRequestsDuringIdleThresholdHours": 0, 24 | "Reason": DESCRIPTIVE_REASON_FOR_DELETE 25 | } 26 | } 27 | ``` 28 | 29 | This function can also be invoked directly as part of your own operational process. The event you pass to the function only requires the campaign ARN as follows. 30 | 31 | ```javascript 32 | { 33 | "ARN": CAMPAIGN_ARN_TO_DELETE, 34 | "Reason": OPTIONAL_DESCRIPTIVE_REASON_FOR_DELETE 35 | } 36 | ``` 37 | 38 | The Personalize [DeleteCampaign](https://docs.aws.amazon.com/personalize/latest/dg/API_DeleteCampaign.html) API is used to delete the campaign. 39 | 40 | ## Published events 41 | 42 | When the deletion of a campaign and any dynamically created CloudWatch alarms for the campaign have been successfully initiated by this function, two events are published to EventBridge. One event triggers a notification to the SNS topic for this application and the other triggers the CloudWatch dashboard to be rebuilt. 43 | 44 | ### Delete notification 45 | 46 | The following event is published to EventBridge to signal that a campaign has been deleted. 47 | 48 | ```javascript 49 | { 50 | "source": "personalize.monitor", 51 | "detail-type": "PersonalizeCampaignDeleted", 52 | "resources": [ CAMPAIGN_ARN_DELETED ], 53 | "detail": { 54 | "ARN": CAMPAIGN_ARN_DELETED, 55 | "Reason": DESCRIPTIVE_REASON_FOR_DELETE 56 | } 57 | } 58 | ``` 59 | 60 | An EventBridge rule is set up that will target an SNS topic with `NotificationEndpoint` as the subscriber.
This is the email address you provided at deployment time. If you'd like, you can customize how these notification events are handled in the EventBridge and SNS consoles. 61 | 62 | ### Rebuild CloudWatch dashboard 63 | 64 | Since a monitored campaign has been deleted, the CloudWatch dashboard needs to be rebuilt so that the campaign is removed from the widgets. This is accomplished by publishing a `BuildPersonalizeMonitorDashboard` event that is processed by the [dashboard_mgmt](../dashboard_mgmt_function/) function. 65 | 66 | ```javascript 67 | { 68 | "source": "personalize.monitor", 69 | "detail-type": "BuildPersonalizeMonitorDashboard", 70 | "resources": [ CAMPAIGN_ARN_DELETED ], 71 | "detail": { 72 | "ARN": CAMPAIGN_ARN_DELETED, 73 | "Reason": DESCRIPTIVE_REASON_FOR_REBUILD 74 | } 75 | } 76 | ``` 77 | -------------------------------------------------------------------------------- /src/personalize_delete_campaign_function/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/src/personalize_delete_campaign_function/__init__.py -------------------------------------------------------------------------------- /src/personalize_delete_campaign_function/personalize_delete_campaign.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | """ 5 | Lambda function that is used to delete a Personalize campaign based on prolonged idle time 6 | and according to configuration to automatically delete campaigns under these conditions.
7 | """ 8 | 9 | import json 10 | import logging 11 | 12 | from aws_lambda_powertools import Logger 13 | 14 | from common import ( 15 | PROJECT_NAME, 16 | ALARM_NAME_PREFIX, 17 | extract_region, 18 | get_client, 19 | put_event 20 | ) 21 | 22 | logger = Logger() 23 | 24 | def delete_alarms_for_campaign(campaign_arn): 25 | cw = get_client(service_name = 'cloudwatch', region_name = extract_region(campaign_arn)) 26 | 27 | alarm_names_to_delete = set() 28 | alarms_deleted = 0 # Initialize the counter that is incremented below 29 | alarms_paginator = cw.get_paginator('describe_alarms') 30 | for alarms_page in alarms_paginator.paginate(AlarmNamePrefix = ALARM_NAME_PREFIX, AlarmTypes=['MetricAlarm']): 31 | for alarm in alarms_page['MetricAlarms']: 32 | for dim in alarm['Dimensions']: 33 | if dim['Name'] == 'CampaignArn' and dim['Value'] == campaign_arn: 34 | tags_response = cw.list_tags_for_resource(ResourceARN = alarm['AlarmArn']) 35 | 36 | for tag in tags_response['Tags']: 37 | if tag['Key'] == 'CreatedBy' and tag['Value'] == PROJECT_NAME: 38 | alarm_names_to_delete.add(alarm['AlarmName']) 39 | break 40 | 41 | if alarm_names_to_delete: 42 | # FUTURE: max check of 100 43 | logger.info('Deleting CloudWatch alarms for campaign %s: %s', campaign_arn, alarm_names_to_delete) 44 | cw.delete_alarms(AlarmNames=list(alarm_names_to_delete)) 45 | alarms_deleted += len(alarm_names_to_delete) 46 | else: 47 | logger.info('No CloudWatch alarms to delete for campaign %s', campaign_arn) 48 | 49 | @logger.inject_lambda_context(log_event=True) 50 | def lambda_handler(event, _): 51 | ''' Initiates the delete of a Personalize campaign ''' 52 | if event.get('detail'): 53 | campaign_arn = event['detail']['ARN'] 54 | reason = event['detail'].get('Reason') 55 | else: 56 | campaign_arn = event['ARN'] 57 | reason = event.get('Reason') 58 | 59 | region = extract_region(campaign_arn) 60 | if not region: 61 | raise Exception('Region could not be extracted from campaign_arn') 62 | 63 | personalize = get_client(service_name = 'personalize', region_name = region) 64 | 65 |
response = personalize.delete_campaign(campaignArn = campaign_arn) 66 | 67 | if logger.isEnabledFor(logging.DEBUG): 68 | logger.debug(json.dumps(response, indent = 2, default = str)) 69 | 70 | if not reason: 71 | reason = f'Amazon Personalize campaign {campaign_arn} deletion initiated (reason unspecified)' 72 | 73 | put_event( 74 | detail_type = 'PersonalizeCampaignDeleted', 75 | detail = json.dumps({ 76 | 'ARN': campaign_arn, 77 | 'Reason': reason 78 | }), 79 | resources = [ campaign_arn ] 80 | ) 81 | 82 | put_event( 83 | detail_type = 'BuildPersonalizeMonitorDashboard', 84 | detail = json.dumps({ 85 | 'ARN': campaign_arn, 86 | 'Reason': reason 87 | }), 88 | resources = [ campaign_arn ] 89 | ) 90 | 91 | logger.info({ 92 | 'campaignArn': campaign_arn 93 | }) 94 | 95 | delete_alarms_for_campaign(campaign_arn) 96 | 97 | return f'Successfully initiated delete of campaign {campaign_arn}' -------------------------------------------------------------------------------- /src/personalize_delete_campaign_function/requirements.txt: -------------------------------------------------------------------------------- 1 | # Note: AWS Lambda Power Tools dependency is satisfied by Lambda layer at runtime (part of deployment). 2 | -------------------------------------------------------------------------------- /src/personalize_monitor_function/README.md: -------------------------------------------------------------------------------- 1 | # Amazon Personalize Monitor - Core Monitor Function 2 | 3 | The [personalize_monitor.py](./personalize_monitor.py) Lambda is called every 5 minutes by a CloudWatch scheduled event rule to generate the CloudWatch metrics needed to populate the Personalize Monitor dashboard line graph widgets and to trigger the CloudWatch alarms for low recommender/campaign utilization and idle recommender/campaign detection (if configured). 
Also, if the `AutoDeleteOrStopIdleResources` deployment parameter is `Yes` AND a monitored campaign has been idle more than `IdleThresholdHours` hours, this function will publish a `DeletePersonalizeCampaign` event to EventBridge that is handled by the [personalize_delete_campaign](../personalize_delete_campaign_function/) function. An idle campaign is one that has not had any `GetRecommendations` or `GetPersonalizedRanking` calls in the last `IdleThresholdHours` hours. Finally, this function will adjust a campaign's `minProvisionedTPS` (down only) if the `AutoAdjustMinTPS` deployment parameter is `Yes`. 4 | 5 | ## How it works 6 | 7 | The function first determines what Personalize campaigns should be monitored based on the CloudFormation template parameters you specify when you [install](../README.md#installing-the-application) the application. 8 | 9 | ## CloudWatch Metrics 10 | 11 | The following custom CloudWatch metrics are generated by this function on 5 minute intervals. You can find these metrics in the AWS console under CloudWatch and then Metrics or you can query them using the CloudWatch API. 
12 | 13 | | Namespace | MetricName | Dimensions | Unit | Description | 14 | | --- | --- | --- | --- | --- | 15 | | PersonalizeMonitor | monitoredResourceCount | | Count | Number of recommenders and campaigns currently being monitored at interval | 16 | | PersonalizeMonitor | minRecommendationRequestsPerSecond | RecommenderArn | Count/Second | `minRecommendationRequestsPerSecond` value for the recommender at interval | 17 | | PersonalizeMonitor | averageRPS | RecommenderArn | Count/Second | Average RPS for the recommender at interval | 18 | | PersonalizeMonitor | recommenderUtilization | RecommenderArn | Percent | Utilization percentage of `averageRPS` vs `minRecommendationRequestsPerSecond` at interval | 19 | | PersonalizeMonitor | minProvisionedTPS | CampaignArn | Count/Second | `minProvisionedTPS` value for the campaign at interval | 20 | | PersonalizeMonitor | averageTPS | CampaignArn | Count/Second | Average TPS for the campaign at interval | 21 | | PersonalizeMonitor | campaignUtilization | CampaignArn | Percent | Utilization percentage of `averageTPS` vs `minProvisionedTPS` at interval | 22 | 23 | ### How is averageRPS/averageTPS calculated? 24 | 25 | The `averageRPS` and `averageTPS` metric value for each monitored recommender and campaign is calculated by first determining the number of requests made to the recommender or campaign during the 5 minute interval and dividing by 300 (the number of seconds in 5 minutes). The number of requests is pulled from the `GetRecommendations` or `GetPersonalizedRanking` metric (depending on the underlying recipe) for the recommender/campaign from the `AWS/Personalize` namespace. The request count metric is automatically updated by Personalize itself. 26 | 27 | ## CloudWatch Alarms (optional) 28 | 29 | You can optionally have CloudWatch alarms dynamically created for monitored recommenders/campaigns for low utilization and idle recommenders/campaigns. 
30 | 31 | ### Low Recommender/Campaign Utilization Alarm 32 | 33 | If you set the `AutoCreateUtilizationAlarms` CloudFormation template parameter to `Yes` when you installed this application, this function will automatically create a CloudWatch alarm for every recommender and campaign that it monitors. The alarm will trigger when the `recommenderUtilization` or `campaignUtilization` custom metric described above drops below the `UtilizationThresholdAlarmLowerBound` installation parameter for 9 out of 12 evaluation periods. Since the intervals are 5 minutes, that means that 9 of the 12 five-minute evaluations over a 60 minute span must be below the threshold to enter an alarm status. The same rule applies to transition from alarm to OK status. The alarm will be created in the region where the recommender/campaign was created. An [SNS](https://aws.amazon.com/sns/) topic created by this application will be used for the alarm and OK actions, and the `NotificationEndpoint` (email address) deployment parameter will be set up as a subscriber to the topic. **Be sure to confirm the subscription sent when this application creates SNS topics and subscribes the email address you provided. You will receive a confirmation email for a topic created in each region where resources are monitored.** 34 | 35 | The alarm will have its actions disabled when the `minRecommendationRequestsPerSecond` or `minProvisionedTPS` is 1 and enabled when the `minRecommendationRequestsPerSecond` or `minProvisionedTPS` is greater than 1, so that notifications are only sent when utilization can be impacted by adjusting `minRecommendationRequestsPerSecond`/`minProvisionedTPS`. 36 | 37 | ### Idle Recommender/Campaign Alarm 38 | 39 | If you set the `AutoCreateIdleAlarms` CloudFormation template parameter to `Yes` when you installed this application, this function will automatically create a CloudWatch alarm for every monitored recommender/campaign that triggers when the resource has been idle for at least `IdleThresholdHours` hours.
The actions for the alarm will be enabled only after the recommender/campaign has itself existed for at least `IdleThresholdHours` hours. The `GetRecommendations` or `GetPersonalizedRanking` metric (depending on the resource's recipe) will be used to assess the resource's idle state. The alarm will be created in the region where the recommender/campaign was created. An [SNS](https://aws.amazon.com/sns/) topic created by this application will be used for the alarm and OK actions, and the `NotificationEndpoint` (email address) deployment parameter will be set up as a subscriber to the topic. **Be sure to confirm the subscription sent when this application creates SNS topics and subscribes the email address you provided. You will receive a confirmation email for a topic created in each region where resources are monitored.** 40 | 41 | ## Automatically adjusting minRecommendationRequestsPerSecond (recommenders) and minProvisionedTPS (campaigns) (optional) 42 | 43 | If the `AutoAdjustMinTPS` deployment parameter is `Yes`, this function will check the actual hourly RPS/TPS over the last 14 days against the currently configured `minRecommendationRequestsPerSecond`/`minProvisionedTPS` and look for opportunities to reduce the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` to optimize utilization and reduce costs. It does this by checking the recommender's or campaign's request volume for the previous 14 days on hourly intervals and finding the hour with the lowest average RPS/TPS (low watermark). If the low watermark average is less than `minRecommendationRequestsPerSecond`/`minProvisionedTPS` AND the recommender/campaign is more than 1 day old, it will drop the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` by 25%. This process will be repeated each hour until either the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` meets the low watermark RPS/TPS or the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` reaches 1 (the lowest allowed value).
**This function will NOT increase the `minRecommendationRequestsPerSecond`/`minProvisionedTPS`.** Instead it will rely on Personalize to auto-scale recommenders/campaigns up and back down to `minRecommendationRequestsPerSecond`/`minProvisionedTPS` to meet demand. 44 | 45 | > Since it can take several minutes for a recommender/campaign to redeploy after updating its `minRecommendationRequestsPerSecond`/`minProvisionedTPS`, you will receive the notification when the redeploy starts. The recommender/campaign will continue to respond to `GetRecommendations`/`GetPersonalizedRanking` API requests while it is redeploying. There will be no interruption of service. 46 | 47 | See the [personalize_update_tps](../personalize_update_tps_function/) function for details on the update function. 48 | 49 | ## Automatically stopping recommenders and deleting idle campaigns (optional) 50 | 51 | If the `AutoDeleteOrStopIdleResources` deployment parameter is `Yes`, this function will perform additional checks once per hour for each monitored recommender/campaign to see if it has been idle for more than `IdleThresholdHours` hours. The purpose of this feature is to prevent abandoned recommenders/campaigns from continuing to incur inference costs when they are no longer being used. Recommender/campaign checks are distributed across each hour in 10 minute blocks in an attempt to spread out the API calls needed to check and update recommenders/campaigns. 52 | 53 | To avoid too aggressively stopping recommenders or deleting campaigns, new recommenders/campaigns that are not more than `IdleThresholdHours` hours old are exempt from being stopped/deleted. Similarly, if a recommender/campaign has been updated within `IdleThresholdHours`, it will also be exempt from being automatically stopped/deleted. The idea is that new or actively updated recommenders/campaigns are likely not safe to delete. 
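
The exemption rules above can be sketched as a small standalone check. This is only an illustration under stated assumptions, not the function's actual implementation: the name `is_idle_candidate` and its parameters are hypothetical, and the exact boundary handling (`<=`) is an assumption about how "not more than" and "within" are meant.

```python
from typing import Optional

def is_idle_candidate(age_hours: int,
                      last_update_age_hours: Optional[int],
                      requests_in_window: int,
                      idle_threshold_hours: int) -> bool:
    """Hypothetical helper: True when a resource looks abandoned under the rules above."""
    # New resources that are not more than the threshold old are exempt.
    if age_hours <= idle_threshold_hours:
        return False
    # Resources updated within the threshold window are exempt as well
    # (None means the resource has never been updated).
    if last_update_age_hours is not None and last_update_age_hours <= idle_threshold_hours:
        return False
    # Idle means no inference requests were made during the threshold window.
    return requests_in_window == 0

# Example: a 48 hour old campaign, never updated, with no requests in the last 24 hours.
candidate = is_idle_candidate(48, None, 0, 24)
```

Keeping the age and last-update checks ahead of the request count mirrors the intent described above: a new or actively updated resource is never treated as abandoned, regardless of its traffic.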
54 | -------------------------------------------------------------------------------- /src/personalize_monitor_function/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/src/personalize_monitor_function/__init__.py -------------------------------------------------------------------------------- /src/personalize_monitor_function/personalize_monitor.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | """Lambda function that records Personalize resource metrics 5 | 6 | Lambda function designed to be called every five minutes to record campaign TPS 7 | utilization metrics and recommender RRPS in CloudWatch. The metrics are used for 8 | alarms and on the CloudWatch dashboard created by this application. 
9 | """ 10 | 11 | import json 12 | import os 13 | import datetime 14 | import sys 15 | import math 16 | import logging 17 | 18 | from typing import Dict 19 | from aws_lambda_powertools import Logger 20 | 21 | from common import ( 22 | PROJECT_NAME, 23 | ALARM_NAME_PREFIX, 24 | SNS_TOPIC_NAME, 25 | NOTIFICATIONS_RULE, 26 | NOTIFICATIONS_RULE_TARGET_ID, 27 | extract_region, 28 | get_client, 29 | get_configured_active_campaigns, 30 | get_configured_active_recommenders, 31 | put_event 32 | ) 33 | 34 | logger = Logger() 35 | 36 | MAX_METRICS_PER_CALL = 20 37 | MIN_IDLE_THRESHOLD_HOURS = 1 38 | 39 | ALARM_PERIOD_SECONDS = 300 40 | ALARM_NAME_PREFIX_LOW_CAMPAIGN_UTILIZATION = ALARM_NAME_PREFIX + 'LowCampaignUtilization-' 41 | ALARM_NAME_PREFIX_LOW_RECOMMENDER_UTILIZATION = ALARM_NAME_PREFIX + 'LowRecommenderUtilization-' 42 | ALARM_NAME_PREFIX_IDLE_CAMPAIGN = ALARM_NAME_PREFIX + 'IdleCampaign-' 43 | ALARM_NAME_PREFIX_IDLE_RECOMMENDER = ALARM_NAME_PREFIX + 'IdleRecommender-' 44 | 45 | _topic_arn_by_region = {} 46 | 47 | def get_recipe_arn(resource: Dict): 48 | recipe_arn = resource.get('recipeArn') 49 | if not recipe_arn and 'campaignArn' in resource: 50 | campaign_region = extract_region(resource['campaignArn']) 51 | personalize = get_client('personalize', campaign_region) 52 | 53 | response = personalize.describe_solution_version(solutionVersionArn = resource['solutionVersionArn']) 54 | 55 | recipe_arn = response['solutionVersion']['recipeArn'] 56 | resource['recipeArn'] = recipe_arn 57 | 58 | return recipe_arn 59 | 60 | def get_inference_metric_name(resource): 61 | metric_name = 'GetRecommendations' 62 | if 'campaignArn' in resource and get_recipe_arn(resource) == 'arn:aws:personalize:::recipe/aws-personalized-ranking': 63 | metric_name = 'GetPersonalizedRanking' 64 | 65 | return metric_name 66 | 67 | def get_sum_requests_datapoints(resource, start_time, end_time, period): 68 | if 'campaignArn' in resource: 69 | arn_key = 'campaignArn' 70 | dim_name = 'CampaignArn' 71 | 
else: 72 | arn_key = 'recommenderArn' 73 | dim_name = 'RecommenderArn' 74 | 75 | resource_region = extract_region(resource[arn_key]) 76 | cw = get_client(service_name = 'cloudwatch', region_name = resource_region) 77 | 78 | metric_name = get_inference_metric_name(resource) 79 | 80 | response = cw.get_metric_data( 81 | MetricDataQueries = [ 82 | { 83 | 'Id': 'm1', 84 | 'MetricStat': { 85 | 'Metric': { 86 | 'Namespace': 'AWS/Personalize', 87 | 'MetricName': metric_name, 88 | 'Dimensions': [ 89 | { 90 | 'Name': dim_name, 91 | 'Value': resource[arn_key] 92 | } 93 | ] 94 | }, 95 | 'Period': period, 96 | 'Stat': 'Sum' 97 | }, 98 | 'ReturnData': True 99 | } 100 | ], 101 | StartTime = start_time, 102 | EndTime = end_time, 103 | ScanBy = 'TimestampDescending' 104 | ) 105 | 106 | datapoints = [] 107 | 108 | if response.get('MetricDataResults') and len(response['MetricDataResults']) > 0: 109 | results = response['MetricDataResults'][0] 110 | 111 | for idx, ts in enumerate(results['Timestamps']): 112 | datapoints.append({ 113 | 'Timestamp': ts, 114 | 'Value': results['Values'][idx] 115 | }) 116 | 117 | return datapoints 118 | 119 | def get_sum_requests_by_hour(resource, start_time, end_time): 120 | datapoints = get_sum_requests_datapoints(resource, start_time, end_time, 3600) 121 | return datapoints 122 | 123 | def get_total_requests(resource, start_time, end_time, period): 124 | datapoints = get_sum_requests_datapoints(resource, start_time, end_time, period) 125 | 126 | sum_requests = 0 127 | if datapoints: 128 | for datapoint in datapoints: 129 | sum_requests += datapoint['Value'] 130 | 131 | return sum_requests 132 | 133 | def get_average_tps(resource, start_time, end_time, period = ALARM_PERIOD_SECONDS): 134 | sum_requests = get_total_requests(resource, start_time, end_time, period) 135 | return sum_requests / period 136 | 137 | def get_age_hours(resource): 138 | diff = datetime.datetime.now(datetime.timezone.utc) - resource['creationDateTime'] 139 | days, seconds = 
diff.days, diff.seconds 140 | 141 | hours_age = days * 24 + seconds // 3600 142 | return hours_age 143 | 144 | def get_last_update_age_hours(resource): 145 | hours_age = None 146 | if resource.get('lastUpdatedDateTime'): 147 | diff = datetime.datetime.now(datetime.timezone.utc) - resource['lastUpdatedDateTime'] 148 | days, seconds = diff.days, diff.seconds 149 | 150 | hours_age = days * 24 + seconds // 3600 151 | return hours_age 152 | 153 | def is_resource_updatable(resource): 154 | status = resource['status'] 155 | updatable = status == 'ACTIVE' or status == 'CREATE FAILED' 156 | 157 | if updatable: 158 | if resource.get('latestCampaignUpdate'): 159 | status = resource['latestCampaignUpdate']['status'] 160 | updatable = status == 'ACTIVE' or status == 'CREATE FAILED' 161 | elif resource.get('latestRecommenderUpdate'): 162 | status = resource['latestRecommenderUpdate']['status'] 163 | updatable = status == 'ACTIVE' or status == 'CREATE FAILED' 164 | 165 | return updatable 166 | 167 | def put_metrics(client, metric_datas): 168 | metric = { 169 | 'Namespace': PROJECT_NAME, 170 | 'MetricData': metric_datas 171 | } 172 | 173 | client.put_metric_data(**metric) 174 | logger.debug('Put data for %d metrics', len(metric_datas)) 175 | 176 | def append_metric(metric_datas_by_region, region, metric): 177 | metric_datas = metric_datas_by_region.get(region) 178 | 179 | if not metric_datas: 180 | metric_datas = [] 181 | metric_datas_by_region[region] = metric_datas 182 | 183 | metric_datas.append(metric) 184 | 185 | def notifications_rule_exists(events_client) -> bool: 186 | try: 187 | events_client.describe_rule(Name = NOTIFICATIONS_RULE) 188 | return True 189 | except events_client.exceptions.ResourceNotFoundException: 190 | return False 191 | 192 | def get_notification_subscription(sns_client, topic_arn, endpoint: str) -> Dict: 193 | subs_paginator = sns_client.get_paginator('list_subscriptions_by_topic') 194 | for subs_page in subs_paginator.paginate(TopicArn = topic_arn): 
195 | if subs_page.get('Subscriptions'): 196 | for sub in subs_page['Subscriptions']: 197 | if endpoint == sub.get('Endpoint'): 198 | return sns_client.get_subscription_attributes(SubscriptionArn=sub['SubscriptionArn'])['Attributes'] 199 | return None 200 | 201 | def get_topic_arn(resource_region: str) -> str: 202 | # If the ARN has already been created/fetched, return it from cache. 203 | if resource_region in _topic_arn_by_region: 204 | logger.debug('Returning cached SNS topic ARN for region %s', resource_region) 205 | return _topic_arn_by_region[resource_region] 206 | 207 | sns = get_client(service_name = 'sns', region_name = resource_region) 208 | 209 | logger.info('Creating/fetching SNS topic ARN for topic %s in region %s', SNS_TOPIC_NAME, resource_region) 210 | response = sns.create_topic(Name = SNS_TOPIC_NAME) 211 | topic_arn = response['TopicArn'] 212 | 213 | logger.info('Setting topic policy for SNS topic %s', topic_arn) 214 | sns.set_topic_attributes( 215 | TopicArn = topic_arn, 216 | AttributeName = 'Policy', 217 | AttributeValue = '''{ 218 | "Version": "2008-10-17", 219 | "Id": "PublishPolicy", 220 | "Statement": [{ 221 | "Effect": "Allow", 222 | "Principal": { 223 | "Service": [ 224 | "cloudwatch.amazonaws.com", 225 | "events.amazonaws.com" 226 | ] 227 | }, 228 | "Action": [ "sns:Publish" ], 229 | "Resource": "%s" 230 | }] 231 | }''' % (topic_arn) 232 | ) 233 | 234 | # Cache it so we avoid repeat calls while function is resident. 
235 | _topic_arn_by_region[resource_region] = topic_arn 236 | 237 | events = get_client(service_name = 'events', region_name = resource_region) 238 | 239 | if not notifications_rule_exists(events): 240 | logger.info('EventBridge notifications rule %s does not exist; creating', NOTIFICATIONS_RULE) 241 | 242 | response = events.put_rule( 243 | Name = NOTIFICATIONS_RULE, 244 | EventPattern = '''{ 245 | "detail-type": ["PersonalizeCampaignMinProvisionedTPSUpdated", "PersonalizeCampaignDeleted", "PersonalizeRecommenderMinRecommendationRPSUpdated", "PersonalizeRecommenderStopped"], 246 | "source": ["personalize.monitor"] 247 | }''', 248 | State = 'ENABLED', 249 | Description = 'Routes Personalize Monitor notifications to notification SNS topic' 250 | ) 251 | 252 | logger.info('Setting target on notification rule') 253 | events.put_targets( 254 | Rule = NOTIFICATIONS_RULE, 255 | Targets = [{ 256 | 'Id': NOTIFICATIONS_RULE_TARGET_ID, 257 | 'Arn': topic_arn 258 | }] 259 | ) 260 | else: 261 | logger.info('EventBridge notification rule %s already exists', NOTIFICATIONS_RULE) 262 | 263 | notification_endpoint = os.environ.get('NotificationEndpoint') 264 | 265 | if notification_endpoint: 266 | logger.info('Verifying SNS topic subscription for %s', notification_endpoint) 267 | subscription = get_notification_subscription(sns, topic_arn, notification_endpoint) 268 | if subscription is None: 269 | logger.info('Subscribing endpoint %s to SNS topic %s', notification_endpoint, topic_arn) 270 | sns.subscribe( 271 | TopicArn = topic_arn, 272 | Protocol = 'email', 273 | Endpoint = notification_endpoint 274 | ) 275 | elif subscription['PendingConfirmation'] == 'true': 276 | logger.warning('SNS topic subscription is still pending confirmation') 277 | else: 278 | logger.info('Endpoint is subscribed and confirmed for SNS topic') 279 | else: 280 | logger.warning('No notification endpoint specified at deployment so not adding subscriber') 281 | 282 | return topic_arn 283 | 284 | def
create_utilization_alarm(resource_region, resource, utilization_threshold_lower_bound): 285 | cw = get_client(service_name = 'cloudwatch', region_name = resource_region) 286 | 287 | if 'campaignArn' in resource: 288 | metric_name = 'campaignUtilization' 289 | arn_key = 'campaignArn' 290 | dim_name = 'CampaignArn' 291 | alarm_prefix = ALARM_NAME_PREFIX_LOW_CAMPAIGN_UTILIZATION 292 | # Only enable alarm actions when minTPS > 1 since we can't really do 293 | # anything to impact utilization by dropping minTPS. Let the idle 294 | # alarm handle abandoned campaigns/recommenders. 295 | enable_actions = resource['minProvisionedTPS'] > 1 296 | else: 297 | metric_name = 'recommenderUtilization' 298 | arn_key = 'recommenderArn' 299 | dim_name = 'RecommenderArn' 300 | alarm_prefix = ALARM_NAME_PREFIX_LOW_RECOMMENDER_UTILIZATION 301 | # Only enable alarm actions when minRPS > 1 since we can't really do 302 | # anything to impact utilization by dropping minTPS. Let the idle 303 | # alarm handle abandoned campaigns/recommenders. 
304 | enable_actions = resource['recommenderConfig']['minRecommendationRequestsPerSecond'] > 1 305 | 306 | response = cw.describe_alarms_for_metric( 307 | MetricName = metric_name, 308 | Namespace = PROJECT_NAME, 309 | Dimensions=[ 310 | { 311 | 'Name': dim_name, 312 | 'Value': resource[arn_key] 313 | }, 314 | ] 315 | ) 316 | 317 | alarm_name = alarm_prefix + resource['name'] 318 | 319 | low_utilization_alarm_exists = False 320 | actions_currently_enabled = False 321 | 322 | for alarm in response['MetricAlarms']: 323 | if (alarm['AlarmName'].startswith(alarm_prefix) and 324 | alarm['ComparisonOperator'] in [ 'LessThanThreshold', 'LessThanOrEqualToThreshold' ]): 325 | alarm_name = alarm['AlarmName'] 326 | low_utilization_alarm_exists = True 327 | actions_currently_enabled = alarm['ActionsEnabled'] 328 | break 329 | 330 | alarm_created = False 331 | 332 | if not low_utilization_alarm_exists: 333 | logger.info('Creating lower bound utilization alarm for %s', resource[arn_key]) 334 | 335 | topic_arn = get_topic_arn(resource_region) 336 | 337 | cw.put_metric_alarm( 338 | AlarmName = alarm_name, 339 | AlarmDescription = 'Alarms when utilization falls below threshold indicating possible over provisioning condition', 340 | ActionsEnabled = enable_actions, 341 | OKActions = [ topic_arn ], 342 | AlarmActions = [ topic_arn ], 343 | MetricName = metric_name, 344 | Namespace = PROJECT_NAME, 345 | Statistic = 'Average', 346 | Dimensions = [ 347 | { 348 | 'Name': dim_name, 349 | 'Value': resource[arn_key] 350 | } 351 | ], 352 | Period = ALARM_PERIOD_SECONDS, 353 | EvaluationPeriods = 12, # last 60 minutes 354 | DatapointsToAlarm = 9, # alarm state for 45 of last 60 minutes 355 | Threshold = utilization_threshold_lower_bound, 356 | ComparisonOperator = 'LessThanThreshold', 357 | TreatMissingData = 'missing', 358 | Tags=[ 359 | { 360 | 'Key': 'CreatedBy', 361 | 'Value': PROJECT_NAME 362 | } 363 | ] 364 | ) 365 | 366 | alarm_created = True 367 | elif enable_actions != 
actions_currently_enabled: 368 | # Toggle enable/disable actions for existing alarm. 369 | if enable_actions: 370 | cw.enable_alarm_actions(AlarmNames = [ alarm_name ]) 371 | else: 372 | cw.disable_alarm_actions(AlarmNames = [ alarm_name ]) 373 | 374 | return alarm_created 375 | 376 | def create_idle_resource_alarm(resource_region, resource, idle_threshold_hours): 377 | cw = get_client(service_name = 'cloudwatch', region_name = resource_region) 378 | topic_arn = get_topic_arn(resource_region) 379 | 380 | metric_name = get_inference_metric_name(resource) 381 | 382 | if 'campaignArn' in resource: 383 | arn_key = 'campaignArn' 384 | dim_name = 'CampaignArn' 385 | alarm_prefix = ALARM_NAME_PREFIX_IDLE_CAMPAIGN 386 | else: 387 | arn_key = 'recommenderArn' 388 | dim_name = 'RecommenderArn' 389 | alarm_prefix = ALARM_NAME_PREFIX_IDLE_RECOMMENDER 390 | 391 | response = cw.describe_alarms_for_metric( 392 | MetricName = metric_name, 393 | Namespace = 'AWS/Personalize', 394 | Dimensions=[ 395 | { 396 | 'Name': dim_name, 397 | 'Value': resource[arn_key] 398 | }, 399 | ] 400 | ) 401 | 402 | alarm_name = alarm_prefix + resource['name'] 403 | 404 | idle_alarm_exists = False 405 | # Only enable actions when the campaign/recommender has existed at least as 406 | # long as the idle threshold. This is necessary since the alarm treats missing 407 | # data as breaching. 
408 | enable_actions = get_age_hours(resource) >= idle_threshold_hours 409 | actions_currently_enabled = False 410 | 411 | for alarm in response['MetricAlarms']: 412 | if (alarm['AlarmName'].startswith(alarm_prefix) and 413 | alarm['ComparisonOperator'] == 'LessThanOrEqualToThreshold' and 414 | int(alarm['Threshold']) == 0): 415 | alarm_name = alarm['AlarmName'] 416 | idle_alarm_exists = True 417 | actions_currently_enabled = alarm['ActionsEnabled'] 418 | break 419 | 420 | alarm_created = False 421 | 422 | if not idle_alarm_exists: 423 | logger.info('Creating idle utilization alarm for %s', resource[arn_key]) 424 | 425 | cw.put_metric_alarm( 426 | AlarmName = alarm_name, 427 | AlarmDescription = 'Alarms when utilization is idle for a contiguous length of time indicating a potentially abandoned campaign/recommender', 428 | ActionsEnabled = enable_actions, 429 | OKActions = [ topic_arn ], 430 | AlarmActions = [ topic_arn ], 431 | MetricName = metric_name, 432 | Namespace = 'AWS/Personalize', 433 | Statistic = 'Sum', 434 | Dimensions = [ 435 | { 436 | 'Name': dim_name, 437 | 'Value': resource[arn_key] 438 | } 439 | ], 440 | Period = ALARM_PERIOD_SECONDS, 441 | EvaluationPeriods = int(((60 * 60) / ALARM_PERIOD_SECONDS) * idle_threshold_hours), 442 | Threshold = 0, 443 | ComparisonOperator = 'LessThanOrEqualToThreshold', 444 | TreatMissingData = 'breaching', # Won't get metric data for idle campaigns 445 | Tags=[ 446 | { 447 | 'Key': 'CreatedBy', 448 | 'Value': PROJECT_NAME 449 | } 450 | ] 451 | ) 452 | 453 | alarm_created = True 454 | elif enable_actions != actions_currently_enabled: 455 | # Toggle enable/disable actions for existing alarm.
456 | if enable_actions: 457 | cw.enable_alarm_actions(AlarmNames = [ alarm_name ]) 458 | else: 459 | cw.disable_alarm_actions(AlarmNames = [ alarm_name ]) 460 | 461 | return alarm_created 462 | 463 | def divide_chunks(l, n): 464 | for i in range(0, len(l), n): 465 | yield l[i:i + n] 466 | 467 | def perform_hourly_checks(resource_arn): 468 | ''' Hashes resource_arn across 10 minute intervals of the current hour so we spread out hourly checks ''' 469 | num_slots = 6 # 60 mins / 10 470 | slot = sum(bytearray(resource_arn.encode('utf-8'))) % num_slots 471 | # Allow for match on first two minutes of 10 minute slot to account for CW event lag (assumes current schedule of every 5 mins). 472 | return datetime.datetime.now().minute in range(slot * 10, slot * 10 + 2) 473 | 474 | @logger.inject_lambda_context(log_event=True) 475 | def lambda_handler(event, _): 476 | auto_create_utilization_alarms = event.get('AutoCreateUtilizationAlarms') 477 | if not auto_create_utilization_alarms: 478 | auto_create_utilization_alarms = os.environ.get('AutoCreateUtilizationAlarms', 'yes').lower() in [ 'true', 'yes', '1' ] 479 | 480 | utilization_threshold_lower_bound = event.get('UtilizationThresholdAlarmLowerBound') 481 | if not utilization_threshold_lower_bound: 482 | utilization_threshold_lower_bound = float(os.environ.get('UtilizationThresholdAlarmLowerBound', '100.0')) 483 | 484 | auto_create_idle_alarms = event.get('AutoCreateIdleAlarms') 485 | if not auto_create_idle_alarms: 486 | auto_create_idle_alarms = os.environ.get('AutoCreateIdleAlarms', 'yes').lower() in [ 'true', 'yes', '1' ] 487 | 488 | auto_delete_idle_resources = event.get('AutoDeleteOrStopIdleResources') 489 | if not auto_delete_idle_resources: 490 | auto_delete_idle_resources = os.environ.get('AutoDeleteOrStopIdleResources', 'false').lower() in [ 'true', 'yes', '1' ] 491 | 492 | idle_resource_threshold_hours = event.get('IdleThresholdHours') 493 | if not idle_resource_threshold_hours: 494 | idle_resource_threshold_hours 
= int(os.environ.get('IdleThresholdHours', '24')) 495 | 496 | if idle_resource_threshold_hours < MIN_IDLE_THRESHOLD_HOURS: 497 | raise ValueError(f'"IdleThresholdHours" must be >= {MIN_IDLE_THRESHOLD_HOURS} hours') 498 | 499 | auto_adjust_min_tps = event.get('AutoAdjustMinTPS') 500 | if not auto_adjust_min_tps: 501 | auto_adjust_min_tps = os.environ.get('AutoAdjustMinTPS', 'yes').lower() in [ 'true', 'yes', '1' ] 502 | 503 | campaigns = get_configured_active_campaigns(event) 504 | recommenders = get_configured_active_recommenders(event) 505 | 506 | current_region = os.environ['AWS_REGION'] 507 | 508 | metric_datas_by_region = {} 509 | 510 | append_metric(metric_datas_by_region, current_region, { 511 | 'MetricName': 'monitoredResourceCount', 512 | 'Value': len(campaigns) + len(recommenders), 513 | 'Unit': 'Count' 514 | }) 515 | 516 | resource_metrics_written = 0 517 | all_metrics_written = 0 518 | alarms_created = 0 519 | 520 | # Define our 5 minute window, ensuring it's on prior 5 minute boundary. 
521 | end_time = datetime.datetime.now(datetime.timezone.utc) 522 | end_time = end_time.replace(microsecond=0,second=0, minute=end_time.minute - end_time.minute % 5) 523 | start_time = end_time - datetime.timedelta(minutes=5) 524 | 525 | logger.info('Retrieving minProvisionedTPS for %d active campaigns', len(campaigns)) 526 | logger.info('Retrieving minRecommendationRequestsPerSecond for %d active recommenders', len(recommenders)) 527 | 528 | for resource in campaigns + recommenders: 529 | if logger.isEnabledFor(logging.DEBUG): 530 | logger.debug('Resource: %s', json.dumps(resource, indent = 2, default = str)) 531 | 532 | is_campaign = 'campaignArn' in resource 533 | 534 | resource_arn = resource['campaignArn'] if is_campaign else resource['recommenderArn'] 535 | resource_region = extract_region(resource_arn) 536 | 537 | min_tps = resource['minProvisionedTPS'] if is_campaign else resource['recommenderConfig']['minRecommendationRequestsPerSecond'] 538 | 539 | append_metric(metric_datas_by_region, resource_region, { 540 | 'MetricName': 'minProvisionedTPS' if is_campaign else 'minRecommendationRequestsPerSecond', 541 | 'Dimensions': [ 542 | { 543 | 'Name': 'CampaignArn' if is_campaign else 'RecommenderArn', 544 | 'Value': resource_arn 545 | } 546 | ], 547 | 'Value': min_tps, 548 | 'Unit': 'Count/Second' 549 | }) 550 | 551 | tps = get_average_tps(resource, start_time, end_time) 552 | utilization = 0 553 | 554 | if tps: 555 | append_metric(metric_datas_by_region, resource_region, { 556 | 'MetricName': 'averageTPS' if is_campaign else 'averageRPS', 557 | 'Dimensions': [ 558 | { 559 | 'Name': 'CampaignArn' if is_campaign else 'RecommenderArn', 560 | 'Value': resource_arn 561 | } 562 | ], 563 | 'Value': tps, 564 | 'Unit': 'Count/Second' 565 | }) 566 | 567 | utilization = tps / min_tps * 100 568 | 569 | append_metric(metric_datas_by_region, resource_region, { 570 | 'MetricName': 'campaignUtilization' if is_campaign else 'recommenderUtilization', 571 | 'Dimensions': [ 572 | 
{ 573 | 'Name': 'CampaignArn' if is_campaign else 'RecommenderArn', 574 | 'Value': resource_arn 575 | } 576 | ], 577 | 'Value': utilization, 578 | 'Unit': 'Percent' 579 | }) 580 | 581 | logger.debug( 582 | 'Resource %s has current minTPS of %d and actual TPS of %s yielding %.2f%% utilization', 583 | resource_arn, min_tps, tps, utilization 584 | ) 585 | resource_metrics_written += 1 586 | 587 | # Only do idle resource and minTPS adjustment checks once per hour for each campaign/recommender. 588 | perform_hourly_checks_this_run = perform_hourly_checks(resource_arn) 589 | 590 | # Determine how old the resource is and time since last update. 591 | resource_age_hours = get_age_hours(resource) 592 | resource_update_age_hours = get_last_update_age_hours(resource) 593 | 594 | resource_delete_stop_event_fired = False 595 | 596 | if utilization == 0 and perform_hourly_checks_this_run and auto_delete_idle_resources: 597 | # Resource is currently idle. Let's see if it's old enough and not being updated recently. 598 | logger.info( 599 | 'Performing idle stop/delete check for %s; resource is %d hours old; last updated %s hours ago', 600 | resource_arn, resource_age_hours, resource_update_age_hours 601 | ) 602 | 603 | if (resource_age_hours >= idle_resource_threshold_hours): 604 | 605 | # Resource has been around long enough. Let's see how long it's been idle. 
606 | end_time_idle_check = datetime.datetime.now(datetime.timezone.utc) 607 | start_time_idle_check = end_time_idle_check - datetime.timedelta(hours = idle_resource_threshold_hours) 608 | period_idle_check = idle_resource_threshold_hours * 60 * 60 609 | 610 | total_requests = get_total_requests(resource, start_time_idle_check, end_time_idle_check, period_idle_check) 611 | 612 | if total_requests == 0: 613 | if is_resource_updatable(resource): 614 | if is_campaign: 615 | detail_type = 'DeletePersonalizeCampaign' 616 | reason = f'Campaign {resource_arn} has been idle for at least {idle_resource_threshold_hours} hours so initiating delete according to configuration.' 617 | else: 618 | detail_type = 'StopPersonalizeRecommender' 619 | reason = f'Recommender {resource_arn} has been idle for at least {idle_resource_threshold_hours} hours so initiating stop according to configuration.' 620 | 621 | logger.info(reason) 622 | 623 | put_event( 624 | detail_type = detail_type, 625 | detail = json.dumps({ 626 | 'ARN': resource_arn, 627 | 'Utilization': utilization, 628 | 'AgeHours': resource_age_hours, 629 | 'IdleThresholdHours': idle_resource_threshold_hours, 630 | 'TotalRequestsDuringIdleThresholdHours': total_requests, 631 | 'Reason': reason 632 | }), 633 | resources = [ resource_arn ] 634 | ) 635 | 636 | resource_delete_stop_event_fired = True 637 | else: 638 | logger.warning( 639 | 'Resource %s has been idle for at least %d hours but its status will not allow it to be deleted/stopped on this run', 640 | resource_arn, idle_resource_threshold_hours 641 | ) 642 | else: 643 | logger.warning( 644 | 'Resource %s is currently idle but has had %d requests within the last %d hours so does not meet idle criteria for auto-deletion/auto-stop', 645 | resource_arn, total_requests, idle_resource_threshold_hours 646 | ) 647 | else: 648 | logger.info( 649 | 'Resource %s is only %d hours old and last update %s hours old; too new to consider for auto-deletion/auto-stop', 650 | resource_arn, 
resource_age_hours, resource_update_age_hours 651 | ) 652 | 653 | if (not resource_delete_stop_event_fired and 654 | perform_hourly_checks_this_run and 655 | auto_adjust_min_tps and 656 | min_tps > 1): 657 | 658 | days_back = 14 659 | end_time_tps_check = datetime.datetime.now(datetime.timezone.utc).replace(minute=0, second=0, microsecond=0) 660 | start_time_tps_check = end_time_tps_check - datetime.timedelta(days = days_back) 661 | 662 | datapoints = get_sum_requests_by_hour(resource, start_time_tps_check, end_time_tps_check) 663 | min_reqs = sys.maxsize 664 | max_reqs = total_reqs = total_avg_tps = min_avg_tps = max_avg_tps = 0 665 | 666 | for datapoint in datapoints: 667 | total_reqs += datapoint['Value'] 668 | min_reqs = min(min_reqs, datapoint['Value']) 669 | max_reqs = max(max_reqs, datapoint['Value']) 670 | 671 | if len(datapoints) > 0: 672 | total_avg_tps = int(total_reqs / (len(datapoints) * 3600)) 673 | min_avg_tps = int(min_reqs / 3600) 674 | max_avg_tps = int(max_reqs / 3600) 675 | 676 | logger.info( 677 | 'Performing minTPS/minRPS adjustment check for %s; min/max/avg hourly TPS over last %d days for %d datapoints: %d/%d/%.2f', 678 | resource_arn, days_back, len(datapoints), min_avg_tps, max_avg_tps, total_avg_tps 679 | ) 680 | 681 | min_age_to_update_hours = 24 682 | 683 | age_eligible = True 684 | 685 | if resource_age_hours < min_age_to_update_hours: 686 | logger.info( 687 | 'Resource %s is less than %d hours old so not eligible for minTPS/minRPS adjustment yet', 688 | resource_arn, min_age_to_update_hours 689 | ) 690 | age_eligible = False 691 | 692 | if age_eligible and min_avg_tps < min_tps: 693 | # Incrementally drop minTPS/minRPS. 
694 | new_min_tps = max(1, int(math.floor(min_tps * .75))) 695 | 696 | if is_resource_updatable(resource): 697 | reason = f'Step down adjustment of minTPS/minRPS for {resource_arn} down from {min_tps} to {new_min_tps} based on average hourly TPS low watermark of {min_avg_tps} over last {days_back} days' 698 | logger.info(reason) 699 | 700 | put_event( 701 | detail_type = 'UpdatePersonalizeCampaignMinProvisionedTPS' if is_campaign else 'UpdatePersonalizeRecommenderMinRecommendationRPS', 702 | detail = json.dumps({ 703 | 'ARN': resource_arn, 704 | 'Utilization': utilization, 705 | 'AgeHours': resource_age_hours, 706 | 'CurrentMinTPS': min_tps, 707 | 'NewMinTPS': new_min_tps, 708 | 'MinAverageTPS': min_avg_tps, 709 | 'MaxAverageTPS': max_avg_tps, 710 | 'Datapoints': datapoints, 711 | 'Reason': reason 712 | }, default = str), 713 | resources = [ resource_arn ] 714 | ) 715 | else: 716 | logger.warning( 717 | 'Resource %s could have its minTPS/minRPS adjusted down from %d to %d based on average hourly TPS low watermark over last %d days but its status will not allow it to be updated on this run', 718 | resource_arn, min_tps, new_min_tps, days_back 719 | ) 720 | 721 | if not resource_delete_stop_event_fired: 722 | if auto_create_utilization_alarms: 723 | if create_utilization_alarm(resource_region, resource, utilization_threshold_lower_bound): 724 | alarms_created += 1 725 | 726 | if auto_create_idle_alarms: 727 | if create_idle_resource_alarm(resource_region, resource, idle_resource_threshold_hours): 728 | alarms_created += 1 729 | 730 | for region, metric_datas in metric_datas_by_region.items(): 731 | cw = get_client(service_name = 'cloudwatch', region_name = region) 732 | 733 | metric_datas_chunks = divide_chunks(metric_datas, MAX_METRICS_PER_CALL) 734 | 735 | for metrics_datas_chunk in metric_datas_chunks: 736 | put_metrics(cw, metrics_datas_chunk) 737 | all_metrics_written += len(metrics_datas_chunk) 738 | 739 | outcome = f'Logged {all_metrics_written} TPS utilization 
metrics for {resource_metrics_written} active campaigns and recommenders; {alarms_created} alarms created' 740 | logger.info(outcome) 741 | 742 | if alarms_created > 0: 743 | # At least one new alarm was created so that likely means new campaigns were created too. Let's trigger the dashboard to be rebuilt. 744 | logger.info('Triggering rebuild of the CloudWatch dashboard since %d new alarm(s) were created', alarms_created) 745 | put_event( 746 | detail_type = 'BuildPersonalizeMonitorDashboard', 747 | detail = json.dumps({ 748 | 'Reason': f'Triggered rebuild due to {alarms_created} new alarm(s) being created' 749 | }) 750 | ) 751 | 752 | return outcome 753 | -------------------------------------------------------------------------------- /src/personalize_monitor_function/requirements.txt: -------------------------------------------------------------------------------- 1 | # Note: AWS Lambda Power Tools dependency is satisfied by Lambda layer at runtime (part of deployment). 2 | -------------------------------------------------------------------------------- /src/personalize_stop_recommender_function/README.md: -------------------------------------------------------------------------------- 1 | # Amazon Personalize Monitor - Stop Recommender Function 2 | 3 | This Lambda function stops a Personalize recommender. It is called as the target of an EventBridge rule that matches events with the `StopPersonalizeRecommender` detail-type. The [personalize-monitor](../personalize_monitor_function/) function publishes this event when the `AutoDeleteOrStopIdleResources` deployment parameter is `Yes` AND a monitored recommender has been idle more than `IdleThresholdHours` hours. Therefore, an idle recommender is one that has not had any `GetRecommendations` calls in the last `IdleThresholdHours` hours. 4 | 5 | This function will also delete any CloudWatch alarms that were dynamically created by this application for the stopped recommender. 
Alarms can be created for idle recommenders and low-utilization recommenders via the `AutoCreateIdleAlarms` and `AutoCreateUtilizationAlarms` deployment parameters. 6 | 7 | > Note that Personalize campaigns are deleted and not stopped by this application. Since model artifacts are associated with a solution version, deleting a campaign does **not** delete the actual model artifacts. See the [personalize_delete_campaign](../personalize_delete_campaign_function/) function for details. 8 | 9 | ## How it works 10 | 11 | The EventBridge event structure that triggers this function looks something like this: 12 | 13 | ```javascript 14 | { 15 | "source": "personalize.monitor", 16 | "detail-type": "StopPersonalizeRecommender", 17 | "resources": [ RECOMMENDER_ARN_TO_STOP ], 18 | "detail": { 19 | "ARN": RECOMMENDER_ARN_TO_STOP, 20 | "Utilization": CURRENT_UTILIZATION, 21 | "AgeHours": RECOMMENDER_AGE_IN_HOURS, 22 | "IdleThresholdHours": RECOMMENDER_IDLE_HOURS, 23 | "TotalRequestsDuringIdleThresholdHours": 0, 24 | "Reason": DESCRIPTIVE_REASON_FOR_STOP 25 | } 26 | } 27 | ``` 28 | 29 | This function can also be invoked directly as part of your own operational process. The event you pass to the function requires only the recommender ARN, as follows. 30 | 31 | ```javascript 32 | { 33 | "ARN": RECOMMENDER_ARN_TO_STOP, 34 | "Reason": OPTIONAL_DESCRIPTIVE_REASON_FOR_STOP 35 | } 36 | ``` 37 | 38 | The Personalize [StopRecommender](https://docs.aws.amazon.com/personalize/latest/dg/API_StopRecommender.html) API is used to stop the recommender. 39 | 40 | ## Published events 41 | 42 | When the recommender stop request and the deletion of any dynamically created CloudWatch alarms for the recommender have been successfully initiated by this function, two events are published to EventBridge. One event triggers a notification to the SNS topic for this application and the other triggers a rebuild of the CloudWatch dashboard. 
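
As a rough illustration of the two events described above, the sketch below builds the corresponding EventBridge `PutEvents` entries as plain dictionaries. The helper name `build_stop_events` and the sample ARN are hypothetical. The function itself publishes through the shared `put_event` helper from the common layer, whose internals are not shown here, so treat this as an approximation of the payloads rather than the actual implementation. The `personalize.monitor` source value is taken from the event examples in this README.

```python
import json

# Assumption: event source taken from the event examples in this README.
EVENT_SOURCE = 'personalize.monitor'

def build_stop_events(recommender_arn, reason=None):
    """Hypothetical helper: returns the two PutEvents entries published once a stop is initiated."""
    if not reason:
        reason = f'Amazon Personalize recommender {recommender_arn} stop initiated (reason unspecified)'

    # Both events carry the same detail payload.
    detail = json.dumps({'ARN': recommender_arn, 'Reason': reason})

    return [
        {
            'Source': EVENT_SOURCE,
            'DetailType': detail_type,
            'Detail': detail,
            'Resources': [recommender_arn],
        }
        for detail_type in ('PersonalizeRecommenderStopped', 'BuildPersonalizeMonitorDashboard')
    ]

# Made-up ARN used purely for illustration.
entries = build_stop_events('arn:aws:personalize:us-east-1:123456789012:recommender/example')
for entry in entries:
    print(entry['DetailType'])
```

A list shaped like this could be passed as `Entries` to the `put_events` call of a boto3 `events` client.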
43 | 44 | ### Stop notification 45 | 46 | The following event is published to EventBridge to signal that a recommender has been stopped. 47 | 48 | ```javascript 49 | { 50 | "source": "personalize.monitor", 51 | "detail-type": "PersonalizeRecommenderStopped", 52 | "resources": [ RECOMMENDER_ARN_STOPPED ], 53 | "detail": { 54 | "ARN": RECOMMENDER_ARN_STOPPED, 55 | "Reason": DESCRIPTIVE_REASON_FOR_STOP 56 | } 57 | } 58 | ``` 59 | 60 | An EventBridge rule is set up to target an SNS topic with `NotificationEndpoint` as the subscriber. This is the email address you provided at deployment time. If you'd like, you can customize how these notification events are handled in the EventBridge and SNS consoles. 61 | 62 | ### Rebuild CloudWatch dashboard 63 | 64 | Since a monitored recommender has been stopped, the CloudWatch dashboard needs to be rebuilt so that the recommender is removed from the widgets. This is accomplished by publishing a `BuildPersonalizeMonitorDashboard` event that is processed by the [dashboard_mgmt](../dashboard_mgmt_function/) function. 
65 | 66 | ```javascript 67 | { 68 | "source": "personalize.monitor", 69 | "detail-type": "BuildPersonalizeMonitorDashboard", 70 | "resources": [ RECOMMENDER_ARN_STOPPED ], 71 | "detail": { 72 | "ARN": RECOMMENDER_ARN_STOPPED, 73 | "Reason": DESCRIPTIVE_REASON_FOR_REBUILD 74 | } 75 | } 76 | ``` 77 | -------------------------------------------------------------------------------- /src/personalize_stop_recommender_function/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/src/personalize_stop_recommender_function/__init__.py -------------------------------------------------------------------------------- /src/personalize_stop_recommender_function/personalize_stop_recommender.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | """ 5 | Lambda function that is used to stop a Personalize recommender based on prolonged idle time 6 | and according to configuration to automatically stop recommenders under these conditions. 7 | Note that this function just stops the recommender; it does NOT delete the recommender. The 8 | idea is to stop ongoing charges for an idle recommender. 
9 | """ 10 | 11 | import json 12 | import logging 13 | 14 | from aws_lambda_powertools import Logger 15 | 16 | from common import ( 17 | PROJECT_NAME, 18 | ALARM_NAME_PREFIX, 19 | extract_region, 20 | get_client, 21 | put_event 22 | ) 23 | 24 | logger = Logger() 25 | 26 | def delete_alarms_for_recommender(recommender_arn): 27 | cw = get_client(service_name = 'cloudwatch', region_name = extract_region(recommender_arn)) 28 | 29 | alarm_names_to_delete = set() 30 | 31 | alarms_paginator = cw.get_paginator('describe_alarms') 32 | for alarms_page in alarms_paginator.paginate(AlarmNamePrefix = ALARM_NAME_PREFIX, AlarmTypes=['MetricAlarm']): 33 | for alarm in alarms_page['MetricAlarms']: 34 | for dim in alarm['Dimensions']: 35 | if dim['Name'] == 'RecommenderArn' and dim['Value'] == recommender_arn: 36 | tags_response = cw.list_tags_for_resource(ResourceARN = alarm['AlarmArn']) 37 | 38 | for tag in tags_response['Tags']: 39 | if tag['Key'] == 'CreatedBy' and tag['Value'] == PROJECT_NAME: 40 | alarm_names_to_delete.add(alarm['AlarmName']) 41 | break 42 | 43 | if alarm_names_to_delete: 44 | # FUTURE: max check of 100 45 | logger.info('Deleting CloudWatch alarms for recommender %s: %s', recommender_arn, alarm_names_to_delete) 46 | cw.delete_alarms(AlarmNames=list(alarm_names_to_delete)) 47 | 48 | else: 49 | logger.info('No CloudWatch alarms to delete for recommender %s', recommender_arn) 50 | 51 | @logger.inject_lambda_context(log_event=True) 52 | def lambda_handler(event, _): 53 | ''' Initiates stopping a Personalize recommender ''' 54 | if event.get('detail'): 55 | recommender_arn = event['detail']['ARN'] 56 | reason = event['detail'].get('Reason') 57 | else: 58 | recommender_arn = event['ARN'] 59 | reason = event.get('Reason') 60 | 61 | region = extract_region(recommender_arn) 62 | if not region: 63 | raise Exception('Region could not be extracted from ARN') 64 | 65 | personalize = get_client(service_name = 'personalize', 
region_name = region) 66 | 67 | response = personalize.stop_recommender(recommenderArn = recommender_arn) 68 | 69 | if logger.isEnabledFor(logging.DEBUG): 70 | logger.debug(json.dumps(response, indent = 2, default = str)) 71 | 72 | if not reason: 73 | reason = f'Amazon Personalize recommender {recommender_arn} stop initiated (reason unspecified)' 74 | 75 | put_event( 76 | detail_type = 'PersonalizeRecommenderStopped', 77 | detail = json.dumps({ 78 | 'ARN': recommender_arn, 79 | 'Reason': reason 80 | }), 81 | resources = [ recommender_arn ] 82 | ) 83 | 84 | put_event( 85 | detail_type = 'BuildPersonalizeMonitorDashboard', 86 | detail = json.dumps({ 87 | 'ARN': recommender_arn, 88 | 'Reason': reason 89 | }), 90 | resources = [ recommender_arn ] 91 | ) 92 | 93 | logger.info({ 94 | 'recommenderArn': recommender_arn 95 | }) 96 | 97 | delete_alarms_for_recommender(recommender_arn) 98 | 99 | return f'Successfully initiated stop of recommender {recommender_arn}' -------------------------------------------------------------------------------- /src/personalize_stop_recommender_function/requirements.txt: -------------------------------------------------------------------------------- 1 | # Note: AWS Lambda Power Tools dependency is satisfied by Lambda layer at runtime (part of deployment). -------------------------------------------------------------------------------- /src/personalize_update_tps_function/README.md: -------------------------------------------------------------------------------- 1 | # Amazon Personalize Monitor - Campaign Provisioned TPS Update Function 2 | 3 | This Lambda function adjusts the `minProvisionedTPS` value for a Personalize campaign or the `minRecommendationRequestsPerSecond` for a Personalize recommender. It is called as the target of EventBridge rules for events emitted by the [personalize_monitor](../personalize_monitor_function/) function when configured to update campaigns and recommenders based on actual TPS activity. 
You can also incorporate this function into your own operations to scale campaigns and recommenders up and down. For example, if you know your campaign or recommender will experience a massive spike in requests at a certain time (e.g., a flash sale) and you want to pre-warm your campaign or recommender capacity, you can create a [CloudWatch event](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html) to call this function 30 minutes before the expected spike in traffic to increase endpoint capacity and then again after the traffic event to lower the capacity. Alternatively, if there are certain events that occur in your application that you know will generate a predictably higher or lower volume of requests than the current `minProvisionedTPS`/`minRecommendationRequestsPerSecond` **AND** Personalize's auto-scaling will not suffice, you can use this function as a trigger to adjust `minProvisionedTPS`/`minRecommendationRequestsPerSecond` accordingly. 4 | 5 | ## How it works 6 | 7 | The EventBridge event structure that triggers this function for a campaign looks something like this: 8 | 9 | ```javascript 10 | { 11 | "source": "personalize.monitor", 12 | "detail-type": "UpdatePersonalizeCampaignMinProvisionedTPS", 13 | "resources": [ CAMPAIGN_ARN_TO_UPDATE ], 14 | "detail": { 15 | "ARN": CAMPAIGN_ARN_TO_UPDATE, 16 | "Utilization": CURRENT_UTILIZATION, 17 | "AgeHours": CAMPAIGN_AGE_IN_HOURS, 18 | "CurrentMinTPS": CURRENT_MIN_PROVISIONED_TPS, 19 | "NewMinTPS": NEW_MIN_PROVISIONED_TPS, 20 | "MinAverageTPS": MIN_AVERAGE_TPS_LAST_24_HOURS, 21 | "MaxAverageTPS": MAX_AVERAGE_TPS_LAST_24_HOURS, 22 | "Datapoints": [ CW_METRIC_DATAPOINTS_LAST_24_HOURS ], 23 | "Reason": DESCRIPTIVE_REASON_FOR_UPDATE 24 | } 25 | } 26 | ``` 27 | 28 | Similarly, the EventBridge event structure that triggers this function for a recommender looks something like this: 29 | 30 | ```javascript 31 | { 32 | "source": "personalize.monitor", 33 | "detail-type": 
"UpdatePersonalizeRecommenderMinRecommendationRPS", 34 | "resources": [ RECOMMENDER_ARN_TO_UPDATE ], 35 | "detail": { 36 | "ARN": RECOMMENDER_ARN_TO_UPDATE, 37 | "Utilization": CURRENT_UTILIZATION, 38 | "AgeHours": RECOMMENDER_AGE_IN_HOURS, 39 | "CurrentMinTPS": CURRENT_MIN_RECOMMENDATION_RPS, 40 | "NewMinTPS": NEW_MIN_RECOMMENDATION_RPS, 41 | "MinAverageTPS": MIN_AVERAGE_TPS_LAST_24_HOURS, 42 | "MaxAverageTPS": MAX_AVERAGE_TPS_LAST_24_HOURS, 43 | "Datapoints": [ CW_METRIC_DATAPOINTS_LAST_24_HOURS ], 44 | "Reason": DESCRIPTIVE_REASON_FOR_UPDATE 45 | } 46 | } 47 | ``` 48 | 49 | This function can also be invoked directly as part of your own operational process. The event you pass to the function requires only the campaign or recommender ARN and the new minimum TPS value (`NewMinTPS`), as follows. 50 | 51 | ```javascript 52 | { 53 | "ARN": "CAMPAIGN_OR_RECOMMENDER_ARN_HERE", 54 | "NewMinTPS": NEW_MIN_TPS_HERE, 55 | "Reason": DESCRIPTIVE_REASON_FOR_UPDATE 56 | } 57 | ``` 58 | 59 | For Personalize campaigns, the [UpdateCampaign](https://docs.aws.amazon.com/personalize/latest/dg/API_UpdateCampaign.html) API is used to update the `minProvisionedTPS` value. For Personalize recommenders, the [UpdateRecommender](https://docs.aws.amazon.com/personalize/latest/dg/API_UpdateRecommender.html) API is used to update the `minRecommendationRequestsPerSecond` value. 60 | 61 | ## Published events 62 | 63 | When an update of a campaign's `minProvisionedTPS` or recommender's `minRecommendationRequestsPerSecond` has been successfully initiated by this function, an event is published to EventBridge to trigger a notification. 64 | 65 | > Since it can take several minutes for a campaign or recommender to redeploy after updating its `minProvisionedTPS` or `minRecommendationRequestsPerSecond`, you will receive the notification when the redeploy starts. The campaign/recommender will continue to respond to `GetRecommendations`/`GetPersonalizedRanking` API requests while it is redeploying. 
**Therefore, there will be no interruption of service while it's redeploying.** 66 | 67 | ### Update minProvisionedTPS notification 68 | 69 | The following event is published to EventBridge to signal that an update to a campaign has been initiated. 70 | 71 | ```javascript 72 | { 73 | "source": "personalize.monitor", 74 | "detail-type": "PersonalizeCampaignMinProvisionedTPSUpdated", 75 | "resources": [ CAMPAIGN_ARN_UPDATED ], 76 | "detail": { 77 | "ARN": CAMPAIGN_ARN_UPDATED, 78 | "NewMinTPS": NEW_TPS, 79 | "Reason": DESCRIPTIVE_REASON_FOR_UPDATE 80 | } 81 | } 82 | ``` 83 | 84 | ### Update minRecommendationRequestsPerSecond notification 85 | 86 | The following event is published to EventBridge to signal that an update to a recommender has been initiated. 87 | 88 | ```javascript 89 | { 90 | "source": "personalize.monitor", 91 | "detail-type": "PersonalizeRecommenderMinRecommendationRPSUpdated", 92 | "resources": [ RECOMMENDER_ARN_UPDATED ], 93 | "detail": { 94 | "ARN": RECOMMENDER_ARN_UPDATED, 95 | "NewMinTPS": NEW_TPS, 96 | "Reason": DESCRIPTIVE_REASON_FOR_UPDATE 97 | } 98 | } 99 | ``` 100 | 101 | An EventBridge rule is set up to target an SNS topic with `NotificationEndpoint` as the subscriber. This is the email address you provided at deployment time. If you'd like, you can customize how these notification events are handled or add your own targets in the EventBridge and SNS consoles. 
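
For direct invocation, the event described under "How it works" can be assembled and sanity-checked before it is sent. The sketch below is illustrative only: `build_update_event` is a hypothetical helper, the sample ARN is made up, and the resource-type check assumes the usual `arn:partition:personalize:region:account:type/name` ARN layout. The `NewMinTPS >= 1` check mirrors the validation performed by the function itself.

```python
import json

VALID_RESOURCE_TYPES = ('campaign', 'recommender')

def build_update_event(arn, new_min_tps, reason=None):
    """Hypothetical helper: builds the direct-invocation event for the update function."""
    # Assumption: the sixth ARN component looks like "campaign/name" or "recommender/name".
    parts = arn.split(':')
    resource_type = parts[5].split('/')[0] if len(parts) > 5 else None
    if resource_type not in VALID_RESOURCE_TYPES:
        raise ValueError('ARN must identify a campaign or recommender')

    if new_min_tps < 1:
        raise ValueError('"NewMinTPS" must be >= 1')

    event = {'ARN': arn, 'NewMinTPS': new_min_tps}
    if reason:
        # "Reason" is optional; the function substitutes a default when it is missing.
        event['Reason'] = reason
    return event

# Made-up ARN and values used purely for illustration.
event = build_update_event(
    'arn:aws:personalize:us-east-1:123456789012:campaign/example',
    25,
    'Pre-warming capacity ahead of a flash sale'
)
print(json.dumps(event, indent=2))
```

The resulting JSON could then be supplied as the payload when invoking the function, for example from a scheduled rule that fires ahead of an expected traffic spike.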
102 | -------------------------------------------------------------------------------- /src/personalize_update_tps_function/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/src/personalize_update_tps_function/__init__.py -------------------------------------------------------------------------------- /src/personalize_update_tps_function/personalize_update_tps.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | """ 5 | Utility Lambda function that updates a Personalize campaign's minProvisionedTPS or a recommender's 6 | minRecommendationRequestsPerSecond based on triggers such as CloudWatch event rules (e.g., cron) or application events. 7 | """ 8 | 9 | import json 10 | import logging 11 | 12 | 13 | from aws_lambda_powertools import Logger 14 | 15 | from common import ( 16 | extract_region, 17 | extract_resource_type, 18 | get_client, 19 | put_event 20 | ) 21 | 22 | logger = Logger() 23 | 24 | @logger.inject_lambda_context(log_event=True) 25 | def lambda_handler(event, _): 26 | ''' Updates minProvisionedTPS for an existing Personalize campaign or minRecommendationRequestsPerSecond for an existing recommender ''' 27 | 28 | if event.get('detail'): 29 | arn = event['detail']['ARN'] 30 | min_tps = event['detail']['NewMinTPS'] 31 | reason = event['detail'].get('Reason') 32 | else: 33 | arn = event['ARN'] 34 | min_tps = event['NewMinTPS'] 35 | reason = event.get('Reason') 36 | 37 | region = extract_region(arn) 38 | if not region: 39 | raise Exception('Region could not be extracted from ARN in event') 40 | 41 | resource_type = extract_resource_type(arn) 42 | if not resource_type: 43 | raise Exception('Resource type could not be extracted from ARN in event') 44 | 45 | if resource_type not in ['campaign', 'recommender']: 46 | raise Exception('Resource 
type represented by ARN in event is not "campaign" or "recommender"') 47 | 48 | if min_tps < 1: 49 | raise ValueError(f'"NewMinTPS" must be >= 1') 50 | 51 | personalize = get_client(service_name = 'personalize', region_name = region) 52 | 53 | if resource_type == 'campaign': 54 | response = personalize.update_campaign(campaignArn = arn, minProvisionedTPS = min_tps) 55 | notification_detail_type = 'PersonalizeCampaignMinProvisionedTPSUpdated' 56 | else: 57 | response = personalize.describe_recommender(recommenderArn = arn) 58 | 59 | config = response['recommender']['recommenderConfig'] 60 | config['minRecommendationRequestsPerSecond'] = min_tps 61 | 62 | response = personalize.update_recommender(recommenderArn = arn, recommenderConfig = config) 63 | notification_detail_type = 'PersonalizeRecommenderMinRecommendationRPSUpdated' 64 | 65 | if logger.isEnabledFor(logging.DEBUG): 66 | logger.debug(json.dumps(response, indent = 2, default = str)) 67 | 68 | if not reason: 69 | reason = f'Amazon Personalize {resource_type} {arn} min TPS update initiated (reason unspecified)' 70 | 71 | put_event( 72 | detail_type = notification_detail_type, 73 | detail = json.dumps({ 74 | 'ARN': arn, 75 | 'NewMinTPS': min_tps, 76 | 'Reason': reason 77 | }), 78 | resources = [ arn ] 79 | ) 80 | 81 | logger.info({ 82 | 'arn': arn, 83 | 'newMinTPS': min_tps 84 | }) 85 | 86 | return f'Successfully initiated update of min TPS to {min_tps} for {resource_type} {arn}' -------------------------------------------------------------------------------- /src/personalize_update_tps_function/requirements.txt: -------------------------------------------------------------------------------- 1 | # Note: AWS Lambda Power Tools dependency is satisfied by Lambda layer at runtime (part of deployment). 
--------------------------------------------------------------------------------
/template.yaml:
--------------------------------------------------------------------------------
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  (P9E-MONITOR) - Personalize monitoring tools including CloudWatch metrics, alarms, and dashboard; optional automated cost optimization

Metadata:
  AWS::ServerlessRepo::Application:
    Name: Amazon-Personalize-Monitor
    Description: >
      Creates a CloudWatch dashboard for monitoring the utilization of Amazon Personalize
      campaigns and recommenders; creates CloudWatch alarms based on a user-defined threshold; and
      includes automated cost optimization actions.
    Author: AWS Applied AI - Personalize
    SpdxLicenseId: MIT-0
    LicenseUrl: LICENSE
    ReadmeUrl: README-SAR.md
    Labels: ['Personalize', 'CloudWatch', 'Monitoring']
    HomePageUrl: https://github.com/aws-samples/amazon-personalize-monitor
    SemanticVersion: 1.2.1
    SourceCodeUrl: https://github.com/aws-samples/amazon-personalize-monitor

  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: "Amazon Personalize inference resources to monitor"
        Parameters:
          - CampaignARNs
          - RecommenderARNs
          - Regions
      - Label:
          default: "CloudWatch alarm configuration"
        Parameters:
          - AutoCreateUtilizationAlarms
          - UtilizationThresholdAlarmLowerBound
          - AutoCreateIdleAlarms
          - IdleThresholdHours
      - Label:
          default: "Cost optimization actions"
        Parameters:
          - AutoAdjustMinTPS
          - AutoDeleteOrStopIdleResources
      - Label:
          default: "Notifications"
        Parameters:
          - NotificationEndpoint
    ParameterLabels:
      CampaignARNs:
        default: "Personalize campaign ARNs to monitor"
      RecommenderARNs:
        default: "Personalize recommender ARNs to monitor"
      Regions:
        default: "AWS regions to monitor"
      AutoCreateUtilizationAlarms:
        default: "Automatically create campaign/recommender utilization CloudWatch alarms?"
      UtilizationThresholdAlarmLowerBound:
        default: "Campaign/recommender utilization alarm lower bound threshold"
      AutoCreateIdleAlarms:
        default: "Automatically create idle campaign/recommender CloudWatch alarms?"
      IdleThresholdHours:
        default: "Number of hours without requests to be considered idle"
      AutoDeleteOrStopIdleResources:
        default: "Automatically delete idle campaigns and stop idle recommenders in idle alarm state?"
      AutoAdjustMinTPS:
        default: "Automatically adjust/lower minProvisionedTPS/minRecommendationRequestsPerSecond for campaigns/recommenders in utilization alarm state?"
      NotificationEndpoint:
        default: "Email address to receive notifications"

Parameters:
  CampaignARNs:
    Type: String
    Description: >
      Comma separated list of Amazon Personalize campaign ARNs to monitor or 'all' to dynamically monitor all active campaigns.
    Default: 'all'

  RecommenderARNs:
    Type: String
    Description: >
      Comma separated list of Amazon Personalize recommender ARNs to monitor or 'all' to dynamically monitor all active recommenders.
    Default: 'all'

  Regions:
    Type: String
    Description: >
      Comma separated list of AWS region names. When using 'all' for CampaignARNs or RecommenderARNs, this parameter can be used
      to control the region(s) where the Personalize Monitor looks for active Personalize campaigns and recommenders. When not specified,
      the region where you deploy this application will be used.

  AutoCreateUtilizationAlarms:
    Type: String
    Description: >
      Whether to automatically create CloudWatch alarms for campaign/recommender utilization for monitored campaigns/recommenders. Valid values: Yes/No.
    AllowedValues:
      - 'Yes'
      - 'No'
    Default: 'Yes'

  UtilizationThresholdAlarmLowerBound:
    Type: Number
    Description: >
      Utilization alarm threshold value (in percent). When a monitored campaign's or recommender's utilization falls below this value,
      the alarm state will be set to ALARM. Valid values: 0-1000 (integer).
    MinValue: 0
    MaxValue: 1000
    Default: 100

  AutoAdjustMinTPS:
    Type: String
    Description: >
      Whether to automatically adjust minProvisionedTPS (campaigns) or minRecommendationRequestsPerSecond (recommenders) down to the lowest average TPS over
      a rolling 24 hour window. The minProvisionedTPS/minRecommendationRequestsPerSecond will never be increased. Valid values: Yes/No.
    AllowedValues:
      - 'Yes'
      - 'No'
    Default: 'Yes'

  AutoCreateIdleAlarms:
    Type: String
    Description: >
      Whether to automatically create CloudWatch alarms for detecting idle campaigns and recommenders. Valid values: Yes/No.
    AllowedValues:
      - 'Yes'
      - 'No'
    Default: 'Yes'

  IdleThresholdHours:
    Type: Number
    Description: >
      Number of consecutive hours without requests before a campaign or recommender is considered idle. Idle campaigns are automatically deleted
      and idle recommenders automatically stopped only if AutoDeleteOrStopIdleResources is Yes. Valid values: 2-48 (integer).
    MinValue: 2
    MaxValue: 48
    Default: 24

  AutoDeleteOrStopIdleResources:
    Type: String
    Description: >
      Whether to automatically delete campaigns and stop recommenders that have been idle for IdleThresholdHours consecutive hours. Valid values: Yes/No.
    AllowedValues:
      - 'Yes'
      - 'No'
    Default: 'No'

  NotificationEndpoint:
    Type: String
    Description: >
      Email address to receive CloudWatch alarm and other monitoring notifications.

Globals:
  Function:
    Timeout: 5
    Runtime: python3.9

Resources:
  CommonLayer:
    Type: AWS::Serverless::LayerVersion
    Properties:
      ContentUri: src/layer
      CompatibleRuntimes:
        - python3.9
    Metadata:
      BuildMethod: python3.9

  MonitorFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: Amazon Personalize monitor function that updates custom CloudWatch metrics and monitors campaign utilization every 5 minutes
      Timeout: 30
      CodeUri: src/personalize_monitor_function
      Handler: personalize_monitor.lambda_handler
      Layers:
        # Public AWS Lambda Powertools for Python (V2) layer, version 24
        - !Sub 'arn:${AWS::Partition}:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV2:24'
        - !Ref CommonLayer
      Policies:
        - Statement:
            - Sid: PersonalizePolicy
              Effect: Allow
              Action:
                - personalize:DescribeCampaign
                - personalize:DescribeRecommender
                - personalize:DescribeSolutionVersion
                - personalize:ListCampaigns
                - personalize:ListRecommenders
              Resource: '*'
            - Sid: CloudWatchPolicy
              Effect: Allow
              Action:
                - cloudwatch:DescribeAlarmsForMetric
                - cloudwatch:DisableAlarmActions
                - cloudwatch:EnableAlarmActions
                - cloudwatch:GetMetricData
                - cloudwatch:PutMetricAlarm
                - cloudwatch:PutMetricData
              Resource: '*'
            - Sid: EventBridgePolicy
              Effect: Allow
              Action:
                - events:DescribeRule
                - events:PutEvents
                - events:PutRule
                - events:PutTargets
              Resource: '*'
            - Sid: SnsPolicy
              Effect: Allow
              Action:
                - sns:CreateTopic
                - sns:ListSubscriptionsByTopic
                - sns:SetTopicAttributes
                - sns:Subscribe
              Resource: !Sub 'arn:${AWS::Partition}:sns:*:${AWS::AccountId}:PersonalizeMonitorNotifications'
            - Sid: SnsSubPolicy
              Effect: Allow
              Action:
                - sns:GetSubscriptionAttributes
              Resource: '*'
      Events:
        ScheduledEvent:
          Type: Schedule
          Properties:
            Description: Triggers primary Personalize Monitor monitoring logic
            # Every 5 minutes, starting at minute 0 of each hour
            Schedule: cron(0/5 * * * ? *)
            Enabled: True
      Environment:
        Variables:
          CampaignARNs: !Ref CampaignARNs
          RecommenderARNs: !Ref RecommenderARNs
          Regions: !Ref Regions
          NotificationEndpoint: !Ref NotificationEndpoint
          AutoCreateUtilizationAlarms: !Ref AutoCreateUtilizationAlarms
          UtilizationThresholdAlarmLowerBound: !Ref UtilizationThresholdAlarmLowerBound
          AutoCreateIdleAlarms: !Ref AutoCreateIdleAlarms
          IdleThresholdHours: !Ref IdleThresholdHours
          AutoDeleteOrStopIdleResources: !Ref AutoDeleteOrStopIdleResources
          AutoAdjustMinTPS: !Ref AutoAdjustMinTPS

  DashboardManagementFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: Amazon Personalize monitor function that updates the CloudWatch dashboard hourly and when campaigns are added/deleted
      Timeout: 15
      CodeUri: src/dashboard_mgmt_function
      Handler: dashboard_mgmt.lambda_handler
      AutoPublishAlias: live
      Layers:
        - !Sub 'arn:${AWS::Partition}:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV2:24'
        - !Ref CommonLayer
      Policies:
        - Statement:
            - Sid: PersonalizePolicy
              Effect: Allow
              Action:
                - personalize:DescribeCampaign
                - personalize:DescribeDatasetGroup
                - personalize:DescribeRecommender
                - personalize:DescribeSolutionVersion
                - personalize:ListCampaigns
                - personalize:ListRecommenders
              Resource: '*'
            - Sid: DashboardPolicy
              Effect: Allow
              Action:
                - cloudwatch:DeleteDashboards
                - cloudwatch:PutDashboard
              Resource: '*'
      Environment:
        Variables:
          CampaignARNs: !Ref CampaignARNs
          RecommenderARNs: !Ref RecommenderARNs
          Regions: !Ref Regions
      Events:
        EBRule:
          Type: EventBridgeRule
          Properties:
            Pattern:
              source:
                - personalize.monitor
              detail-type:
                - BuildPersonalizeMonitorDashboard
        ScheduledEvent:
          Type: Schedule
          Properties:
            Description: Hourly rebuild of Personalize Monitor CloudWatch dashboard
            # Once per hour, at minute 3
            Schedule: cron(3 * * * ? *)
            Enabled: True

  DeployDashboardCustomResource:
    Type: Custom::DashboardCreate
    Properties:
      ServiceToken: !GetAtt DashboardManagementFunction.Arn
      CampaignARNs: !Ref CampaignARNs
      RecommenderARNs: !Ref RecommenderARNs
      Regions: !Ref Regions
      AutoCreateUtilizationAlarms: !Ref AutoCreateUtilizationAlarms
      UtilizationThresholdAlarmLowerBound: !Ref UtilizationThresholdAlarmLowerBound
      AutoCreateIdleAlarms: !Ref AutoCreateIdleAlarms
      IdleThresholdHours: !Ref IdleThresholdHours
      AutoDeleteOrStopIdleResources: !Ref AutoDeleteOrStopIdleResources
      AutoAdjustMinTPS: !Ref AutoAdjustMinTPS

  UpdateTPSFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: Amazon Personalize monitor function that updates the minProvisionedTPS for a campaign or the minRecommendationRequestsPerSecond for a recommender in response to an event
      CodeUri: src/personalize_update_tps_function
      Handler: personalize_update_tps.lambda_handler
      Layers:
        - !Sub 'arn:${AWS::Partition}:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV2:24'
        - !Ref CommonLayer
      Policies:
        - Statement:
            - Sid: PersonalizePolicy
              Effect: Allow
              Action:
                - personalize:DescribeRecommender
                - personalize:UpdateCampaign
                - personalize:UpdateRecommender
              Resource: '*'
            - Sid: EventBridgePolicy
              Effect: Allow
              Action:
                - events:PutEvents
              Resource: '*'
      Events:
        EBRule:
          Type: EventBridgeRule
          Properties:
            Pattern:
              source:
                - personalize.monitor
              detail-type:
                - UpdatePersonalizeCampaignMinProvisionedTPS
                - UpdatePersonalizeRecommenderMinRecommendationRPS

  DeleteCampaignFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: Amazon Personalize monitor function that deletes a campaign in response to an event
      CodeUri: src/personalize_delete_campaign_function
      Handler: personalize_delete_campaign.lambda_handler
      Layers:
        - !Sub 'arn:${AWS::Partition}:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV2:24'
        - !Ref CommonLayer
      Policies:
        - Statement:
            - Sid: PersonalizePolicy
              Effect: Allow
              Action:
                - personalize:DeleteCampaign
              Resource: '*'
            - Sid: EventBridgePolicy
              Effect: Allow
              Action:
                - events:PutEvents
              Resource: '*'
            - Sid: CloudWatchFindAlarmsPolicy
              Effect: Allow
              Action:
                - cloudwatch:DescribeAlarms
                - cloudwatch:ListTagsForResource
              Resource: '*'
            - Sid: CloudWatchDeletePolicy
              Effect: Allow
              Action:
                - cloudwatch:DeleteAlarms
              Resource: !Sub 'arn:${AWS::Partition}:cloudwatch:*:${AWS::AccountId}:alarm:PersonalizeMonitor-*'
      Events:
        EBCustomRule:
          Type: EventBridgeRule
          Properties:
            Pattern:
              source:
                - personalize.monitor
              detail-type:
                - DeletePersonalizeCampaign

  StopRecommenderFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: Amazon Personalize monitor function that stops a recommender in response to an event
      CodeUri: src/personalize_stop_recommender_function
      Handler: personalize_stop_recommender.lambda_handler
      Layers:
        - !Sub 'arn:${AWS::Partition}:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV2:24'
        - !Ref CommonLayer
      Policies:
        - Statement:
            - Sid: PersonalizePolicy
              Effect: Allow
              Action:
                - personalize:StopRecommender
              Resource: '*'
            - Sid: EventBridgePolicy
              Effect: Allow
              Action:
                - events:PutEvents
              Resource: '*'
            - Sid: CloudWatchFindAlarmsPolicy
              Effect: Allow
              Action:
                - cloudwatch:DescribeAlarms
                - cloudwatch:ListTagsForResource
              Resource: '*'
            - Sid: CloudWatchDeletePolicy
              Effect: Allow
              Action:
                - cloudwatch:DeleteAlarms
              Resource: !Sub 'arn:${AWS::Partition}:cloudwatch:*:${AWS::AccountId}:alarm:PersonalizeMonitor-*'
      Events:
        EBCustomRule:
          Type: EventBridgeRule
          Properties:
            Pattern:
              source:
                - personalize.monitor
              detail-type:
                - StopPersonalizeRecommender

  CleanupFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: Amazon Personalize monitor custom resource function that cleans up directly created resources when the application is deleted
      Timeout: 15
      CodeUri: src/cleanup_resources_function
      Handler: cleanup_resources.lambda_handler
      AutoPublishAlias: live
      Layers:
        - !Sub 'arn:${AWS::Partition}:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV2:24'
        - !Ref CommonLayer
      Policies:
        - Statement:
            - Sid: PersonalizePolicy
              Effect: Allow
              Action:
                - personalize:ListCampaigns
                - personalize:ListRecommenders
              Resource: '*'
            - Sid: CloudWatchFindAlarmsPolicy
              Effect: Allow
              Action:
                - cloudwatch:DescribeAlarms
                - cloudwatch:ListTagsForResource
              Resource: '*'
            - Sid: CloudWatchDeletePolicy
              Effect: Allow
              Action:
                - cloudwatch:DeleteAlarms
              Resource: !Sub 'arn:${AWS::Partition}:cloudwatch:*:${AWS::AccountId}:alarm:PersonalizeMonitor-*'
            - Sid: EventBridgePolicy
              Effect: Allow
              Action:
                - events:DeleteRule
                - events:RemoveTargets
              Resource: !Sub 'arn:${AWS::Partition}:events:*:${AWS::AccountId}:rule/PersonalizeMonitor-NotificationsRule'
            - Sid: SnsPolicy
              Effect: Allow
              Action:
                - sns:DeleteTopic
              Resource: !Sub 'arn:${AWS::Partition}:sns:*:${AWS::AccountId}:PersonalizeMonitorNotifications'
      Environment:
        Variables:
          CampaignARNs: !Ref CampaignARNs
          RecommenderARNs: !Ref RecommenderARNs
          Regions: !Ref Regions

  CleanupCustomResource:
    Type: Custom::Cleanup
    Properties:
      ServiceToken: !GetAtt CleanupFunction.Arn
      CampaignARNs: !Ref CampaignARNs
      RecommenderARNs: !Ref RecommenderARNs
      Regions: !Ref Regions
      AutoCreateUtilizationAlarms: !Ref AutoCreateUtilizationAlarms
      UtilizationThresholdAlarmLowerBound: !Ref UtilizationThresholdAlarmLowerBound
      AutoCreateIdleAlarms: !Ref AutoCreateIdleAlarms
      IdleThresholdHours: !Ref IdleThresholdHours
      AutoDeleteOrStopIdleResources: !Ref AutoDeleteOrStopIdleResources
      AutoAdjustMinTPS: !Ref AutoAdjustMinTPS

Outputs:
  MonitorFunction:
    Description: "Personalize monitor Function ARN"
    Value: !GetAtt MonitorFunction.Arn

  DashboardManagementFunction:
    Description: "CloudWatch Dashboard Management Function ARN"
    Value: !GetAtt DashboardManagementFunction.Arn

  UpdateTPSFunction:
    Description: "Update Personalize Campaign/Recommender TPS Function ARN"
    Value: !GetAtt UpdateTPSFunction.Arn

  DeleteCampaignFunction:
    Description: "Delete Personalize Campaign Function ARN"
    Value: !GetAtt DeleteCampaignFunction.Arn

  StopRecommenderFunction:
    Description: "Stop Personalize Recommender Function ARN"
    Value: !GetAtt StopRecommenderFunction.Arn
--------------------------------------------------------------------------------
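The EventBridgeRule events declared in template.yaml route custom events to the action functions by `source` and `detail-type`. The Python sketch below illustrates that routing with a deliberately simplified matcher (exact-value lists only, which is all the template uses); it is not how EventBridge itself is implemented, and the `RULE_PATTERNS`, `matches`, and `targets_for` names are hypothetical helpers introduced here for illustration. The `detail` payload shown is a placeholder, since the template does not define its shape.

```python
from typing import Any, Dict, List

# Patterns copied from the EventBridgeRule events in template.yaml,
# keyed by the logical ID of the function each rule targets.
RULE_PATTERNS: Dict[str, Dict[str, List[str]]] = {
    "DashboardManagementFunction": {
        "source": ["personalize.monitor"],
        "detail-type": ["BuildPersonalizeMonitorDashboard"],
    },
    "UpdateTPSFunction": {
        "source": ["personalize.monitor"],
        "detail-type": [
            "UpdatePersonalizeCampaignMinProvisionedTPS",
            "UpdatePersonalizeRecommenderMinRecommendationRPS",
        ],
    },
    "DeleteCampaignFunction": {
        "source": ["personalize.monitor"],
        "detail-type": ["DeletePersonalizeCampaign"],
    },
    "StopRecommenderFunction": {
        "source": ["personalize.monitor"],
        "detail-type": ["StopPersonalizeRecommender"],
    },
}


def matches(pattern: Dict[str, List[str]], event: Dict[str, Any]) -> bool:
    """Simplified matching: every key in the pattern must appear in the
    event with a value equal to one of the listed values."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


def targets_for(event: Dict[str, Any]) -> List[str]:
    """Return the logical IDs of the functions whose rule matches the event."""
    return [name for name, pattern in RULE_PATTERNS.items() if matches(pattern, event)]


if __name__ == "__main__":
    # Hypothetical event; the empty "detail" is a placeholder.
    event = {
        "source": "personalize.monitor",
        "detail-type": "DeletePersonalizeCampaign",
        "detail": {},
    }
    print(targets_for(event))
```

Because each `detail-type` appears in exactly one rule, an event published with source `personalize.monitor` reaches at most one of these functions, and events from any other source reach none.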