├── .github
    └── PULL_REQUEST_TEMPLATE.md
├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── README-SAR.md
├── README.md
├── images
    ├── personalize-monitor-architecture.png
    ├── personalize-monitor-cloudwatch-alarms.png
    ├── personalize-monitor-cloudwatch-dashboard.png
    └── personalize-monitor-cloudwatch-metrics.png
├── samconfig.toml
├── sar-publish.sh
├── src
    ├── cleanup_resources_function
    │   ├── README.md
    │   ├── __init__.py
    │   ├── cleanup_resources.py
    │   └── requirements.txt
    ├── dashboard_mgmt_function
    │   ├── README.md
    │   ├── __init__.py
    │   ├── dashboard-template.mustache
    │   ├── dashboard_mgmt.py
    │   └── requirements.txt
    ├── layer
    │   ├── README.md
    │   ├── __init__.py
    │   ├── common.py
    │   └── requirements.txt
    ├── personalize_delete_campaign_function
    │   ├── README.md
    │   ├── __init__.py
    │   ├── personalize_delete_campaign.py
    │   └── requirements.txt
    ├── personalize_monitor_function
    │   ├── README.md
    │   ├── __init__.py
    │   ├── personalize_monitor.py
    │   └── requirements.txt
    ├── personalize_stop_recommender_function
    │   ├── README.md
    │   ├── __init__.py
    │   ├── personalize_stop_recommender.py
    │   └── requirements.txt
    └── personalize_update_tps_function
    │   ├── README.md
    │   ├── __init__.py
    │   ├── personalize_update_tps.py
    │   └── requirements.txt
└── template.yaml


/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
1 | *Issue #, if available:*
2 | 
3 | *Description of changes:*
4 | 
5 | 
6 | By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
7 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__/
2 | .DS_Store
3 | .vscode
4 | .aws-sam
5 | env


--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | ## Code of Conduct
2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
4 | opensource-codeofconduct@amazon.com with any additional questions or comments.
5 | 


--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
 1 | # Contributing Guidelines
 2 | 
 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
 4 | documentation, we greatly value feedback and contributions from our community.
 5 | 
 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
 7 | information to effectively respond to your bug report or contribution.
 8 | 
 9 | 
10 | ## Reporting Bugs/Feature Requests
11 | 
12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features.
13 | 
14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already
15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:
16 | 
17 | * A reproducible test case or series of steps
18 | * The version of our code being used
19 | * Any modifications you've made relevant to the bug
20 | * Anything unusual about your environment or deployment
21 | 
22 | 
23 | ## Contributing via Pull Requests
24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:
25 | 
26 | 1. You are working against the latest source on the *main* branch.
27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted.
29 | 
30 | To send us a pull request, please:
31 | 
32 | 1. Fork the repository.
33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
34 | 3. Ensure local tests pass.
35 | 4. Commit to your fork using clear commit messages.
36 | 5. Send us a pull request, answering any default questions in the pull request interface.
37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.
38 | 
39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/).
41 | 
42 | 
43 | ## Finding contributions to work on
44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.
45 | 
46 | 
47 | ## Code of Conduct
48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
50 | opensource-codeofconduct@amazon.com with any additional questions or comments.
51 | 
52 | 
53 | ## Security issue notifications
54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue.
55 | 
56 | 
57 | ## Licensing
58 | 
59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
60 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
 2 | 
 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of
 4 | this software and associated documentation files (the "Software"), to deal in
 5 | the Software without restriction, including without limitation the rights to
 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
 7 | the Software, and to permit persons to whom the Software is furnished to do so.
 8 | 
 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
15 | 
16 | 


--------------------------------------------------------------------------------
/README-SAR.md:
--------------------------------------------------------------------------------
  1 | # Amazon Personalize Monitor
  2 | 
  3 | This project contains the source code and supporting files for deploying a serverless application that adds monitoring, alerting, and optimization capabilities for [Amazon Personalize](https://aws.amazon.com/personalize/), an AI service from AWS that allows you to create custom ML recommenders based on your data. Highlights include:
  4 | 
  5 | - Generation of additional [CloudWatch metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) to track the average RPS and `minRecommendationRequestsPerSecond` for [recommenders](https://docs.aws.amazon.com/personalize/latest/dg/creating-recommenders.html), average TPS and `minProvisionedTPS` for [campaigns](https://docs.aws.amazon.com/personalize/latest/dg/campaigns.html), and utilization of recommenders and campaigns over time.
  6 | - [CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) to alert you via SNS/email when recommender or campaign utilization drops below a configurable threshold or has been idle for a configurable length of time (optional).
  7 | - [CloudWatch dashboard](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html) populated with graph widgets for average (actual) vs provisioned TPS/RPS, recommender and campaign utilization, recommender and campaign latency, and the number of recommenders and campaigns being monitored.
  8 | - Capable of monitoring campaigns and recommenders across multiple regions in the same AWS account.
  9 | - Automatically [stop recommenders](https://docs.aws.amazon.com/personalize/latest/dg/stopping-starting-recommender.html) and delete campaigns that have been idle more than a configurable number of hours (optional).
 10 | - Automatically reduce the `minRecommendationRequestsPerSecond` for over-provisioned recommenders and `minProvisionedTPS` for over-provisioned campaigns to optimize cost (optional).
 11 | 
 12 | ## Why is this important?
 13 | 
 14 | Before you can retrieve real-time recommendations from Amazon Personalize, you must create a [recommender](https://docs.aws.amazon.com/personalize/latest/dg/creating-recommenders.html) or [campaign](https://docs.aws.amazon.com/personalize/latest/dg/campaigns.html). Often, multiple recommenders and/or campaigns are needed to provide recommendations targeting different use cases for an application, such as user-personalization, related items, and personalized ranking. Recommenders and campaigns represent resources that are auto-scaled by Personalize to meet the demand from requests from your application. This is typically how Personalize is integrated into your applications. When an application needs to display personalized recommendations to a user, a [GetRecommendations](https://docs.aws.amazon.com/personalize/latest/dg/getting-real-time-recommendations.html#recommendations) or [GetPersonalizedRanking](https://docs.aws.amazon.com/personalize/latest/dg/getting-real-time-recommendations.html#rankings) API call is made to a recommender or campaign to retrieve recommendations. Just like monitoring your own application components is important, monitoring your Personalize recommenders and campaigns is also important and considered a best practice. This application is designed to help you do just that.
 15 | 
 16 | When you provision a recommender using the [CreateRecommender](https://docs.aws.amazon.com/personalize/latest/dg/API_CreateRecommender.html) API or a campaign using the [CreateCampaign](https://docs.aws.amazon.com/personalize/latest/dg/API_CreateCampaign.html) API, you can optionally specify a value for `minRecommendationRequestsPerSecond` and `minProvisionedTPS`, respectively. This value specifies the requested _minimum_ requests/transactions (calls) per second that Amazon Personalize will support for that recommender or campaign. As your actual request volume to a recommender or campaign approaches its `minRecommendationRequestsPerSecond` or `minProvisionedTPS`, Personalize will automatically provision additional resources to support your request volume. Then when request volume drops, Personalize will automatically scale back down **no lower** than `minRecommendationRequestsPerSecond` or `minProvisionedTPS`. **Since you are billed based on the higher of actual TPS and `minRecommendationRequestsPerSecond`/`minProvisionedTPS`, it is therefore important to not over-provision your recommenders or campaigns to optimize cost.** This also means that leaving a recommender or campaign idle (active but no longer in-use) will result in unnecessary charges. This application gives you the tools to visualize your recommender and campaign utilization, to be notified when there is an opportunity to tune your recommender or campaign provisioning, and even take action to reduce and eliminate over-provisioning.
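
The billing rule described above can be illustrated with a minimal sketch. It only models the "higher of actual and minimum provisioned" comparison stated in this section; the function name is hypothetical and it is not a pricing calculator (see the Amazon Personalize pricing page for actual rates).

```python
def billable_rate(actual_per_second: float, min_provisioned_per_second: float) -> float:
    """Return the request rate that drives cost under the rule described above.

    Hypothetical helper for illustration: the billed rate is the higher of the
    actual request rate and the configured minimum
    (minRecommendationRequestsPerSecond or minProvisionedTPS).
    """
    return max(actual_per_second, min_provisioned_per_second)


# Over-provisioned: 2 requests/second of real traffic against a minimum of 10.
print(billable_rate(2.0, 10.0))

# Traffic above the minimum: Personalize scales out and actual usage drives cost.
print(billable_rate(25.0, 10.0))

# Idle but still active: the minimum is still billed.
print(billable_rate(0.0, 1.0))
```

The first and third cases are the ones this application is meant to surface: the configured minimum, not real traffic, is what you pay for.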
 17 | 
 18 | > General best practice is to set `minRecommendationRequestsPerSecond` and `minProvisionedTPS` to `1`, or your low watermark for recommendations requests, and let Personalize auto-scale recommender or campaign resources to meet actual demand.
 19 | 
 20 | See the Amazon Personalize [pricing page](https://aws.amazon.com/personalize/pricing/) for full details on costs.
 21 | 
 22 | ### CloudWatch Dashboard
 23 | 
 24 | When you deploy this project, a [CloudWatch dashboard](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html) is built with widgets for Actual vs Provisioned TPS/RPS, recommender/campaign utilization, and recommender/campaign latency for the recommenders and campaigns you wish to monitor. The dashboard gives you critical visual information to assess how your recommenders and campaigns are performing and being utilized. The data in these graphs can help you properly tune your recommender's `minRecommendationRequestsPerSecond` and campaign's `minProvisionedTPS`.
 25 | 
 26 | ![Personalize Monitor CloudWatch Dashboard](https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/master/images/personalize-monitor-cloudwatch-dashboard.png)
 27 | 
 28 | For more details on the CloudWatch dashboard created and maintained by this application, see the [dashboard_mgmt](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/dashboard_mgmt_function/) function page.
 29 | 
 30 | ### CloudWatch Alarms
 31 | 
 32 | At deployment time, you can optionally have this application automatically create [CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) that will alert you when a monitored recommender's or campaign's utilization drops below a threshold you define for nine out of twelve evaluation periods. Since the intervals are 5 minutes, nine of the twelve 5-minute evaluations over a 1-hour span must be below the threshold to enter an alarm status. The same rule applies to the transition from alarm to OK status. Similarly, the idle recommender/campaign alarm will alert you when there has been no request activity for a recommender/campaign for a configurable amount of time. The alarms will be set up to alert you via email through an SNS topic in each region where resources are monitored. Once the alarms are set up, you can alternatively link them to any operations and messaging tools you already use (e.g. Slack, PagerDuty).
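
As a rough sketch of the "nine out of twelve" rule, the following builds keyword arguments that could be passed to the CloudWatch `put_metric_alarm` API. The `PersonalizeMonitor` namespace comes from this README; the metric name `campaignUtilization`, the `CampaignArn` dimension, and the alarm naming scheme are assumptions made for illustration and may not match what the `personalize_monitor` function actually creates.

```python
from typing import Optional


def utilization_alarm_params(
    campaign_arn: str,
    threshold_percent: float,
    sns_topic_arn: Optional[str] = None,
) -> dict:
    """Build put_metric_alarm keyword arguments for a low-utilization alarm."""
    params = {
        # Hypothetical naming scheme: last ARN segment is the campaign name.
        "AlarmName": f"PersonalizeMonitor-Utilization-{campaign_arn.split('/')[-1]}",
        "Namespace": "PersonalizeMonitor",
        "MetricName": "campaignUtilization",  # assumed metric name
        "Dimensions": [{"Name": "CampaignArn", "Value": campaign_arn}],  # assumed
        "Statistic": "Average",
        "Period": 300,             # 5-minute evaluation interval
        "EvaluationPeriods": 12,   # 12 x 5 minutes = 1 hour window
        "DatapointsToAlarm": 9,    # 9 of the 12 datapoints must breach
        "Threshold": float(threshold_percent),
        "ComparisonOperator": "LessThanThreshold",
    }
    if sns_topic_arn:
        # Notify on both the ALARM and OK transitions.
        params["AlarmActions"] = [sns_topic_arn]
        params["OKActions"] = [sns_topic_arn]
    return params


params = utilization_alarm_params(
    "arn:aws:personalize:us-east-1:123456789012:campaign/example",  # placeholder ARN
    threshold_percent=100,
)
print(params["Period"] * params["EvaluationPeriods"])  # window length in seconds

# With boto3 installed and credentials configured, the alarm could be created with:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```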
 33 | 
 34 | ![Personalize Monitor CloudWatch Alarms](https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/master/images/personalize-monitor-cloudwatch-alarms.png)
 35 | 
 36 | For more details on the CloudWatch alarms created by this application, see the [personalize_monitor](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/personalize_monitor_function/) function page.
 37 | 
 38 | ### CloudWatch Metrics
 39 | 
 40 | To support the CloudWatch dashboard and alarms described above, a few new custom [CloudWatch metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) are added for the monitored recommenders and campaigns. These metrics are populated by the [personalize_monitor](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/personalize_monitor_function/) Lambda function that is set up to run every 5 minutes in your account. You can find these metrics in CloudWatch under Metrics in the "PersonalizeMonitor" namespace.
 41 | 
 42 | ![Personalize Monitor CloudWatch Metrics](https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/master/images/personalize-monitor-cloudwatch-metrics.png)
 43 | 
 44 | For more details on the custom metrics created by this application, see the [personalize_monitor](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/personalize_monitor_function/) function page.
 45 | 
 46 | ### Cost optimization (optional)
 47 | 
 48 | This application can be optionally configured to automatically perform cost optimization actions for your Amazon Personalize recommenders and campaigns.
 49 | 
 50 | #### <a name='Idlecampaigns'></a>Idle recommenders/campaigns
 51 | Idle recommenders/campaigns are those that have been provisioned but are not receiving any `GetRecommendations`/`GetPersonalizedRanking` calls. Since costs are incurred while a recommender/campaign is active regardless of whether it receives any requests, detecting and eliminating these idle recommenders/campaigns can be an important cost optimization activity. This can be particularly useful in non-production AWS accounts such as development and testing where you are more likely to have abandoned recommenders/campaigns.
 52 | 
 53 | Note that this is where an important difference between recommenders and campaigns comes into play. Recommenders can be started and stopped to provision and de-provision the resources needed for real-time inference. When a recommender is stopped, the real-time inference resources are deleted (which pauses ongoing recommender charges) but the underlying model artifacts are preserved. This allows you to later start the recommender without having to train the model again. Campaigns, on the other hand, represent only the resources needed for real-time inference for a solution version. Therefore, you must delete a campaign to release the real-time resources and pause campaign charges. Since the solution version is not being deleted, model artifacts are preserved similar to when a recommender is stopped.
 54 | 
 55 | See the `AutoDeleteOrStopIdleResources` and `IdleThresholdHours` deployment parameters in the installation instructions below and the [personalize_monitor](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/personalize_monitor_function#automatically-deleting-idle-campaigns-optional) function for details.
 56 | 
 57 | #### <a name='Over-provisionedcampaigns'></a>Over-provisioned recommenders/campaigns
 58 | 
 59 | Properly provisioning recommenders and campaigns, as described earlier, is also an important cost optimization activity. This application can be configured to automatically reduce a recommender's `minRecommendationRequestsPerSecond` or a campaign's `minProvisionedTPS` based on actual request volume. This will optimize recommender/campaign utilization when request volume is lower while relying on Personalize to auto-scale based on actual activity. See the `AutoAdjustMinTPS` deployment parameter below and the [personalize_monitor](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/personalize_monitor_function#automatically-adjusting-campaign-minprovisionedtps-optional) function for details.
 60 | 
 61 | ### Architecture
 62 | 
 63 | The following diagram depicts how the Lambda functions in this application work together using an event-driven approach built on [Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/latest/userguide/what-is-amazon-eventbridge.html). The [personalize_monitor](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/personalize_monitor_function/) function is invoked every five minutes to generate CloudWatch metric data based on the monitored recommenders/campaigns and create alarms (if configured). It also generates events which are published to EventBridge that trigger activities such as optimizing `minRecommendationRequestsPerSecond`/`minProvisionedTPS`, stopping idle recommenders, deleting idle campaigns, updating the Personalize Monitor CloudWatch dashboard, and sending notifications. This approach allows you to more easily integrate these functions into your own operations by sending your own events, say, to trigger the dashboard to be rebuilt after you create a campaign or register your own targets to events generated by this application.
 64 | 
 65 | ![Personalize Monitor Architecture](https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/master/images/personalize-monitor-architecture.png)
 66 | 
 67 | See the readme pages for each function for details on the events that they produce and consume.
 68 | 
 69 | ## Installing the application
 70 | 
 71 | ***IMPORTANT NOTE:** Deploying this application in your AWS account will create and consume AWS resources, which will cost money. Examples include the CloudWatch dashboard, the Lambda function that runs every 5 minutes to collect additional monitoring metrics, CloudWatch alarms, and logging. Therefore, if after installing this application you choose not to use it as part of your monitoring strategy, be sure to follow the Uninstall instructions in the next section to avoid ongoing charges and to clean up all data.*
 72 | 
 73 | | Parameter | Description | Default |
 74 | | --- | --- | --- |
 75 | | CampaignARNs | Comma separated list of Personalize campaign ARNs to monitor or `all` to monitor all active campaigns. It is recommended to use `all` so that any new campaigns that are added after deployment will be automatically detected, monitored, and have alarms created (optional) | `all` |
 76 | | RecommenderARNs | Comma separated list of Personalize recommender ARNs to monitor or `all` to monitor all active recommenders. It is recommended to use `all` so that any new recommenders that are added after deployment will be automatically detected, monitored, and have alarms created (optional) | `all` |
 77 | | Regions | Comma separated list of AWS regions to monitor recommenders/campaigns. Only applicable when `all` is used for `CampaignARNs` or `RecommenderARNs`. Leaving this value blank will default to the region where this application is deployed (i.e. `AWS Region` parameter above). | |
 78 | | AutoCreateUtilizationAlarms | Whether to automatically create a utilization CloudWatch alarm for each monitored recommender or campaign. | `Yes` |
 79 | | UtilizationThresholdAlarmLowerBound | Minimum threshold value (in percent) to enter alarm state for recommender/campaign utilization. This value is only relevant if `AutoCreateUtilizationAlarms` is `Yes`. | `100` |
 80 | | AutoAdjustMinTPS | Whether to automatically compare recommender/campaign request activity against the configured `minRecommendationRequestsPerSecond`/`minProvisionedTPS` to determine if `minRecommendationRequestsPerSecond`/`minProvisionedTPS` can be reduced to optimize utilization. | `Yes` |
 81 | | AutoCreateIdleAlarms | Whether to automatically create an idle detection CloudWatch alarm for each monitored recommender/campaign. | `Yes` |
 82 | | IdleThresholdHours | Number of hours that a recommender/campaign must be idle (i.e. no requests) before it is automatically stopped (recommender) or deleted (campaign). `AutoDeleteOrStopIdleResources` must be `Yes` for idle recommender stop or campaign deletion to occur. | `24` |
 83 | | AutoDeleteOrStopIdleResources | Whether to automatically stop idle recommenders or delete idle campaigns. An idle recommender/campaign is one that has not had any requests in `IdleThresholdHours` hours. | `No` |
 84 | | NotificationEndpoint | Email address to receive alarm and ok notifications, recommender stop/update, campaign delete/update events (optional). An [SNS](https://aws.amazon.com/sns/) topic is created in each region where resources are monitored and this email address will be added as a subscriber to the topic(s). You will receive a confirmation email for the SNS topic subscription in each region so be sure to click the confirmation link in that email to ensure you receive notifications. | |
 85 | 
 86 | ## Uninstalling the application
 87 | 
 88 | To remove the resources created by this application in your AWS account, be sure to uninstall the application.
 89 | 
 90 | ## FAQs
 91 | 
 92 | ***Q: Can I use this application to determine my accumulated inference charges during the month?***
 93 | 
 94 | ***A:*** No! Although the `averageRPS`/`averageTPS` and `minRecommendationRequestsPerSecond`/`minProvisionedTPS` custom metrics generated by this application may be used to calculate an approximation of your accumulated inference charges, they should not be used as a substitute or proxy for actual Personalize inference costs. Always consult your AWS Billing Dashboard for actual service charges.
 95 | 
 96 | ***Q: What is an ideal recommender/campaign utilization percentage? Is it okay if my recommender/campaign utilization is over 100%?***
 97 | 
 98 | ***A:*** The recommender/campaign utilization metric is a measure of your actual usage compared against the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` for the recommender/campaign. Any utilization value >= 100% is ideal since that means you are not over-provisioning, and therefore not over-paying, for resources. You're letting Personalize handle the scaling in/out of the recommender/campaign. Anytime your utilization is below 100%, more resources are provisioned than are needed to satisfy the volume of requests at that time.
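
A minimal sketch of the utilization calculation described in this answer, assuming utilization is simply the actual request rate expressed as a percentage of the configured minimum. The function name is hypothetical and the exact computation in `personalize_monitor` may differ.

```python
def utilization_percent(average_per_second: float, min_provisioned_per_second: float) -> float:
    """Actual request rate as a percentage of the configured minimum."""
    if min_provisioned_per_second <= 0:
        raise ValueError("minimum provisioned rate must be positive")
    return 100.0 * average_per_second / min_provisioned_per_second


# Below 100%: more capacity is provisioned than current traffic needs.
print(utilization_percent(2.0, 10.0))

# At or above 100%: Personalize is scaling to meet demand beyond the minimum.
print(utilization_percent(15.0, 10.0))
```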
 99 | 
100 | ***Q: How can I tell if Personalize is scaling out fast enough?***
101 | 
102 | ***A:*** Compare the "Actual vs Provisioned RPS/TPS" graph to the "Recommender/Campaign Latency" graph on the Personalize Monitor CloudWatch dashboard. When your Actual RPS/TPS increases/spikes, does the latency for the same recommender/campaign at the same time stay consistent? If so, this tells you that Personalize is maintaining response time as request volume increases and therefore scaling fast enough to meet demand. However, if latency increases significantly and to an unacceptable level for your application, this is an indication that Personalize may not be scaling fast enough to meet your traffic patterns. See the answer to the following question for some options.
103 | 
104 | ***Q: My workload is very spiky and Personalize is not scaling fast enough. What can I do?***
105 | 
106 | ***A:*** First, be sure to confirm that it is Personalize that is not scaling fast enough by reviewing the answer above. If the spikes are predictable or cyclical, you can pre-warm capacity in your recommender/campaign ahead of time by adjusting the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` using the [UpdateRecommender](https://docs.aws.amazon.com/personalize/latest/dg/API_UpdateRecommender.html) or [UpdateCampaign](https://docs.aws.amazon.com/personalize/latest/dg/API_UpdateCampaign.html) API and then dropping it back down after the traffic subsides. For example, increase capacity 30 minutes before a flash sale or marketing campaign is launched that brings a temporary surge in traffic. This can be done manually using the AWS console or automated by using [CloudWatch events](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/WhatIsCloudWatchEvents.html) based on a schedule or triggered based on an event in your application. The [personalize_update_tps](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/personalize_update_tps_function/) function that is deployed with this application can be used as the target for CloudWatch events or you can publish an `UpdatePersonalizeRecommenderMinRecommendationRPS` or `UpdatePersonalizeCampaignMinProvisionedTPS` event to EventBridge. If spikes in your workload are not predictable or known ahead of time, determining the optimal `minRecommendationRequestsPerSecond`/`minProvisionedTPS` to balance consistent latency vs cost is the best option. The metrics and dashboard graphs in this application can help you determine this value.
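
The pre-warming approach for a campaign can be sketched as follows. This is a minimal illustration that calls the `UpdateCampaign` API directly through a boto3 Personalize client passed in by the caller; the helper name and the example values are hypothetical, and it does not use the `personalize_update_tps` function or its EventBridge events.

```python
def set_campaign_min_tps(personalize_client, campaign_arn: str, min_provisioned_tps: int) -> dict:
    """Set minProvisionedTPS on a campaign via the UpdateCampaign API.

    personalize_client is expected to be a boto3 "personalize" client
    (or any object exposing a compatible update_campaign method).
    """
    if min_provisioned_tps < 1:
        raise ValueError("minProvisionedTPS must be at least 1")
    return personalize_client.update_campaign(
        campaignArn=campaign_arn,
        minProvisionedTPS=min_provisioned_tps,
    )


# Example usage (requires boto3 and AWS credentials; values are placeholders):
#   import boto3
#   client = boto3.client("personalize", region_name="us-east-1")
#   arn = "arn:aws:personalize:us-east-1:123456789012:campaign/example"
#   set_campaign_min_tps(client, arn, 50)  # roughly 30 minutes before the surge
#   ...
#   set_campaign_min_tps(client, arn, 1)   # after traffic subsides
```

Passing the client in, rather than creating it inside the helper, keeps the sketch easy to exercise with a stub and lets the caller choose the region.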
107 | 
108 | ***Q: After deploying this application in my AWS account, I created some new Personalize recommenders or campaigns that I also want to monitor. How can I add them to be monitored and have them appear on my dashboard? Also, what about monitored recommenders or campaigns that I delete?***
109 | 
110 | ***A:*** If you specified `all` for the `RecommenderARNs` or `CampaignARNs` deployment parameter (see installation instructions above), any new recommenders/campaigns you create will be automatically monitored and alarms created (if `AutoCreateUtilizationAlarms` or `AutoCreateIdleAlarms` was set to `Yes`) when the recommenders/campaigns become active. Likewise, any recommenders/campaigns that are deleted will no longer be monitored. If you want this application to monitor recommenders/campaigns across multiple regions, be sure to specify the region names in the `Regions` deployment parameter. Note that this only applies when `RecommenderARNs` or `CampaignARNs` is set to `all`. The CloudWatch dashboard will be automatically rebuilt every hour to add new recommenders and campaigns and drop deleted recommenders and campaigns. You can also trigger the dashboard to be rebuilt by publishing a `BuildPersonalizeMonitorDashboard` event to the default EventBridge event bus (see [dashboard_mgmt_function](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/src/dashboard_mgmt_function/)).
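
Publishing the rebuild event might look like the sketch below. Only the `BuildPersonalizeMonitorDashboard` detail type and the use of the default event bus come from this README; the `Source` value and the `Detail` payload are assumptions, so check the `dashboard_mgmt_function` readme for the fields its EventBridge rule actually matches.

```python
import json


def build_dashboard_event_entry(reason: str = "manual rebuild") -> dict:
    """Build one entry for the EventBridge PutEvents API."""
    return {
        "Source": "custom.example-app",                   # hypothetical source name
        "DetailType": "BuildPersonalizeMonitorDashboard",
        "Detail": json.dumps({"Reason": reason}),         # hypothetical payload
        # EventBusName is omitted so the default event bus is used.
    }


entry = build_dashboard_event_entry("new campaign created")
print(entry["DetailType"])

# With boto3 installed and credentials configured, the event could be sent with:
#   boto3.client("events").put_events(Entries=[entry])
```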
111 | 
112 | ## Reporting issues
113 | 
114 | If you encounter a bug, please create a new issue with as much detail as possible and steps for reproducing the bug. Similarly, if you have an idea for an improvement, please add an issue as well. Pull requests are also welcome! See the [Contributing Guidelines](https://github.com/aws-samples/amazon-personalize-monitor/tree/master/CONTRIBUTING.md) for more details.
115 | 
116 | ## License summary
117 | 
118 | This sample code is made available under a modified MIT license. See the LICENSE file.
119 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Amazon Personalize Monitor
  2 | 
  3 | <!-- vscode-markdown-toc -->
  4 | * [Why is this important?](#Whyisthisimportant)
  5 | * [Features](#Features)
  6 | 	* [CloudWatch dashboard](#CloudWatchdashboard)
  7 | 	* [CloudWatch alarms](#CloudWatchalarms)
  8 | 	* [CloudWatch metrics](#CloudWatchmetrics)
  9 | 	* [Cost optimization (optional)](#Costoptimizationoptional)
 10 | 		* [Idle campaigns](#Idlecampaigns)
 11 | 		* [Over-provisioned campaigns](#Over-provisionedcampaigns)
 12 | * [Architecture](#Architecture)
 13 | * [Installing the application](#Installingtheapplication)
 14 | 	* [Option 1 - Install from Serverless Application Repository](#Option1-InstallfromServerlessApplicationRepository)
 15 | 	* [Option 2 - Install using Serverless Application Model](#Option2-InstallusingServerlessApplicationModel)
 16 | 	* [Application settings/parameters](#Applicationsettingsparameters)
 17 | * [Uninstalling the application](#Uninstallingtheapplication)
 18 | * [FAQs](#FAQs)
 19 | * [Reporting issues](#Reportingissues)
 20 | * [License summary](#Licensesummary)
 21 | 
 22 | <!-- vscode-markdown-toc-config
 23 | 	numbering=false
 24 | 	autoSave=true
 25 | 	/vscode-markdown-toc-config -->
 26 | <!-- /vscode-markdown-toc -->
 27 | 
 28 | This project contains the source code and supporting files for deploying a serverless application that adds monitoring, alerting, and optimization capabilities for [Amazon Personalize](https://aws.amazon.com/personalize/), an AI service from AWS that allows you to create custom ML recommenders based on your data. Highlights include:
 29 | 
 30 | - Generation of additional [CloudWatch metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) to track the average RPS and `minRecommendationRequestsPerSecond` for [recommenders](https://docs.aws.amazon.com/personalize/latest/dg/creating-recommenders.html), average TPS and `minProvisionedTPS` for [campaigns](https://docs.aws.amazon.com/personalize/latest/dg/campaigns.html), and utilization of recommenders and campaigns over time.
 31 | - [CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) to alert you via SNS/email when recommender or campaign utilization drops below a configurable threshold or has been idle for a configurable length of time (optional).
 32 | - [CloudWatch dashboard](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html) populated with graph widgets for average (actual) vs provisioned TPS/RPS, recommender and campaign utilization, recommender and campaign latency, and the number of recommenders and campaigns being monitored.
 33 | - Capable of monitoring campaigns and recommenders across multiple regions in the same AWS account.
 34 | - Automatically [stop recommenders](https://docs.aws.amazon.com/personalize/latest/dg/stopping-starting-recommender.html) and delete campaigns that have been idle more than a configurable number of hours (optional).
 35 | - Automatically reduce the `minRecommendationRequestsPerSecond` for over-provisioned recommenders and `minProvisionedTPS` for over-provisioned campaigns to optimize cost (optional).
 36 | 
 37 | ## <a name='Whyisthisimportant'></a>Why is this important?
 38 | 
 39 | Before you can retrieve real-time recommendations from Amazon Personalize, you must create a [recommender](https://docs.aws.amazon.com/personalize/latest/dg/creating-recommenders.html) or [campaign](https://docs.aws.amazon.com/personalize/latest/dg/campaigns.html). Often, multiple recommenders and/or campaigns are needed to provide recommendations targeting different use cases for an application, such as user-personalization, related items, and personalized ranking. Recommenders and campaigns represent resources that are auto-scaled by Personalize to meet the demand from your application's requests. This is typically how Personalize is integrated into your applications. When an application needs to display personalized recommendations to a user, a [GetRecommendations](https://docs.aws.amazon.com/personalize/latest/dg/getting-real-time-recommendations.html#recommendations) or [GetPersonalizedRanking](https://docs.aws.amazon.com/personalize/latest/dg/getting-real-time-recommendations.html#rankings) API call is made to a recommender or campaign to retrieve recommendations. Just as monitoring your own application components is important, monitoring your Personalize recommenders and campaigns is considered a best practice. This application is designed to help you do just that.
 40 | 
 41 | When you provision a recommender using the [CreateRecommender](https://docs.aws.amazon.com/personalize/latest/dg/API_CreateRecommender.html) API or a campaign using the [CreateCampaign](https://docs.aws.amazon.com/personalize/latest/dg/API_CreateCampaign.html) API, you can optionally specify a value for `minRecommendationRequestsPerSecond` and `minProvisionedTPS`, respectively. This value specifies the requested _minimum_ requests/transactions (calls) per second that Amazon Personalize will support for that recommender or campaign. As your actual request volume to a recommender or campaign approaches its `minRecommendationRequestsPerSecond` or `minProvisionedTPS`, Personalize will automatically provision additional resources to support your request volume. Then when request volume drops, Personalize will automatically scale back down **no lower** than `minRecommendationRequestsPerSecond` or `minProvisionedTPS`. **Since you are billed based on the higher of actual TPS and `minRecommendationRequestsPerSecond`/`minProvisionedTPS`, it is therefore important to not over-provision your recommenders or campaigns to optimize cost.** This also means that leaving a recommender or campaign idle (active but no longer in-use) will result in unnecessary charges. This application gives you the tools to visualize your recommender and campaign utilization, to be notified when there is an opportunity to tune your recommender or campaign provisioning, and even take action to reduce and eliminate over-provisioning.
 42 | 
 43 | > General best practice is to set `minRecommendationRequestsPerSecond` and `minProvisionedTPS` to `1`, or your low watermark for recommendations requests, and let Personalize auto-scale recommender or campaign resources to meet actual demand.
 44 | 
 45 | See the Amazon Personalize [pricing page](https://aws.amazon.com/personalize/pricing/) for full details on costs.
 46 | 
 47 | ## <a name='Features'></a>Features
 48 | 
 49 | ### <a name='CloudWatchdashboard'></a>CloudWatch dashboard
 50 | 
 51 | When you deploy this project, a [CloudWatch dashboard](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html) is built with widgets for Actual vs Provisioned TPS/RPS, recommender/campaign utilization, and recommender/campaign latency for the recommenders and campaigns you wish to monitor. The dashboard gives you critical visual information to assess how your recommenders and campaigns are performing and being utilized. The data in these graphs can help you properly tune your recommender's `minRecommendationRequestsPerSecond` and campaign's `minProvisionedTPS`.
 52 | 
 53 | ![Personalize Monitor CloudWatch Dashboard](./images/personalize-monitor-cloudwatch-dashboard.png)
 54 | 
 55 | For more details on the CloudWatch dashboard created and maintained by this application, see the [dashboard_mgmt](./src/dashboard_mgmt_function/) function page.
 56 | 
 57 | ### <a name='CloudWatchalarms'></a>CloudWatch alarms
 58 | 
 59 | At deployment time, you can optionally have this application automatically create [CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) that will alert you when a monitored recommender's or campaign's utilization drops below a threshold you define for nine out of twelve evaluation periods. Since each evaluation period is 5 minutes, nine of the twelve 5-minute evaluations over a 1-hour span must be below the threshold to enter the alarm state. The same rule applies to the transition from alarm back to OK status. Similarly, the idle recommender/campaign alarm will alert you when there has been no request activity for a recommender/campaign for a configurable amount of time. The alarms will be set up to alert you via email through an SNS topic in each region where resources are monitored. Once the alarms are set up, you can alternatively link them to any operations and messaging tools you already use (e.g. Slack, PagerDuty, etc).
 60 | 
 61 | ![Personalize Monitor CloudWatch Alarms](./images/personalize-monitor-cloudwatch-alarms.png)
 62 | 
 63 | For more details on the CloudWatch alarms created by this application, see the [personalize_monitor](./src/personalize_monitor_function/) function page.
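
The nine-out-of-twelve rule described above can be illustrated with a small simulation. This is only a sketch of the counting logic, not the application's code: the function name and the treatment of incomplete windows are assumptions, and CloudWatch itself (which expresses such rules through an alarm's `EvaluationPeriods` and `DatapointsToAlarm` settings) handles missing data with more nuance.

```python
def breaches_threshold(datapoints, threshold, datapoints_to_alarm=9, evaluation_periods=12):
    """Return True when enough of the most recent datapoints fall below the threshold.

    datapoints: utilization percentages, oldest first, one per 5-minute period.
    Hypothetical helper for illustration only.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        # Assumption: with less than a full window of data, do not alarm.
        return False
    below = sum(1 for value in recent if value < threshold)
    return below >= datapoints_to_alarm


# One hour of 5-minute utilization samples: nine below 100%, three above.
samples = [50.0] * 9 + [150.0] * 3
print(breaches_threshold(samples, threshold=100.0))
```

With only eight of the twelve samples below the threshold, the same call would return `False`.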
 64 | 
 65 | ### <a name='CloudWatchmetrics'></a>CloudWatch metrics
 66 | 
 67 | To support the CloudWatch dashboard and alarms described above, a few new custom [CloudWatch metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) are added for the monitored recommenders and campaigns. These metrics are populated by the [personalize_monitor](./src/personalize_monitor_function/) Lambda function that is set up to run every 5 minutes in your account. You can find these metrics in CloudWatch under Metrics in the "PersonalizeMonitor" namespace.
 68 | 
 69 | ![Personalize Monitor CloudWatch Metrics](./images/personalize-monitor-cloudwatch-metrics.png)
 70 | 
 71 | For more details on the custom metrics created by this application, see the [personalize_monitor](./src/personalize_monitor_function/) function page.
 72 | 
 73 | ### <a name='Costoptimizationoptional'></a>Cost optimization (optional)
 74 | 
 75 | This application can be optionally configured to automatically perform cost optimization actions for your Amazon Personalize recommenders and campaigns.
 76 | 
 77 | #### <a name='Idlecampaigns'></a>Idle recommenders/campaigns
 78 | Idle recommenders/campaigns are those that have been provisioned but are not receiving any `GetRecommendations`/`GetPersonalizedRanking` calls. Since costs are incurred while a recommender/campaign is active regardless of whether it receives any requests, detecting and eliminating these idle recommenders/campaigns can be an important cost optimization activity. This can be particularly useful in non-production AWS accounts such as development and testing where you are more likely to have abandoned recommenders/campaigns.
 79 | 
 80 | Note that this is where an important difference between recommenders and campaigns comes into play. Recommenders can be started and stopped to provision and de-provision the resources needed for real-time inference. When a recommender is stopped, the real-time inference resources are deleted (which pauses ongoing recommender charges) but the underlying model artifacts are preserved. This allows you to later start the recommender without having to train the model again. Campaigns, on the other hand, represent only the resources needed for real-time inference for a solution version. Therefore, you must delete a campaign to release the real-time resources and pause campaign charges. Since the solution version is not being deleted, model artifacts are preserved similar to when a recommender is stopped.
 81 | 
 82 | See the `AutoDeleteOrStopIdleResources` and `IdleThresholdHours` deployment parameters in the installation instructions below and the [personalize_monitor](./src/personalize_monitor_function#automatically-deleting-idle-campaigns-optional) function for details.
 83 | 
 84 | #### <a name='Over-provisionedcampaigns'></a>Over-provisioned recommenders/campaigns
 85 | 
 86 | Properly provisioning recommenders and campaigns, as described earlier, is also an important cost optimization activity. This application can be configured to automatically reduce a recommender's `minRecommendationRequestsPerSecond` or a campaign's `minProvisionedTPS` based on actual request volume. This will optimize recommender/campaign utilization when request volume is lower while relying on Personalize to auto-scale based on actual activity. See the `AutoAdjustMinTPS` deployment parameter below and the [personalize_monitor](./src/personalize_monitor_function#automatically-adjusting-campaign-minprovisionedtps-optional) function for details.
 87 | 
 88 | ## <a name='Architecture'></a>Architecture
 89 | 
 90 | The following diagram depicts how the Lambda functions in this application work together using an event-driven approach built on [Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/latest/userguide/what-is-amazon-eventbridge.html). The [personalize_monitor](./src/personalize_monitor_function/) function is invoked every five minutes to generate CloudWatch metric data based on the monitored recommenders/campaigns and create alarms (if configured). It also generates events which are published to EventBridge that trigger activities such as optimizing `minRecommendationRequestsPerSecond`/`minProvisionedTPS`, stopping idle recommenders, deleting idle campaigns, updating the Personalize Monitor CloudWatch dashboard, and sending notifications. This approach allows you to more easily integrate these functions into your own operations by sending your own events, say, to trigger the dashboard to be rebuilt after you create a campaign or register your own targets to events generated by this application.
 91 | 
 92 | ![Personalize Monitor Architecture](./images/personalize-monitor-architecture.png)
 93 | 
 94 | See the readme pages for each function for details on the events that they produce and consume.
 95 | 
 96 | ## <a name='Installingtheapplication'></a>Installing the application
 97 | 
 98 | ***IMPORTANT NOTE:** Deploying this application in your AWS account will create and consume AWS resources, which will cost money. For example, the application creates a CloudWatch dashboard, CloudWatch alarms, and logs, and it runs a Lambda function every 5 minutes to collect additional monitoring metrics. Therefore, if after installing this application you choose not to use it as part of your monitoring strategy, be sure to follow the Uninstall instructions below to clean up all resources and avoid ongoing charges.*
 99 | 
100 | ### <a name='Option1-InstallfromServerlessApplicationRepository'></a>Option 1 - Install from Serverless Application Repository
101 | 
102 | The easiest way to deploy this application is from the [Serverless Application Repository](https://aws.amazon.com/serverless/serverlessrepo/) (SAR).
103 | 
104 | 1. Within the AWS account where you wish to deploy the application, browse to the [application's page](https://serverlessrepo.aws.amazon.com/applications/arn:aws:serverlessrepo:us-east-1:316031960777:applications~Amazon-Personalize-Monitor) in the Serverless Application Repository and click **"Deploy"**.
105 | 2. Enter/update values in the **"Application settings"** panel (described below) and click **"Deploy"** again.
106 | 
107 | ### <a name='Option2-InstallusingServerlessApplicationModel'></a>Option 2 - Install using Serverless Application Model
108 | 
109 | If you'd rather install the application manually, you can use the AWS [Serverless Application Model](https://aws.amazon.com/serverless/sam/) (SAM) CLI to build and deploy the application into your AWS account.
110 | 
111 | To use the SAM CLI, you need the following tools.
112 | 
113 | * SAM CLI - [Install the SAM CLI](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html)
114 | * [Python 3 installed](https://www.python.org/downloads/)
115 | * Docker - [Install Docker community edition](https://hub.docker.com/search/?type=edition&offering=community)
116 | 
117 | Then ensure you are logged in to `public.ecr.aws` in Docker so SAM can download the Docker build images by running the following command in your shell.
118 | 
119 | ```bash
120 | aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws
121 | ```
122 | 
123 | To build and deploy the application for the first time, run the following in your shell:
124 | 
125 | ```bash
126 | sam build --use-container --cached
127 | sam deploy --guided
128 | ```
129 | 
130 | The first command will build the source of the application. The second command will package and deploy the application to your AWS account with a series of prompts. The following section describes the supported application parameters.
131 | 
132 | ### <a name='Applicationsettingsparameters'></a>Application settings/parameters
133 | 
134 | Whether you install this application from SAR or SAM, the following parameters can be used to control how the application monitors your Personalize deployments.
135 | 
136 | | Prompt/Parameter | Description | Default |
137 | | --- | --- | --- |
138 | | Stack Name | The name of the stack to deploy to CloudFormation. This should be unique to your account and region. | `personalize-monitor` |
139 | | AWS Region | The AWS region you want to deploy this application to. Note that the CloudWatch metrics Lambda function in this application will still be able to monitor campaigns across multiple regions; you will be prompted for the region(s) to monitor below. | Your current region |
140 | | Parameter CampaignARNs | Comma separated list of Personalize campaign ARNs to monitor or `all` to monitor all active campaigns. It is recommended to use `all` so that any new campaigns that are added after deployment will be automatically detected, monitored, and have alarms created (optional) | `all` |
141 | | Parameter RecommenderARNs | Comma separated list of Personalize recommender ARNs to monitor or `all` to monitor all active recommenders. It is recommended to use `all` so that any new recommenders that are added after deployment will be automatically detected, monitored, and have alarms created (optional) | `all` |
142 | | Parameter Regions | Comma separated list of AWS regions to monitor recommenders/campaigns. Only applicable when `all` is used for `CampaignARNs` or `RecommenderARNs`. Leaving this value blank will default to the region where this application is deployed (i.e. `AWS Region` parameter above). | |
143 | | Parameter AutoCreateUtilizationAlarms | Whether to automatically create a utilization CloudWatch alarm for each monitored recommender or campaign. | `Yes` |
144 | | Parameter UtilizationThresholdAlarmLowerBound | Minimum threshold value (in percent) to enter alarm state for recommender/campaign utilization. This value is only relevant if `AutoCreateUtilizationAlarms` is `Yes`. | `100` |
145 | | Parameter AutoAdjustMinTPS | Whether to automatically compare recommender/campaign request activity against the configured `minRecommendationRequestsPerSecond`/`minProvisionedTPS` to determine if `minRecommendationRequestsPerSecond`/`minProvisionedTPS` can be reduced to optimize utilization. | `Yes` |
146 | | Parameter AutoCreateIdleAlarms | Whether to automatically create an idle detection CloudWatch alarm for each monitored recommender/campaign. | `Yes` |
147 | | Parameter IdleThresholdHours | Number of hours that a recommender/campaign must be idle (i.e. no requests) before it is automatically stopped (recommender) or deleted (campaign). `AutoDeleteOrStopIdleResources` must be `Yes` for idle recommender stop or campaign deletion to occur. | `24` |
148 | | Parameter AutoDeleteOrStopIdleResources | Whether to automatically stop idle recommenders or delete idle campaigns. An idle recommender/campaign is one that has not had any requests in `IdleThresholdHours` hours. | `No` |
149 | | Parameter NotificationEndpoint | Email address to receive alarm and ok notifications, recommender stop/update, campaign delete/update events (optional). An [SNS](https://aws.amazon.com/sns/) topic is created in each region where resources are monitored and this email address will be added as a subscriber to the topic(s). You will receive a confirmation email for the SNS topic subscription in each region so be sure to click the confirmation link in that email to ensure you receive notifications. | |
150 | | Confirm changes before deploy | If set to yes, any CloudFormation change sets will be shown to you before execution for manual review. If set to no, the AWS SAM CLI will automatically deploy application changes. | |
151 | | Allow SAM CLI IAM role creation | Since this application creates IAM roles to allow the Lambda functions to access AWS services, this setting must be `Yes`. | |
152 | | Save arguments to samconfig.toml | If set to yes, your choices will be saved to a configuration file inside the application, so that in the future you can just re-run `sam deploy` without parameters to deploy changes to your application. | |
153 | 
154 | ## <a name='Uninstallingtheapplication'></a>Uninstalling the application
155 | 
156 | If you installed the application from the Serverless Application Repository, you can delete the application from the Lambda console in your AWS account (under Applications).
157 | 
158 | Alternatively, if you installed the application using SAM, you can delete the application using the AWS CLI. Assuming you used the default application name for the stack name (`personalize-monitor`), you can run the following:
159 | 
160 | ```bash
161 | aws cloudformation delete-stack --stack-name personalize-monitor
162 | ```
163 | 
164 | You can also delete the application stack in CloudFormation in the AWS console.
165 | 
166 | ## <a name='FAQs'></a>FAQs
167 | 
168 | ***Q: Can I use this application to determine my accumulated inference charges during the month?***
169 | 
170 | ***A:*** No! Although the `averageRPS`/`averageTPS` and `minRecommendationRequestsPerSecond`/`minProvisionedTPS` custom metrics generated by this application may be used to calculate an approximation of your accumulated inference charges, they should not be used as a substitute or proxy for actual Personalize inference costs. Always consult your AWS Billing Dashboard for actual service charges.
171 | 
172 | ***Q: What is an ideal recommender/campaign utilization percentage? Is it okay if my recommender/campaign utilization is over 100%?***
173 | 
174 | ***A:*** The recommender/campaign utilization metric is a measure of your actual usage compared against the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` for the recommender/campaign. Any utilization value >= 100% is ideal since that means you are not over-provisioning, and therefore not over-paying, for resources. You're letting Personalize handle the scaling in/out of the recommender/campaign. Anytime your utilization is below 100%, more resources are provisioned than are needed to satisfy the volume of requests at that time.
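
As a rough illustration of the answer above, utilization can be thought of as actual usage expressed as a percentage of the provisioned minimum. The sketch below assumes a simple ratio; the function names are hypothetical, and the exact formula used by the application is documented on the personalize_monitor function page.

```python
def utilization_percent(average_tps: float, min_provisioned_tps: float) -> float:
    """Actual usage as a percentage of the provisioned minimum (assumed formula)."""
    if min_provisioned_tps <= 0:
        raise ValueError("min_provisioned_tps must be positive")
    return average_tps / min_provisioned_tps * 100.0


def is_over_provisioned(average_tps: float, min_provisioned_tps: float,
                        threshold_percent: float = 100.0) -> bool:
    """True when utilization sits below the threshold, i.e. capacity is going unused."""
    return utilization_percent(average_tps, min_provisioned_tps) < threshold_percent


# A campaign averaging 5 TPS against a minProvisionedTPS of 10 is half utilized.
print(utilization_percent(5, 10))
print(is_over_provisioned(5, 10))
```

A value of 200 from `utilization_percent(20, 10)` would indicate that Personalize has scaled out beyond the provisioned minimum, which is the desired state described above.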
175 | 
176 | ***Q: How can I tell if Personalize is scaling out fast enough?***
177 | 
178 | ***A:*** Compare the "Actual vs Provisioned RPS/TPS" graph to the "Recommender/Campaign Latency" graph on the Personalize Monitor CloudWatch dashboard. When your Actual RPS/TPS increases/spikes, does the latency for the same recommender/campaign at the same time stay consistent? If so, this tells you that Personalize is maintaining response time as request volume increases and therefore scaling fast enough to meet demand. However, if latency increases significantly and to an unacceptable level for your application, this is an indication that Personalize may not be scaling fast enough to meet your traffic patterns. See the answer to the following question for some options.
179 | 
180 | ***Q: My workload is very spiky and Personalize is not scaling fast enough. What can I do?***
181 | 
182 | ***A:*** First, be sure to confirm that it is Personalize that is not scaling fast enough by reviewing the answer above. If the spikes are predictable or cyclical, you can pre-warm capacity in your recommender/campaign ahead of time by adjusting the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` using the [UpdateRecommender](https://docs.aws.amazon.com/personalize/latest/dg/API_UpdateRecommender.html) or [UpdateCampaign](https://docs.aws.amazon.com/personalize/latest/dg/API_UpdateCampaign.html) API and then dropping it back down after the traffic subsides. For example, increase capacity 30 minutes before a flash sale or marketing campaign is launched that brings a temporary surge in traffic. This can be done manually using the AWS console or automated by using [CloudWatch events](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/WhatIsCloudWatchEvents.html) based on a schedule or triggered based on an event in your application. The [personalize_update_tps](./src/personalize_update_tps_function/) function that is deployed with this application can be used as the target for CloudWatch events or you can publish an `UpdatePersonalizeRecommenderMinRecommendationRPS` or `UpdatePersonalizeCampaignMinProvisionedTPS` event to EventBridge. If spikes in your workload are not predictable or known ahead of time, determining the optimal `minRecommendationRequestsPerSecond`/`minProvisionedTPS` to balance consistent latency vs cost is the best option. The metrics and dashboard graphs in this application can help you determine this value.
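
The pre-warming approach described above could be scripted along these lines. This is a minimal sketch that calls the Personalize `UpdateCampaign` API directly through boto3 rather than through this application's functions; the helper name and the example ARN are hypothetical, and the client is passed in so the helper can be exercised without AWS credentials.

```python
def set_campaign_min_tps(personalize_client, campaign_arn: str, min_provisioned_tps: int):
    """Raise or lower a campaign's minProvisionedTPS (hypothetical helper).

    personalize_client is expected to be a boto3 Personalize client,
    e.g. boto3.client("personalize", region_name="us-east-1").
    """
    if min_provisioned_tps < 1:
        raise ValueError("minProvisionedTPS must be at least 1")
    return personalize_client.update_campaign(
        campaignArn=campaign_arn,
        minProvisionedTPS=min_provisioned_tps,
    )


# Example usage (requires AWS credentials; the ARN is a placeholder):
#
#   import boto3
#   client = boto3.client("personalize", region_name="us-east-1")
#   arn = "arn:aws:personalize:us-east-1:123456789012:campaign/my-campaign"
#   set_campaign_min_tps(client, arn, 20)  # before the flash sale
#   set_campaign_min_tps(client, arn, 1)   # after traffic subsides
```

Note that a campaign update takes some time to complete, so the increase needs to be requested ahead of the expected surge, as the answer above suggests.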
183 | 
184 | ***Q: After deploying this application in my AWS account, I created some new Personalize recommenders or campaigns that I also want to monitor. How can I add them to be monitored and have them appear on my dashboard? Also, what about monitored recommenders or campaigns that I delete?***
185 | 
186 | ***A:*** If you specified `all` for the `RecommenderARNs` or `CampaignARNs` deployment parameter (see installation instructions above), any new recommenders/campaigns you create will be automatically monitored and alarms created (if `AutoCreateUtilizationAlarms` or `AutoCreateIdleAlarms` was set to `Yes`) when the recommenders/campaigns become active. Likewise, any recommenders/campaigns that are deleted will no longer be monitored. If you want this application to monitor recommenders/campaigns across multiple regions, be sure to specify the region names in the `Regions` deployment parameter. Note that this only applies when `RecommenderARNs` or `CampaignARNs` is set to `all`. The CloudWatch dashboard will be automatically rebuilt every hour to add new recommenders and campaigns and drop deleted recommenders and campaigns. You can also trigger the dashboard to be rebuilt by publishing a `BuildPersonalizeMonitorDashboard` event to the default EventBridge event bus (see [dashboard_mgmt_function](./src/dashboard_mgmt_function/)).
187 | 
188 | If you want to change the deployment parameters that control which recommenders/campaigns are monitored, redeploy the application by running `sam deploy --guided` and follow the prompts.
189 | 
190 | **IMPORTANT: Redeploying this application will fully rebuild and replace your Personalize Monitor dashboard so any changes you made manually to the dashboard will be lost.**
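
For the `BuildPersonalizeMonitorDashboard` event mentioned in the answer above, a request entry for the EventBridge `PutEvents` API might be assembled as follows. Treat this as a sketch under assumptions: the `Source` value and the `Detail` payload are placeholders invented for illustration, and it is assumed that the event name is matched on the detail type. Check the dashboard_mgmt function page for the fields the application actually expects.

```python
import json


def build_dashboard_rebuild_entry(source: str = "my.application",
                                  reason: str = "Manual rebuild request") -> dict:
    """Build one PutEvents entry asking for a dashboard rebuild (hypothetical helper)."""
    return {
        "Source": source,  # assumption: placeholder source name
        "DetailType": "BuildPersonalizeMonitorDashboard",
        "Detail": json.dumps({"Reason": reason}),  # assumption: illustrative payload
        "EventBusName": "default",
    }


entry = build_dashboard_rebuild_entry()
print(entry["DetailType"])

# Example usage (requires AWS credentials):
#
#   import boto3
#   boto3.client("events").put_events(Entries=[entry])
```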
191 | 
192 | ## <a name='Reportingissues'></a>Reporting issues
193 | 
194 | If you encounter a bug, please create a new issue with as much detail as possible and steps for reproducing the bug. Similarly, if you have an idea for an improvement, please add an issue as well. Pull requests are also welcome! See the [Contributing Guidelines](./CONTRIBUTING.md) for more details.
195 | 
196 | ## <a name='Licensesummary'></a>License summary
197 | 
198 | This sample code is made available under a modified MIT license. See the LICENSE file.
199 | 


--------------------------------------------------------------------------------
/images/personalize-monitor-architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/images/personalize-monitor-architecture.png


--------------------------------------------------------------------------------
/images/personalize-monitor-cloudwatch-alarms.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/images/personalize-monitor-cloudwatch-alarms.png


--------------------------------------------------------------------------------
/images/personalize-monitor-cloudwatch-dashboard.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/images/personalize-monitor-cloudwatch-dashboard.png


--------------------------------------------------------------------------------
/images/personalize-monitor-cloudwatch-metrics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/images/personalize-monitor-cloudwatch-metrics.png


--------------------------------------------------------------------------------
/samconfig.toml:
--------------------------------------------------------------------------------
 1 | version = 0.1
 2 | [default]
 3 | [default.deploy]
 4 | [default.deploy.parameters]
 5 | stack_name = "personalize-monitor"
 6 | s3_prefix = "personalize-monitor"
 7 | parameter_overrides = "CampaignARNs=\"all\" RecommenderARNs=\"all\" AutoCreateUtilizationAlarms=\"Yes\" UtilizationThresholdAlarmLowerBound=\"100\" AutoAdjustMinTPS=\"Yes\" AutoCreateIdleAlarms=\"Yes\" IdleThresholdHours=\"24\" AutoDeleteOrStopIdleResources=\"No\""
 8 | capabilities = "CAPABILITY_IAM"
 9 | resolve_s3 = true
10 | image_repositories = []
11 | 


--------------------------------------------------------------------------------
/sar-publish.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/bash
 2 | 
 3 | # Utility script to deploy application to the Serverless Application Repository.
 4 | 
 5 | set -e
 6 | 
 7 | # Bucket must have policy to allow SAR access.
 8 | # See https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-template-publishing-applications.html
 9 | BUCKET=$1
10 | REGION=$2
11 | 
12 | if [ "$BUCKET" == "" ] || [ "$REGION" == "" ]; then
13 |     echo "Usage: $0 BUCKET REGION"
14 |     echo "  where BUCKET is the S3 bucket to deploy packaged resources for SAR and REGION is the AWS region where to publish the application"
15 |     exit 1
16 | fi
17 | 
18 | echo "Building application"
19 | sam build --use-container --cached
20 | 
21 | cd .aws-sam/build
22 | echo "Packaging application"
23 | sam package --template-file template.yaml --output-template-file packaged.yaml --s3-bucket $BUCKET
24 | echo "Publishing application to the SAR"
25 | sam publish --template packaged.yaml --region $REGION
26 | cd -


--------------------------------------------------------------------------------
/src/cleanup_resources_function/README.md:
--------------------------------------------------------------------------------
1 | # Amazon Personalize Monitor - Cleanup Function
2 | 
3 | This Lambda function is called as a CloudFormation custom resource when the application is deleted/uninstalled so that resources created dynamically by the application, such as CloudWatch alarms and SNS topics, are also deleted.


--------------------------------------------------------------------------------
/src/cleanup_resources_function/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/src/cleanup_resources_function/__init__.py


--------------------------------------------------------------------------------
/src/cleanup_resources_function/cleanup_resources.py:
--------------------------------------------------------------------------------
 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
 2 | # SPDX-License-Identifier: MIT-0
 3 | 
 4 | """Cleans up resources created by this application outside of CloudFormation
 5 | 
 6 | This function is called as a CloudFormation custom resource.
 7 | """
 8 | 
 9 | import boto3
10 | 
11 | from crhelper import CfnResource
12 | from aws_lambda_powertools import Logger
13 | 
14 | from common import (
15 |     PROJECT_NAME,
16 |     ALARM_NAME_PREFIX,
17 |     SNS_TOPIC_NAME,
18 |     NOTIFICATIONS_RULE,
19 |     NOTIFICATIONS_RULE_TARGET_ID,
20 |     extract_region,
21 |     get_client,
22 |     determine_campaign_arns,
23 |     determine_recommender_arns
24 | )
25 | 
26 | logger = Logger()
27 | helper = CfnResource()
28 | 
29 | sts = boto3.client('sts')
30 | account_id = sts.get_caller_identity()['Account']
31 | 
32 | @helper.delete
33 | def delete_resources(event, _):
34 |     campaign_arns = determine_campaign_arns(event.get('ResourceProperties'))
35 |     recommender_arns = determine_recommender_arns(event.get('ResourceProperties'))
36 | 
37 |     logger.debug('Campaigns to check for resources to delete: %s', campaign_arns)
38 |     logger.debug('Recommenders to check for resources to delete: %s', recommender_arns)
39 | 
40 |     regions = set()
41 | 
42 |     for campaign_arn in campaign_arns:
43 |         regions.add(extract_region(campaign_arn))
44 | 
45 |     for recommender_arn in recommender_arns:
46 |         regions.add(extract_region(recommender_arn))
47 | 
48 |     logger.debug('Regions to check for resources to delete: %s', regions)
49 | 
50 |     alarms_deleted = 0
51 | 
52 |     for region in regions:
53 |         cw = get_client(service_name = 'cloudwatch', region_name = region)
54 | 
55 |         alarm_names_to_delete = set()
56 | 
57 |         alarms_paginator = cw.get_paginator('describe_alarms')
58 |         for alarms_page in alarms_paginator.paginate(AlarmNamePrefix = ALARM_NAME_PREFIX, AlarmTypes=['MetricAlarm']):
59 |             for alarm in alarms_page['MetricAlarms']:
60 |                 tags_response = cw.list_tags_for_resource(ResourceARN = alarm['AlarmArn'])
61 | 
62 |                 for tag in tags_response['Tags']:
63 |                     if tag['Key'] == 'CreatedBy' and tag['Value'] == PROJECT_NAME:
64 |                         alarm_names_to_delete.add(alarm['AlarmName'])
65 |                         break
66 | 
67 |         if alarm_names_to_delete:
68 |             # FUTURE: DeleteAlarms accepts at most 100 alarm names per call; batch the names if more are found
69 |             logger.info('Deleting CloudWatch alarms in %s for campaigns %s and recommenders %s: %s', region, campaign_arns, recommender_arns, alarm_names_to_delete)
70 |             cw.delete_alarms(AlarmNames=list(alarm_names_to_delete))
71 |             alarms_deleted += len(alarm_names_to_delete)
72 | 
73 |         events = get_client(service_name = 'events', region_name = region)
74 |         try:
75 |             logger.info('Removing targets from EventBridge notification rule %s for region %s', NOTIFICATIONS_RULE, region)
76 |             events.remove_targets(
77 |                 Rule = NOTIFICATIONS_RULE,
78 |                 Ids = [ NOTIFICATIONS_RULE_TARGET_ID ]
79 |             )
80 |         except events.exceptions.ResourceNotFoundException:
81 |             logger.warning('EventBridge notification rule targets not found')
82 | 
83 |         try:
84 |             logger.info('Deleting EventBridge notification rule %s for region %s', NOTIFICATIONS_RULE, region)
85 |             events.delete_rule(Name = NOTIFICATIONS_RULE)
86 |         except events.exceptions.ResourceNotFoundException:
87 |             logger.warning('EventBridge notification rule %s does not exist', NOTIFICATIONS_RULE)
88 | 
89 |         sns = get_client(service_name = 'sns', region_name = region)
90 |         topic_arn = f'arn:aws:sns:{region}:{account_id}:{SNS_TOPIC_NAME}'
91 |         logger.info('Deleting SNS topic %s', topic_arn)
92 |         # This API is idempotent so will not fail if topic does not exist
93 |         sns.delete_topic(TopicArn = topic_arn)
94 | 
95 |     logger.info('Deleted %d alarms', alarms_deleted)
96 | 
97 | @logger.inject_lambda_context(log_event=True)
98 | def lambda_handler(event, context):
99 |     helper(event, context)


--------------------------------------------------------------------------------
/src/cleanup_resources_function/requirements.txt:
--------------------------------------------------------------------------------
1 | # Note: AWS Lambda Powertools dependency is satisfied by Lambda layer at runtime (part of deployment).
2 | crhelper==2.0.6
3 | 


--------------------------------------------------------------------------------
/src/dashboard_mgmt_function/README.md:
--------------------------------------------------------------------------------
 1 | # Amazon Personalize Monitor - CloudWatch Dashboard Create/Update/Delete Function
 2 | 
 3 | The [dashboard_mgmt.py](./dashboard_mgmt.py) Lambda function is responsible for creating, updating/refreshing, and deleting the [CloudWatch dashboard](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html) for this application. It is called in the following contexts:
 4 | 
 5 | - As part of the CloudFormation deployment process for this application as a [custom resource](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-custom-resources.html) (create, update, delete).
 6 | - In response to the `BuildPersonalizeMonitorDashboard` event. This event is published to the default [Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/latest/userguide/what-is-amazon-eventbridge.html) event bus when a monitored campaign is automatically deleted so that the dashboard can be rebuilt. An EventBridge rule invokes this function when the event is received.
 7 | - At the top of every hour, triggered by a scheduled CloudWatch event. This ensures that any campaigns that are created or deleted (outside of this application) that meet the monitoring criteria are added to the dashboard.
 8 | 
 9 | The dashboard will include line graph widgets for actual vs provisioned TPS, recommender/campaign utilization, and recommender/campaign latency for the Personalize recommenders/campaigns you wish to monitor. Here is an example of a dashboard.
10 | 
11 | ![Personalize Monitor CloudWatch Dashboard](../../images/personalize-monitor-cloudwatch-dashboard.png)
12 | 
13 | ## How it works
14 | 
15 | The EventBridge event structure that triggers this function looks something like this:
16 | 
17 | ```javascript
18 | {
19 |     "source": "personalize.monitor",
20 |     "detail-type": "BuildPersonalizeMonitorDashboard",
21 |     "resources": [ CAMPAIGN_OR_RECOMMENDER_ARN_THAT_TRIGGERED ],
22 |     "detail": {
23 |         "Reason": DESCRIPTIVE_REASON_FOR_UPDATE
24 |     }
25 | }
26 | ```
27 | 
28 | This function can also be invoked directly as part of your own operational process. The `Reason` is optional and just used for logging.
29 | 
30 | ```javascript
31 | {
32 |     "Reason": DESCRIPTIVE_REASON_FOR_UPDATE
33 | }
34 | ```
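
The direct invocation above can be issued from Python with the Lambda `Invoke` API. This is a minimal sketch rather than part of the application: the helper names `build_invoke_args` and `rebuild_dashboard` are illustrative, and the function name you pass in is whatever name the dashboard function has in your deployment (it is not stated in this README).

```python
import json


def build_invoke_args(function_name, reason=None):
    """Build keyword arguments for the Lambda Invoke API call."""
    # "Reason" is optional and only used for logging by the function.
    payload = {"Reason": reason} if reason else {}
    return {
        "FunctionName": function_name,
        "InvocationType": "Event",  # asynchronous; no response payload is needed
        "Payload": json.dumps(payload).encode("utf-8"),
    }


def rebuild_dashboard(lambda_client, function_name, reason=None):
    """Invoke the dashboard function; lambda_client is e.g. boto3.client('lambda')."""
    return lambda_client.invoke(**build_invoke_args(function_name, reason))
```

For example, `rebuild_dashboard(boto3.client('lambda'), '<your-dashboard-function-name>', 'Manual refresh')` would queue an asynchronous rebuild.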
35 | 
36 | ### Create/Update
37 | 
38 | When called as part of this application's create or update deployment process, or as a result of the `BuildPersonalizeMonitorDashboard` event, the function first determines which Personalize recommenders/campaigns should be monitored based on the CloudFormation template parameters you specified when you [installed](../../README.md#installing-the-application) the application. The monitored recommenders/campaigns are grouped by [dataset group](https://docs.aws.amazon.com/personalize/latest/dg/data-prep-ds-group.html) and placed in a dictionary that is passed to the Python [chevron](https://github.com/noahmorrison/chevron) library to render the [dashboard template](./dashboard-template.mustache) file. The template uses the [mustache templating language](http://mustache.github.io/) to build the widgets.
39 | 
40 | Once the template is rendered as dashboard source (JSON), the dashboard source is used to create or update the CloudWatch dashboard by calling the [PutDashboard API](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutDashboard.html).
41 | 
42 | Therefore, if you want to change which recommenders/campaigns are monitored, either re-deploy this application, which overwrites the current dashboard with your recommender/campaign changes, or wait for the dashboard to update itself automatically (subject to monitoring configuration). **This also means that any manual changes you make to the Personalize Monitor dashboard will be lost.** If you want to add your own widgets to the dashboard or change the existing widgets, you can fork this repository, change the [dashboard-template.mustache](./dashboard-template.mustache) template file, and deploy it into your AWS account.
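
The grouping step described in this section can be sketched in plain Python. This is an illustration only, not the function's actual code: the `group_for_template` helper and the `dsg_name` input key are hypothetical, and the real function builds its data from Personalize Describe API responses. The `last_resource` flag matters because mustache has no built-in "is last item" test, so the template relies on it to leave out the trailing comma in JSON arrays.

```python
from collections import defaultdict


def group_for_template(resources):
    """Group resource dicts by dataset group and flag the last one in each group.

    Each input dict is assumed to carry 'dsg_name', 'region', and 'name' keys
    (hypothetical names chosen for this sketch).
    """
    grouped = defaultdict(list)
    for resource in resources:
        key = (resource["region"], resource["dsg_name"])
        grouped[key].append({"name": resource["name"]})

    dataset_groups = []
    # Sort by (region, dataset group name) so widget rows appear in a stable order.
    for (region, dsg_name), items in sorted(grouped.items()):
        # Flag the final item so the template can omit the trailing comma.
        items[-1]["last_resource"] = True
        dataset_groups.append(
            {"name": dsg_name, "region": region, "inference_resources": items}
        )
    return dataset_groups
```

The returned list corresponds to the `dataset_groups` section iterated by the template.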
43 | 
44 | ### Delete
45 | 
46 | When the CloudFormation stack is deleted for this application, this function will delete the dashboard.
47 | 
48 | ## Calling from your own code
49 | 
50 | You can trigger the CloudWatch dashboard to be rebuilt by publishing an event with the `BuildPersonalizeMonitorDashboard` detail-type from your own code. Here is an example in Python.
51 | 
52 | ```python
53 | import boto3
54 | import json
55 | 
56 | event_bridge = boto3.client('events')
57 | 
58 | event_bridge.put_events(
59 |     Entries=[
60 |         {
61 |             'Source': 'personalize.monitor',
62 |             'DetailType': 'BuildPersonalizeMonitorDashboard',
63 |             'Detail': json.dumps({
64 |                 'Reason': 'Rebuild the dashboard because I said so'
65 |             })
66 |         }
67 |     ]
68 | )
69 | ```
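
On the receiving side, the handler has to tell a CloudFormation custom resource request apart from an EventBridge event or a direct invocation. The sketch below mirrors the checks made in [dashboard_mgmt.py](./dashboard_mgmt.py); the `classify_invocation` helper and its return labels are illustrative only and do not exist in the code.

```python
def classify_invocation(event):
    """Return (kind, reason) for an incoming Lambda event.

    kind is 'cloudformation' for custom resource requests and 'build' for
    everything else; reason is the optional free-text reason, if present.
    """
    # CloudFormation custom resource requests always carry a RequestType.
    if event.get("RequestType"):
        return "cloudformation", None

    # EventBridge events nest the reason under "detail"; direct invocations
    # may pass "Reason" at the top level.
    detail = event.get("detail")
    reason = detail.get("Reason") if detail else event.get("Reason")
    return "build", reason
```

An event with neither `RequestType` nor a reason is still treated as a request to rebuild the dashboard.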


--------------------------------------------------------------------------------
/src/dashboard_mgmt_function/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/src/dashboard_mgmt_function/__init__.py


--------------------------------------------------------------------------------
/src/dashboard_mgmt_function/dashboard-template.mustache:
--------------------------------------------------------------------------------
  1 | {
  2 | 	"widgets": [{
  3 | 		"type": "metric",
  4 | 		"width": 4,
  5 | 		"height": 4,
  6 | 		"properties": {
  7 | 			"metrics": [
  8 | 				["{{namespace}}", "monitoredResourceCount"]
  9 | 			],
 10 | 			"view": "singleValue",
 11 | 			"region": "{{current_region}}",
 12 | 			"title": "Resources Monitored",
 13 | 			"stat": "Average",
 14 | 			"period": 300
 15 | 		}
 16 | 	},
 17 | 	{
 18 | 		"type": "text",
 19 | 		"width": 20,
 20 | 		"height": 4,
 21 | 		"properties": {
 22 | 			"markdown": "\n## Amazon Personalize Monitor Dashboard\n*This dashboard and its widgets are automatically managed by the [Personalize Monitor](https://github.com/aws-samples/amazon-personalize-monitor/) application. This is an open-source project. Please submit bugs/fixes/ideas [here](https://github.com/aws-samples/personalization-apis/issues).*\n\nFor best practices on integrating with and operating [Amazon Personalize](https://aws.amazon.com/personalize/), please see our [Cheat Sheet](https://github.com/aws-samples/amazon-personalize-samples/blob/master/PersonalizeCheatSheet2.0.md).\n\nAmazon Personalize resources: [Service Documentation](https://docs.aws.amazon.com/personalize/latest/dg/what-is-personalize.html) | [Personalize Blog](https://aws.amazon.com/blogs/machine-learning/category/artificial-intelligence/amazon-personalize/) | [Samples on GitHub](https://github.com/aws-samples/amazon-personalize-samples)\n"
 23 | 		}
 24 | 	}
 25 | 	{{#dataset_groups}}
 26 | 	,{
 27 | 		"type": "text",
 28 | 		"width": 24,
 29 | 		"height": 1,
 30 | 		"properties": {
 31 | 			"markdown": "\n### Dataset Group: **{{name}}** ({{region}}) | [Manage](https://console.aws.amazon.com/personalize/home?region={{region}}#arn:aws:personalize:{{region}}:{{account_id}}:dataset-group${{name}}/setup)\n"
 32 | 		}
 33 | 	},
 34 | 	{
 35 | 		"type": "metric",
 36 | 		"width": 8,
 37 | 		"height": 8,
 38 | 		"properties": {
 39 | 			"metrics": [
 40 | 				{{#inference_resources}}
 41 | 				["{{namespace}}", "{{resource_min_tps_name}}", "{{resource_arn_name}}", "{{inference_arn}}", {
 42 | 					"label": "{{name}} {{resource_min_tps_name}}"
 43 | 				}],
 44 | 				["{{namespace}}", "{{resource_avg_tps_name}}", "{{resource_arn_name}}", "{{inference_arn}}", {
 45 | 					"label": "{{name}} {{resource_avg_tps_name}}"
 46 | 				}]{{^last_resource}}, {{/last_resource}}
 47 | 				{{/inference_resources}}
 48 | 			],
 49 | 			"region": "{{region}}",
 50 | 			"view": "timeSeries",
 51 | 			"stacked": false,
 52 | 			"stat": "Average",
 53 | 			"period": 300,
 54 | 			"title": "Actual vs Provisioned TPS/RPS",
 55 | 			"yAxis": {
 56 | 				"left": {
 57 | 					"label": "TPS/RPS",
 58 | 					"min": 0,
 59 | 					"showUnits": false
 60 | 				},
 61 | 				"right": {
 62 | 					"showUnits": true,
 63 | 					"label": ""
 64 | 				}
 65 | 			},
 66 | 			"annotations": {
 67 | 				"horizontal": [{
 68 | 					"label": "Lowest TPS/RPS Allowed",
 69 | 					"value": 1
 70 | 				}]
 71 | 			}
 72 | 		}
 73 | 	},
 74 | 	{
 75 | 		"type": "metric",
 76 | 		"width": 8,
 77 | 		"height": 8,
 78 | 		"properties": {
 79 | 			"view": "timeSeries",
 80 | 			"stacked": false,
 81 | 			"metrics": [
 82 | 				{{#inference_resources}}
 83 | 				["{{namespace}}", "{{resource_utilization_name}}", "{{resource_arn_name}}", "{{inference_arn}}", {
 84 | 					"label": "{{name}} {{resource_utilization_name}}"
 85 | 				}]{{^last_resource}}, {{/last_resource}}
 86 | 				{{/inference_resources}}
 87 | 			],
 88 | 			"region": "{{region}}",
 89 | 			"title": "Campaign/Recommender Utilization"
 90 | 		}
 91 | 	},
 92 | 	{
 93 | 		"type": "metric",
 94 | 		"width": 8,
 95 | 		"height": 8,
 96 | 		"properties": {
 97 | 			"view": "timeSeries",
 98 | 			"stacked": false,
 99 | 			"metrics": [
100 | 				{{#inference_resources}}
101 | 				["AWS/Personalize", "{{latency_metric_name}}", "{{resource_arn_name}}", "{{inference_arn}}", {
102 | 					"label": "{{name}} {{latency_metric_name}}"
103 | 				}]{{^last_resource}}, {{/last_resource}}
104 | 				{{/inference_resources}}
105 | 			],
106 | 			"region": "{{region}}",
107 | 			"title": "Campaign/Recommender Latency"
108 | 		}
109 | 	}
110 | 	{{/dataset_groups}}
111 | 	]
112 | }


--------------------------------------------------------------------------------
/src/dashboard_mgmt_function/dashboard_mgmt.py:
--------------------------------------------------------------------------------
  1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
  2 | # SPDX-License-Identifier: MIT-0
  3 | 
  4 | """Manages create/update/delete of the Personalize Monitor CloudWatch dashboard
  5 | 
  6 | This function is called two ways:
  7 | 
  8 | 1. From CloudFormation when the application is deployed, updated, or deleted in an AWS
  9 | account. When the resource is created, this function will create the Personalize
 10 | Monitor Dashboard in CloudWatch populated with widgets for monitoring Personalize
 11 | resources configured as deployment parameters.
 12 | 
 13 | When this resource is updated (i.e. redeployed), the dashboard will be rebuilt and
 14 | updated/replaced.
 15 | 
 16 | When this resource is deleted, this function will delete the CloudWatch Dashboard.
 17 | 
 18 | 2. As the target of an EventBridge rule that signals that the dashboard should be
 19 | rebuilt as a result of an event occurring. The event could be after a campaign has
 20 | been deleted and therefore a good point to rebuild the dashboard. It could also
 21 | be set up to periodically rebuild the dashboard on a schedule so it picks up new
 22 | campaigns too.
 23 | 
 24 | See the layer_dashboard Lambda Layer for details on how the dashboard is built.
 25 | """
 26 | 
 27 | import json
 28 | import os
 29 | import boto3
 30 | import chevron
 31 | 
 32 | from crhelper import CfnResource
 33 | from aws_lambda_powertools import Logger
 34 | from common import (
 35 |     extract_region,
 36 |     extract_account_id,
 37 |     get_client,
 38 |     get_configured_active_campaigns,
 39 |     get_configured_active_recommenders
 40 | )
 41 | 
 42 | logger = Logger()
 43 | helper = CfnResource()
 44 | 
 45 | cloudwatch = boto3.client('cloudwatch')
 46 | 
 47 | DASHBOARD_NAME = 'Amazon-Personalize-Monitor'
 48 | 
 49 | def build_dashboard(event):
 50 |     # Will hold the data used to render the template.
 51 |     template_data = {}
 52 | 
 53 |     template_data['namespace'] = 'PersonalizeMonitor'
 54 |     template_data['current_region'] = os.environ['AWS_REGION']
 55 | 
 56 |     logger.debug('Loading active campaigns and recommenders')
 57 | 
 58 |     campaigns = get_configured_active_campaigns(event)
 59 |     template_data['active_campaign_count'] = len(campaigns)
 60 | 
 61 |     recommenders = get_configured_active_recommenders(event)
 62 |     template_data['active_recommender_count'] = len(recommenders)
 63 | 
 64 |     # Group campaigns/recommenders by dataset group so we can create DSG specific widgets in rows
 65 |     resources_by_dsg_arn = {}
 66 |     # Holds DSG info so we only have describe once per DSG
 67 |     # Holds DSG info so we only have to describe each DSG once
 68 | 
 69 |     for campaign in campaigns:
 70 |         logger.info('Campaign %s will be added to the dashboard', campaign['campaignArn'])
 71 | 
 72 |         campaign_region = extract_region(campaign['campaignArn'])
 73 | 
 74 |         personalize = get_client('personalize', campaign_region)
 75 | 
 76 |         response = personalize.describe_solution_version(solutionVersionArn = campaign['solutionVersionArn'])
 77 | 
 78 |         dsg_arn = response['solutionVersion']['datasetGroupArn']
 79 |         recipe_arn = response['solutionVersion']['recipeArn']
 80 | 
 81 |         dsg = dsgs_by_arn.get(dsg_arn)
 82 |         if not dsg:
 83 |             response = personalize.describe_dataset_group(datasetGroupArn = dsg_arn)
 84 |             dsg = response['datasetGroup']
 85 |             dsgs_by_arn[dsg_arn] = dsg
 86 | 
 87 |         inference_resource_datas = resources_by_dsg_arn.get(dsg_arn)
 88 |         if not inference_resource_datas:
 89 |             inference_resource_datas = []
 90 |             resources_by_dsg_arn[dsg_arn] = inference_resource_datas
 91 | 
 92 |         campaign_data = {
 93 |             'name': campaign['name'],
 94 |             'resource_arn_name': 'CampaignArn',
 95 |             'resource_min_tps_name': 'minProvisionedTPS',
 96 |             'resource_avg_tps_name': 'averageTPS',
 97 |             'resource_utilization_name': 'campaignUtilization',
 98 |             'inference_arn': campaign['campaignArn'],
 99 |             'region': campaign_region
100 |         }
101 | 
102 |         if recipe_arn == 'arn:aws:personalize:::recipe/aws-personalized-ranking':
103 |             campaign_data['latency_metric_name'] = 'GetPersonalizedRankingLatency'
104 |         else:
105 |             campaign_data['latency_metric_name'] = 'GetRecommendationsLatency'
106 | 
107 |         inference_resource_datas.append(campaign_data)
108 | 
109 |     for recommender in recommenders:
110 |         logger.info('Recommender %s will be added to the dashboard', recommender['recommenderArn'])
111 | 
112 |         recommender_region = extract_region(recommender['recommenderArn'])
113 |         personalize = get_client('personalize', recommender_region)
114 |         dsg_arn = recommender['datasetGroupArn']
115 | 
116 |         dsg = dsgs_by_arn.get(dsg_arn)
117 |         if not dsg:
118 |             response = personalize.describe_dataset_group(datasetGroupArn = dsg_arn)
119 |             dsg = response['datasetGroup']
120 |             dsgs_by_arn[dsg_arn] = dsg
121 | 
122 |         inference_resource_datas = resources_by_dsg_arn.get(dsg_arn)
123 |         if not inference_resource_datas:
124 |             inference_resource_datas = []
125 |             resources_by_dsg_arn[dsg_arn] = inference_resource_datas
126 | 
127 |         recommender_data = {
128 |             'name': recommender['name'],
129 |             'resource_arn_name': 'RecommenderArn',
130 |             'resource_min_tps_name': 'minRecommendationRequestsPerSecond',
131 |             'resource_avg_tps_name': 'averageRPS',
132 |             'resource_utilization_name': 'recommenderUtilization',
133 |             'latency_metric_name': 'GetRecommendationsLatency',
134 |             'inference_arn': recommender['recommenderArn'],
135 |             'region': recommender_region
136 |         }
137 | 
138 |         inference_resource_datas.append(recommender_data)
139 | 
140 |     dsgs_for_template = []
141 | 
142 |     for dsg_arn, inference_resource_datas in resources_by_dsg_arn.items():
143 |         dsg = dsgs_by_arn[dsg_arn]
144 | 
145 |         # Minor hack to know when we're on the last item in list when iterating in template.
146 |         inference_resource_datas[-1]['last_resource'] = True
147 | 
148 |         dsgs_for_template.append({
149 |             'name': dsg['name'],
150 |             'region': extract_region(dsg_arn),
151 |             'account_id': extract_account_id(dsg_arn),
152 |             'inference_resources': inference_resource_datas
153 |         })
154 | 
155 |     template_data['dataset_groups'] = sorted(dsgs_for_template, key = lambda dsg: dsg['region'] + dsg['name'])
156 | 
157 |     # Render template and use as dashboard body.
158 |     with open('dashboard-template.mustache', 'r') as f:
159 |         dashboard = chevron.render(f, template_data)
160 | 
161 |         logger.debug(dashboard)
162 | 
163 |         logger.info('Adding/updating dashboard')
164 | 
165 |         cloudwatch.put_dashboard(
166 |             DashboardName = DASHBOARD_NAME,
167 |             DashboardBody = dashboard
168 |         )
169 | 
170 | def delete_dashboard():
171 |     logger.info('Deleting dashboard')
172 | 
173 |     cloudwatch.delete_dashboards(
174 |         DashboardNames = [ DASHBOARD_NAME ]
175 |     )
176 | 
177 | @helper.create
178 | @helper.update
179 | def create_or_update_resource(event, _):
180 |     build_dashboard(event)
181 | 
182 | @helper.delete
183 | def delete_resource(event, _):
184 |     delete_dashboard()
185 | 
186 | @logger.inject_lambda_context(log_event=True)
187 | def lambda_handler(event, context):
188 |     # If the event has a RequestType, we're being called by CFN as custom resource
189 |     if event.get('RequestType'):
190 |         logger.info('Called via CloudFormation as a custom resource; letting CfnResource route request')
191 |         helper(event, context)
192 |     else:
193 |         logger.info('Called via Invoke; assuming caller wants to build dashboard')
194 | 
195 |         if event.get('detail'):
196 |             reason = event['detail'].get('Reason')
197 |         else:
198 |             reason = event.get('Reason')
199 | 
200 |         if reason:
201 |             logger.info('Reason for dashboard build: %s', reason)
202 | 
203 |         build_dashboard(event)


--------------------------------------------------------------------------------
/src/dashboard_mgmt_function/requirements.txt:
--------------------------------------------------------------------------------
1 | # Note: AWS Lambda Powertools dependency is satisfied by Lambda layer at runtime (part of deployment).
2 | chevron==0.13.1
3 | crhelper==2.0.6
4 | 


--------------------------------------------------------------------------------
/src/layer/README.md:
--------------------------------------------------------------------------------
1 | # Amazon Personalize Monitor - Common Lambda Layer
2 | 
3 | This [Lambda Layer](https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html) includes dependencies shared across all/most functions in this application. In addition, the [common.py](./common.py) file includes utility functions that are also shared across the Lambda functions in this application.
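
As one illustration of those utilities, several helpers in [common.py](./common.py) pull fields out of an ARN by splitting it on `:`. The sketch below shows the idea in a single function; `arn_field` and `parse_arn` are hypothetical names used for this example, not functions provided by the layer.

```python
from typing import Dict, Optional


def arn_field(arn: str, index: int) -> Optional[str]:
    """Return the colon-delimited ARN element at index, or None if missing."""
    parts = arn.split(":")
    return parts[index] if len(parts) > index else None


def parse_arn(arn: str) -> Dict[str, Optional[str]]:
    """Extract region, account ID, and resource type from an AWS ARN.

    ARN layout: arn:partition:service:region:account-id:resource-type/resource-id
    """
    resource = arn_field(arn, 5)
    return {
        "region": arn_field(arn, 3),
        "account_id": arn_field(arn, 4),
        # The resource type is the text before the first "/" (e.g. "campaign").
        "resource_type": resource.split("/")[0] if resource else None,
    }
```

Returning `None` for malformed input, rather than raising, matches how the layer's extract helpers behave.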


--------------------------------------------------------------------------------
/src/layer/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/src/layer/__init__.py


--------------------------------------------------------------------------------
/src/layer/common.py:
--------------------------------------------------------------------------------
  1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
  2 | # SPDX-License-Identifier: MIT-0
  3 | 
  4 | """
  5 | Lambda layer functions shared across Lambda functions in this application
  6 | """
  7 | 
  8 | import boto3
  9 | import os
 10 | import json
 11 | import logging
 12 | import random
 13 | from typing import Dict, List
 14 | 
 15 | from botocore.exceptions import ClientError
 16 | from aws_lambda_powertools import Logger
 17 | from expiring_dict import ExpiringDict
 18 | 
 19 | logger = Logger(child=True)
 20 | 
 21 | _clients_by_region = {}
 22 | # Since the DescribeCampaign and DescribeRecommender APIs easily throttle,
 23 | # use a cache to help smooth out periods where we get throttled.
 24 | _resource_cache = ExpiringDict(max_age_seconds = 22 * 60)
 25 | 
 26 | PROJECT_NAME = 'PersonalizeMonitor'
 27 | ALARM_NAME_PREFIX = PROJECT_NAME + '-'
 28 | SNS_TOPIC_NAME = 'PersonalizeMonitorNotifications'
 29 | NOTIFICATIONS_RULE = 'PersonalizeMonitor-NotificationsRule'
 30 | NOTIFICATIONS_RULE_TARGET_ID = 'PersonalizeMonitorNotificationsId'
 31 | 
 32 | def put_event(detail_type, detail, resources = None):
 33 |     resources = resources or []  # avoid a shared mutable default argument
 34 |     event_bridge = get_client('events')
 35 |     logger.info({
 36 |         'detail_type': detail_type,
 37 |         'detail': detail,
 38 |         'resources': resources
 39 |     })
 40 | 
 41 |     event_bridge.put_events(
 42 |         Entries=[
 43 |             {
 44 |                 'Source': 'personalize.monitor',
 45 |                 'Resources': resources,
 46 |                 'DetailType': detail_type,
 47 |                 'Detail': detail
 48 |             }
 49 |         ]
 50 |     )
 51 | 
 52 | def extract_region(arn: str) -> str:
 53 |     ''' Extracts region from an AWS ARN '''
 54 |     region = None
 55 |     elements = arn.split(':')
 56 |     if len(elements) > 3:
 57 |         region = elements[3]
 58 | 
 59 |     return region
 60 | 
 61 | def extract_resource_type(arn: str) -> str:
 62 |     ''' Extracts resource type from an AWS ARN '''
 63 |     resource = None
 64 |     elements = arn.split(':')
 65 |     if len(elements) > 5:
 66 |         resource = elements[5].split('/')[0]
 67 | 
 68 |     return resource
 69 | 
 70 | def is_campaign(arn: str) -> bool:
 71 |     return extract_resource_type(arn) == 'campaign'
 72 | 
 73 | def is_recommender(arn: str) -> bool:
 74 |     return extract_resource_type(arn) == 'recommender'
 75 | 
 76 | def extract_account_id(arn: str) -> str:
 77 |     ''' Extracts account ID from an AWS ARN '''
 78 |     account_id = None
 79 |     elements = arn.split(':')
 80 |     if len(elements) > 4:
 81 |         account_id = elements[4]
 82 | 
 83 |     return account_id
 84 | 
 85 | def get_client(service_name: str, region_name: str = None):
 86 |     ''' Returns boto3 client for a service and region '''
 87 |     if not region_name:
 88 |         region_name = os.environ['AWS_REGION']
 89 | 
 90 |     clients_by_service = _clients_by_region.get(region_name)
 91 | 
 92 |     if not clients_by_service:
 93 |         clients_by_service = {}
 94 |         _clients_by_region[region_name] = clients_by_service
 95 | 
 96 |     client = clients_by_service.get(service_name)
 97 | 
 98 |     if not client:
 99 |         client = boto3.client(service_name = service_name, region_name = region_name)
100 |         clients_by_service[service_name] = client
101 | 
102 |     return client
103 | 
104 | def determine_regions(event: Dict) -> List[str]:
105 |     ''' Determines regions from function event or environment '''
106 |     # Check event first (list of region names)
107 |     regions = None
108 |     if event:
109 |         regions = event.get('Regions')
110 | 
111 |     if not regions:
112 |         # Check environment variable next for list of region names as CSV
113 |         regions = os.environ.get('Regions')
114 | 
115 |     if not regions:
116 |         # Lastly, use current region from environment.
117 |         regions = os.environ['AWS_REGION']
118 | 
119 |     if regions and isinstance(regions, str):
120 |         regions = [exp.strip(' ') for exp in regions.split(',')]
121 | 
122 |     return regions
123 | 
124 | def _determine_arns(event: Dict, arn_param_name: str, arn_list_type: str) -> List[str]:
125 |     ''' Determines Personalize campaign or recommender ARNs based on function event or environment '''
126 | 
127 |     # Check event first (list of ARNs)
128 |     arns_spec = None
129 |     if event:
130 |         arns_spec = event.get(arn_param_name)
131 | 
132 |     if not arns_spec:
133 |         # Check environment variable next for list of ARNs as CSV
134 |         arns_spec = os.environ.get(arn_param_name)
135 | 
136 |     if not arns_spec:
137 |         raise Exception(f'"{arn_param_name}" expression required in event or environment')
138 | 
139 |     if isinstance(arns_spec, str):
140 |         arns_spec = [exp.strip(' ') for exp in arns_spec.split(',')]
141 | 
142 |     logger.debug('%s expression: %s', arn_param_name, arns_spec)
143 | 
144 |     # Look for magic value of "all" to mean all active campaigns/recommenders in configured region(s)
145 |     if len(arns_spec) == 1 and arns_spec[0].lower() == 'all':
146 |         logger.debug('Retrieving all active ARNs')
147 |         arns = []
148 | 
149 |         # Determine regions we need to consider
150 |         regions = determine_regions(event)
151 |         logger.debug('Regions to scan for active resources: %s', regions)
152 | 
153 |         for region in regions:
154 |             personalize = get_client(service_name = 'personalize', region_name = region)
155 | 
156 |             arns_for_region = 0
157 | 
158 |             resources_paginator = personalize.get_paginator(arn_list_type)
159 |             for resources_page in resources_paginator.paginate():
160 |                 if resources_page.get('campaigns'):
161 |                     for resource in resources_page['campaigns']:
162 |                         arns.append(resource['campaignArn'])
163 |                         arns_for_region += 1
164 |                 elif resources_page.get('recommenders'):
165 |                     for resource in resources_page['recommenders']:
166 |                         arns.append(resource['recommenderArn'])
167 |                         arns_for_region += 1
168 | 
169 |             logger.debug('Region %s has %d resources', region, arns_for_region)
170 |     else:
171 |         arns = arns_spec
172 | 
173 |     return arns
174 | 
175 | def determine_campaign_arns(event: Dict) -> List[str]:
176 |     ''' Determines Personalize campaign ARNs based on function event or environment '''
177 |     return _determine_arns(event, 'CampaignARNs', 'list_campaigns')
178 | 
179 | def determine_recommender_arns(event: Dict) -> List[str]:
180 |     ''' Determines Personalize recommender ARNs based on function event or environment '''
181 |     return _determine_arns(event, 'RecommenderARNs', 'list_recommenders')
182 | 
183 | def get_configured_active_campaigns(event: Dict) -> List[Dict]:
184 |     ''' Returns list of active campaigns as configured by function event and/or environment '''
185 |     campaign_arns = determine_campaign_arns(event)
186 | 
187 |     # Shuffle the list of arns so we don't try to describe campaigns in the same order each
188 |     # time and potentially use cached campaign details for the same campaigns further down
189 |     # the list due to rare but possible API throttling.
190 |     random.shuffle(campaign_arns)
191 | 
192 |     campaigns = []
193 | 
194 |     for campaign_arn in campaign_arns:
195 |         campaign_region = extract_region(campaign_arn)
196 |         personalize = get_client(service_name = 'personalize', region_name = campaign_region)
197 |         campaign = None
198 | 
199 |         try:
200 |             # Always try the DescribeCampaign API directly first.
201 |             campaign = personalize.describe_campaign(campaignArn = campaign_arn)['campaign']
202 |             if logger.isEnabledFor(logging.DEBUG):
203 |                 logger.debug('Campaign: %s', json.dumps(campaign, indent = 2, default = str))
204 |             _resource_cache[campaign_arn] = campaign
205 |         except ClientError as e:
206 |             error_code = e.response['Error']['Code']
207 |             if error_code == 'ThrottlingException':
208 |                 logger.error('ThrottlingException trapped when calling DescribeCampaign API for %s', campaign_arn)
209 | 
210 |                 # Fallback to see if we have a cached Campaign to use instead.
211 |                 campaign = _resource_cache.get(campaign_arn)
212 |                 if campaign:
213 |                     logger.warning('Using cached campaign object for %s', campaign_arn)
214 |                 else:
215 |                     logger.warning('Campaign %s NOT found in cache; skipping this time', campaign_arn)
216 |             elif error_code == 'ResourceNotFoundException':
217 |                 # Campaign has been deleted; log and skip.
218 |                 logger.error('Campaign %s no longer exists; skipping', campaign_arn)
219 |             else:
220 |                 raise
221 | 
222 |         if campaign:
223 |             if campaign['status'] == 'ACTIVE':
224 |                 latest_status = None
225 |                 if campaign.get('latestCampaignUpdate'):
226 |                     latest_status = campaign['latestCampaignUpdate']['status']
227 | 
228 |                 if not latest_status or (latest_status != 'DELETE PENDING' and latest_status != 'DELETE IN_PROGRESS'):
229 |                     campaigns.append(campaign)
230 |                 else:
231 |                     logger.info('Campaign %s latestCampaignUpdate.status is %s and cannot be monitored in this state; skipping', campaign_arn, latest_status)
232 |             else:
233 |                 logger.info('Campaign %s status is %s and cannot be monitored in this state; skipping', campaign_arn, campaign['status'])
234 | 
235 |     return campaigns
236 | 
237 | def get_configured_active_recommenders(event: Dict) -> List[Dict]:
238 |     ''' Returns list of active recommenders as configured by function event and/or environment '''
239 |     recommender_arns = determine_recommender_arns(event)
240 | 
241 |     # Shuffle the list of arns so we don't try to describe recommenders in the same order each
242 |     # time and potentially use cached recommender details for the same recommenders further down
243 |     # the list due to rare but possible API throttling.
244 |     random.shuffle(recommender_arns)
245 | 
246 |     recommenders = []
247 | 
248 |     for recommender_arn in recommender_arns:
249 |         region = extract_region(recommender_arn)
250 |         personalize = get_client(service_name = 'personalize', region_name = region)
251 |         recommender = None
252 | 
253 |         try:
254 |             # Always try the DescribeRecommender API directly first.
255 |             recommender = personalize.describe_recommender(recommenderArn = recommender_arn)['recommender']
256 |             if logger.isEnabledFor(logging.DEBUG):
257 |                 logger.debug('Recommender: %s', json.dumps(recommender, indent = 2, default = str))
258 |             _resource_cache[recommender_arn] = recommender
259 |         except ClientError as e:
260 |             error_code = e.response['Error']['Code']
261 |             if error_code == 'ThrottlingException':
262 |                 logger.error('ThrottlingException trapped when calling DescribeRecommender API for %s', recommender_arn)
263 | 
264 |                 # Fallback to see if we have a cached Recommender to use instead.
265 |                 recommender = _resource_cache.get(recommender_arn)
266 |                 if recommender:
267 |                     logger.warning('Using cached recommender object for %s', recommender_arn)
268 |                 else:
269 |                     logger.warning('Recommender %s NOT found in cache; skipping this time', recommender_arn)
270 |             elif error_code == 'ResourceNotFoundException':
271 |                 # Recommender has been deleted; log and skip.
272 |                 logger.error('Recommender %s no longer exists; skipping', recommender_arn)
273 |             else:
274 |                 raise e
275 | 
276 |         if recommender:
277 |             if recommender['status'] == 'ACTIVE':
278 |                 latest_status = None
279 |                 if recommender.get('latestRecommenderUpdate'):
280 |                     latest_status = recommender['latestRecommenderUpdate']['status']
281 | 
282 |                 if not latest_status or (latest_status != 'DELETE PENDING' and latest_status != 'DELETE IN_PROGRESS'):
283 |                     recommenders.append(recommender)
284 |                 else:
285 |                     logger.info('Recommender %s latestRecommenderUpdate.status is %s and cannot be monitored in this state; skipping', recommender_arn, latest_status)
286 |             else:
287 |                 logger.info('Recommender %s status is %s and cannot be monitored in this state; skipping', recommender_arn, recommender['status'])
288 | 
289 |     return recommenders


--------------------------------------------------------------------------------
/src/layer/requirements.txt:
--------------------------------------------------------------------------------
1 | # Runtime requirements:
2 | # Note: the following dependency must be provided at runtime as Lambda layer:
3 | #   - AWS Lambda Power Tools as a Lambda layer.
4 | # Explicitly bring in a more recent boto3 to get latest API defs for Personalize that include recommender support.
5 | boto3==1.26.104
6 | expiring-dict==1.1.0


--------------------------------------------------------------------------------
/src/personalize_delete_campaign_function/README.md:
--------------------------------------------------------------------------------
 1 | # Amazon Personalize Monitor - Delete Campaign Function
 2 | 
 3 | This Lambda function deletes a Personalize campaign. It is called as the target of an EventBridge rule that matches events with the `DeletePersonalizeCampaign` detail-type. The [personalize-monitor](../personalize_monitor_function/) function publishes this event when the `AutoDeleteOrStopIdleResources` deployment parameter is `Yes` AND a monitored campaign has been idle more than `IdleThresholdHours` hours. An idle campaign is one that has not had any `GetRecommendations` or `GetPersonalizedRanking` calls in the last `IdleThresholdHours` hours.
 4 | 
 5 | This function will also delete any CloudWatch alarms that were dynamically created by this application for the deleted campaign. Alarms can be created for idle campaigns and low utilization campaigns via the `AutoCreateIdleCampaignAlarms` and `AutoCreateCampaignUtilizationAlarms` deployment parameters.
 6 | 
 7 | > Note that Personalize recommenders are stopped and not deleted by this application so that the underlying model artifacts are retained. See the [personalize_stop_recommender](../personalize_stop_recommender_function/) function for details.
 8 | 
 9 | ## How it works
10 | 
11 | The EventBridge event structure that triggers this function looks something like this:
12 | 
13 | ```javascript
14 | {
15 |     "source": "personalize.monitor",
16 |     "detail-type": "DeletePersonalizeCampaign",
17 |     "resources": [ CAMPAIGN_ARN_TO_DELETE ],
18 |     "detail": {
19 |         "ARN": CAMPAIGN_ARN_TO_DELETE,
20 |         "Utilization": CURRENT_UTILIZATION,
21 |         "AgeHours": CAMPAIGN_AGE_IN_HOURS,
22 |         "IdleThresholdHours": CAMPAIGN_IDLE_HOURS,
23 |         "TotalRequestsDuringIdleThresholdHours": 0,
24 |         "Reason": DESCRIPTIVE_REASON_FOR_DELETE
25 |     }
26 | }
27 | ```
28 | 
29 | This function can also be invoked directly as part of your own operational process. The event you pass to the function only requires the campaign ARN as follows.
30 | 
31 | ```javascript
32 | {
33 |     "ARN": CAMPAIGN_ARN_TO_DELETE,
34 |     "Reason": OPTIONAL_DESCRIPTIVE_REASON_FOR_DELETE
35 | }
36 | ```
37 | 
38 | The Personalize [DeleteCampaign](https://docs.aws.amazon.com/personalize/latest/dg/API_DeleteCampaign.html) API is used to delete the campaign.
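
As a rough illustration of direct invocation, the sketch below builds the minimal event described above and sends it with the Lambda `Invoke` API through boto3. The helper names and the function name passed to `invoke_delete` are placeholders, not names taken from this application; look up the deployed function name in your own account.

```python
import json
from typing import Optional


def build_delete_event(campaign_arn: str, reason: Optional[str] = None) -> dict:
    """Builds the minimal event accepted by the delete campaign function."""
    event = {'ARN': campaign_arn}
    if reason:
        # 'Reason' is optional; the function supplies a default when it is absent.
        event['Reason'] = reason
    return event


def invoke_delete(function_name: str, campaign_arn: str, reason: Optional[str] = None) -> dict:
    """Invokes the function asynchronously (requires AWS credentials)."""
    import boto3  # imported lazily so build_delete_event works without boto3

    return boto3.client('lambda').invoke(
        FunctionName=function_name,  # hypothetical: use your deployed function name
        InvocationType='Event',
        Payload=json.dumps(build_delete_event(campaign_arn, reason)).encode('utf-8'),
    )
```

For example, `invoke_delete('my-delete-campaign-function', campaign_arn, 'Retired after A/B test')` would queue the delete for the given campaign.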
39 | 
40 | ## Published events
41 | 
42 | After this function has successfully initiated the deletion of a campaign, it publishes two events to EventBridge. One event triggers a notification to the SNS topic for this application and the other triggers the CloudWatch dashboard to be rebuilt. Any CloudWatch alarms that were dynamically created for the campaign are deleted as well.
43 | 
44 | ### Delete notification
45 | 
46 | The following event is published to EventBridge to signal that a campaign has been deleted.
47 | 
48 | ```javascript
49 | {
50 |     "source": "personalize.monitor",
51 |     "detail-type": "PersonalizeCampaignDeleted",
52 |     "resources": [ CAMPAIGN_ARN_DELETED ],
53 |     "detail": {
54 |         "ARN": CAMPAIGN_ARN_DELETED,
55 |         "Reason": DESCRIPTIVE_REASON_FOR_DELETE
56 |     }
57 | }
58 | ```
59 | 
60 | An EventBridge rule is set up that targets an SNS topic with `NotificationEndpoint` as the subscriber. This is the email address you provided at deployment time. If you'd like, you can customize how these notification events are handled in the EventBridge and SNS consoles.
61 | 
62 | ### Rebuild CloudWatch dashboard
63 | 
64 | Since a monitored campaign has been deleted, the CloudWatch dashboard needs to be rebuilt so that the campaign is removed from the widgets. This is accomplished by publishing a `BuildPersonalizeMonitorDashboard` event that is processed by the [dashboard_mgmt](../dashboard_mgmt_function/) function.
65 | 
66 | ```javascript
67 | {
68 |     "source": "personalize.monitor",
69 |     "detail-type": "BuildPersonalizeMonitorDashboard",
70 |     "resources": [ CAMPAIGN_ARN_DELETED ],
71 |     "detail": {
72 |         "ARN": CAMPAIGN_ARN_DELETED,
73 |         "Reason": DESCRIPTIVE_REASON_FOR_REBUILD
74 |     }
75 | }
76 | ```
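
The events above are published through a shared `put_event` helper whose implementation is not shown in this README. As a hedged sketch of what an equivalent EventBridge `PutEvents` entry could look like, the helper below (a hypothetical name, `build_put_events_entry`) maps the documented fields onto the keys that the boto3 `events` client expects. Note that `PutEvents` takes the detail as a JSON string and names the field `DetailType`, while the delivered event carries it as `detail-type`.

```python
import json


def build_put_events_entry(detail_type: str, campaign_arn: str, reason: str) -> dict:
    """Builds one entry for the EventBridge PutEvents API (illustrative only)."""
    return {
        'Source': 'personalize.monitor',
        'DetailType': detail_type,
        # PutEvents expects Detail to be a JSON-encoded string, not a dict.
        'Detail': json.dumps({'ARN': campaign_arn, 'Reason': reason}),
        'Resources': [campaign_arn],
    }
```

An entry built this way could be sent with `boto3.client('events').put_events(Entries=[entry])`.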
77 | 


--------------------------------------------------------------------------------
/src/personalize_delete_campaign_function/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/src/personalize_delete_campaign_function/__init__.py


--------------------------------------------------------------------------------
/src/personalize_delete_campaign_function/personalize_delete_campaign.py:
--------------------------------------------------------------------------------
 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
 2 | # SPDX-License-Identifier: MIT-0
 3 | 
 4 | """
 5 | Lambda function that is used to delete a Personalize campaign based on prolonged idle time
 6 | and according to configuration to automatically delete campaigns under these conditions.
 7 | """
 8 | 
 9 | import json
10 | import logging
11 | 
12 | from aws_lambda_powertools import Logger
13 | 
14 | from common import (
15 |     PROJECT_NAME,
16 |     ALARM_NAME_PREFIX,
17 |     extract_region,
18 |     get_client,
19 |     put_event
20 | )
21 | 
22 | logger = Logger()
23 | 
24 | def delete_alarms_for_campaign(campaign_arn):
25 |     cw = get_client(service_name = 'cloudwatch', region_name = extract_region(campaign_arn))
26 | 
27 |     alarm_names_to_delete = set()
28 | 
29 |     alarms_paginator = cw.get_paginator('describe_alarms')
30 |     for alarms_page in alarms_paginator.paginate(AlarmNamePrefix = ALARM_NAME_PREFIX, AlarmTypes=['MetricAlarm']):
31 |         for alarm in alarms_page['MetricAlarms']:
32 |             for dim in alarm['Dimensions']:
33 |                 if dim['Name'] == 'CampaignArn' and dim['Value'] == campaign_arn:
34 |                     tags_response = cw.list_tags_for_resource(ResourceARN = alarm['AlarmArn'])
35 | 
36 |                     for tag in tags_response['Tags']:
37 |                         if tag['Key'] == 'CreatedBy' and tag['Value'] == PROJECT_NAME:
38 |                             alarm_names_to_delete.add(alarm['AlarmName'])
39 |                             break
40 | 
41 |     if alarm_names_to_delete:
42 |         # FUTURE: max check of 100
43 |         logger.info('Deleting CloudWatch alarms for campaign %s: %s', campaign_arn, alarm_names_to_delete)
44 |         cw.delete_alarms(AlarmNames=list(alarm_names_to_delete))
45 |         logger.info('Deleted %d CloudWatch alarms for campaign %s', len(alarm_names_to_delete), campaign_arn)
46 |     else:
47 |         logger.info('No CloudWatch alarms to delete for campaign %s', campaign_arn)
48 | 
49 | @logger.inject_lambda_context(log_event=True)
50 | def lambda_handler(event, _):
51 |     ''' Initiates the delete of a Personalize campaign '''
52 |     if event.get('detail'):
53 |         campaign_arn = event['detail']['ARN']
54 |         reason = event['detail'].get('Reason')
55 |     else:
56 |         campaign_arn = event['ARN']
57 |         reason = event.get('Reason')
58 | 
59 |     region = extract_region(campaign_arn)
60 |     if not region:
61 |         raise Exception('Region could not be extracted from campaign_arn')
62 | 
63 |     personalize = get_client(service_name = 'personalize', region_name = region)
64 | 
65 |     response = personalize.delete_campaign(campaignArn = campaign_arn)
66 | 
67 |     if logger.isEnabledFor(logging.DEBUG):
68 |         logger.debug(json.dumps(response, indent = 2, default = str))
69 | 
70 |     if not reason:
71 |         reason = f'Amazon Personalize campaign {campaign_arn} deletion initiated (reason unspecified)'
72 | 
73 |     put_event(
74 |         detail_type = 'PersonalizeCampaignDeleted',
75 |         detail = json.dumps({
76 |             'ARN': campaign_arn,
77 |             'Reason': reason
78 |         }),
79 |         resources = [ campaign_arn ]
80 |     )
81 | 
82 |     put_event(
83 |         detail_type = 'BuildPersonalizeMonitorDashboard',
84 |         detail = json.dumps({
85 |             'ARN': campaign_arn,
86 |             'Reason': reason
87 |         }),
88 |         resources = [ campaign_arn ]
89 |     )
90 | 
91 |     logger.info({
92 |         'campaignArn': campaign_arn
93 |     })
94 | 
95 |     delete_alarms_for_campaign(campaign_arn)
96 | 
97 |     return f'Successfully initiated delete of campaign {campaign_arn}'


--------------------------------------------------------------------------------
/src/personalize_delete_campaign_function/requirements.txt:
--------------------------------------------------------------------------------
1 | # Note: AWS Lambda Power Tools dependency is satisfied by Lambda layer at runtime (part of deployment).
2 | 


--------------------------------------------------------------------------------
/src/personalize_monitor_function/README.md:
--------------------------------------------------------------------------------
 1 | # Amazon Personalize Monitor - Core Monitor Function
 2 | 
 3 | The [personalize_monitor.py](./personalize_monitor.py) Lambda function is called every 5 minutes by a CloudWatch scheduled event rule to generate the CloudWatch metrics needed to populate the Personalize Monitor dashboard line graph widgets and to trigger the CloudWatch alarms for low recommender/campaign utilization and idle recommender/campaign detection (if configured). Also, if the `AutoDeleteOrStopIdleResources` deployment parameter is `Yes` AND a monitored campaign has been idle more than `IdleThresholdHours` hours, this function will publish a `DeletePersonalizeCampaign` event to EventBridge that is handled by the [personalize_delete_campaign](../personalize_delete_campaign_function/) function. An idle campaign is one that has not had any `GetRecommendations` or `GetPersonalizedRanking` calls in the last `IdleThresholdHours` hours. Finally, this function will adjust a campaign's `minProvisionedTPS` (down only) if the `AutoAdjustMinTPS` deployment parameter is `Yes`.
 4 | 
 5 | ## How it works
 6 | 
 7 | The function first determines which Personalize recommenders and campaigns should be monitored based on the CloudFormation template parameters you specify when you [install](../README.md#installing-the-application) the application.
 8 | 
 9 | ## CloudWatch Metrics
10 | 
11 | The following custom CloudWatch metrics are generated by this function at 5-minute intervals. You can find these metrics in the AWS console under CloudWatch and then Metrics, or you can query them using the CloudWatch API.
12 | 
13 | | Namespace | MetricName | Dimensions | Unit | Description |
14 | | --- | --- | --- | --- | --- |
15 | | PersonalizeMonitor | monitoredResourceCount | | Count | Number of recommenders and campaigns currently being monitored at interval |
16 | | PersonalizeMonitor | minRecommendationRequestsPerSecond | RecommenderArn | Count/Second | `minRecommendationRequestsPerSecond` value for the recommender at interval |
17 | | PersonalizeMonitor | averageRPS | RecommenderArn | Count/Second | Average RPS for the recommender at interval |
18 | | PersonalizeMonitor | recommenderUtilization | RecommenderArn | Percent | Utilization percentage of `averageRPS` vs `minRecommendationRequestsPerSecond` at interval |
19 | | PersonalizeMonitor | minProvisionedTPS | CampaignArn | Count/Second | `minProvisionedTPS` value for the campaign at interval |
20 | | PersonalizeMonitor | averageTPS | CampaignArn | Count/Second | Average TPS for the campaign at interval |
21 | | PersonalizeMonitor | campaignUtilization | CampaignArn | Percent | Utilization percentage of `averageTPS` vs `minProvisionedTPS` at interval |
22 | 
23 | ### How is averageRPS/averageTPS calculated?
24 | 
25 | The `averageRPS` and `averageTPS` metric values for each monitored recommender and campaign are calculated by first determining the number of requests made to the recommender or campaign during the 5-minute interval and dividing by 300 (the number of seconds in 5 minutes). The number of requests is pulled from the `GetRecommendations` or `GetPersonalizedRanking` metric (depending on the underlying recipe) for the recommender/campaign from the `AWS/Personalize` namespace. The request count metric is automatically updated by Personalize itself.
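
The arithmetic above can be sketched in a few lines. The function names are illustrative, and the utilization formula (average rate divided by the provisioned minimum, times 100) is an assumption inferred from the metric descriptions in the table rather than a copy of the function's code.

```python
def average_rate(total_requests: float, period_seconds: int = 300) -> float:
    """Average requests per second over one interval (300 s = 5 minutes)."""
    return total_requests / period_seconds


def utilization_percent(average: float, provisioned_minimum: float) -> float:
    """Utilization of the provisioned minimum, as a percentage (assumed formula)."""
    return average / provisioned_minimum * 100
```

For example, 1,500 requests in a 5-minute interval is an average of 5 requests per second, which is 50% utilization against a `minProvisionedTPS` of 10.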
26 | 
27 | ## CloudWatch Alarms (optional)
28 | 
29 | You can optionally have CloudWatch alarms dynamically created for monitored recommenders/campaigns for low utilization and idle recommenders/campaigns.
30 | 
31 | ### Low Recommender/Campaign Utilization Alarm
32 | 
33 | If you set the `AutoCreateUtilizationAlarms` CloudFormation template parameter to `Yes` when you installed this application, this function will automatically create a CloudWatch alarm for every recommender and campaign that it monitors. The alarm will trigger when the `recommenderUtilization` or `campaignUtilization` custom metric described above drops below the `UtilizationThresholdAlarmLowerBound` installation parameter for 9 out of 12 evaluation periods. Since the intervals are 5 minutes, that means that 9 of the 12 five-minute evaluations over a 60-minute span must be below the threshold to enter an alarm status. The same rule applies to transition from alarm to OK status. The alarm will be created in the region where the recommender/campaign was created. An [SNS](https://aws.amazon.com/sns/) topic created by this application will be used for the alarm and OK actions, and the `NotificationEndpoint` (email address) deployment parameter will be set up as a subscriber to the topic. **Be sure to confirm the subscription sent when this application creates SNS topics and subscribes the email address you provided. You will receive a confirmation email for a topic created in each region where resources are monitored.**
34 | 
35 | The alarm will have its actions disabled when the `minRecommendationRequestsPerSecond` or `minProvisionedTPS` is 1 and enabled when it is greater than 1, so that notifications are only sent when utilization can be improved by adjusting `minRecommendationRequestsPerSecond`/`minProvisionedTPS`.
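
To make the alarm settings concrete, here is a minimal sketch of the keyword arguments that could be passed to the CloudWatch `PutMetricAlarm` API for such an alarm. The helper name, the `Average` statistic, and the `LessThanThreshold` operator are assumptions chosen to match the description above; the alarm name and other details used by this application may differ.

```python
def build_utilization_alarm(alarm_name: str, metric_name: str, dimension_name: str,
                            resource_arn: str, threshold_percent: float,
                            provisioned_minimum: int, topic_arn: str) -> dict:
    """Builds put_metric_alarm kwargs for a low utilization alarm (illustrative)."""
    return {
        'AlarmName': alarm_name,
        'Namespace': 'PersonalizeMonitor',
        'MetricName': metric_name,
        'Dimensions': [{'Name': dimension_name, 'Value': resource_arn}],
        'Statistic': 'Average',              # assumption
        'Period': 300,                       # 5-minute evaluation periods
        'EvaluationPeriods': 12,             # look at the last 60 minutes
        'DatapointsToAlarm': 9,              # 9 of 12 periods must breach
        'Threshold': threshold_percent,
        'ComparisonOperator': 'LessThanThreshold',
        # Notify only when lowering the provisioned minimum is still possible.
        'ActionsEnabled': provisioned_minimum > 1,
        'AlarmActions': [topic_arn],
        'OKActions': [topic_arn],
    }
```

The resulting dictionary could be applied with `cloudwatch.put_metric_alarm(**kwargs)` using a boto3 CloudWatch client in the resource's region.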
36 | 
37 | ### Idle Recommender/Campaign Alarm
38 | 
39 | If you set the `AutoCreateIdleAlarms` CloudFormation template parameter to `Yes` when you installed this application, this function will automatically create a CloudWatch alarm for every monitored recommender/campaign that is idle for at least `IdleThresholdHours` hours. The actions for the alarm will be enabled only after the recommender/campaign has existed for `IdleThresholdHours` hours as well. The `GetRecommendations` or `GetPersonalizedRanking` metric (depending on the resource's recipe) will be used to assess the resource's idle state. The alarm will be created in the region where the recommender/campaign was created. An [SNS](https://aws.amazon.com/sns/) topic created by this application will be used for the alarm and OK actions, and the `NotificationEndpoint` (email address) deployment parameter will be set up as a subscriber to the topic. **Be sure to confirm the subscription sent when this application creates SNS topics and subscribes the email address you provided. You will receive a confirmation email for a topic created in each region where resources are monitored.**
40 | 
41 | ## Automatically adjusting minRecommendationRequestsPerSecond (recommenders) and minProvisionedTPS (campaigns) (optional)
42 | 
43 | If the `AutoAdjustMinTPS` deployment parameter is `Yes`, this function will check the actual hourly RPS/TPS over the last 14 days against the currently configured `minRecommendationRequestsPerSecond`/`minProvisionedTPS` and look for opportunities to reduce the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` to optimize utilization and reduce costs. It does this by checking the recommender's or campaign's request volume for the previous 14 days on hourly intervals and finding the hour with the lowest average RPS/TPS (low watermark). If the low watermark average is less than `minRecommendationRequestsPerSecond`/`minProvisionedTPS` AND the recommender/campaign is more than 1 day old, it will drop the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` by 25%. This process will be repeated each hour until either the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` meets the low watermark RPS/TPS or the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` reaches 1 (the lowest allowed value). **This function will NOT increase the `minRecommendationRequestsPerSecond`/`minProvisionedTPS`.** Instead it will rely on Personalize to auto-scale recommenders/campaigns up and back down to `minRecommendationRequestsPerSecond`/`minProvisionedTPS` to meet demand.
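
The adjustment rule can be sketched as a single step function. This is an approximation under stated assumptions, not the application's code: the function name is hypothetical, the 25% reduction is rounded down, and the result is clamped so it never drops below the low watermark (rounded up) or below 1.

```python
import math
from typing import Sequence


def propose_min_tps(current_min: int, hourly_request_sums: Sequence[float],
                    age_hours: float) -> int:
    """Returns the next provisioned minimum after one hourly adjustment step."""
    # Nothing to do at the floor, for young resources, or without data.
    if current_min <= 1 or age_hours <= 24 or not hourly_request_sums:
        return current_min

    # Low watermark: the quietest hour's average requests per second.
    low_watermark = min(hourly_request_sums) / 3600
    if low_watermark >= current_min:
        return current_min  # this rule never increases the minimum

    reduced = math.floor(current_min * 0.75)           # drop by 25% (rounding assumed)
    lower_bound = max(1, math.ceil(low_watermark))     # stop at watermark or 1
    return max(reduced, lower_bound)
```

Applied once per hour, a minimum of 20 with a low watermark of 5 would step down to 15, then 11, and so on until it reaches 5.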
44 | 
45 | > Since it can take several minutes for a recommender/campaign to redeploy after updating its `minRecommendationRequestsPerSecond`/`minProvisionedTPS`, you will receive the notification when the redeploy starts. The recommender/campaign will continue to respond to `GetRecommendations`/`GetPersonalizedRanking` API requests while it is redeploying. There will be no interruption of service.
46 | 
47 | See the [personalize_update_tps](../personalize_update_tps_function/) function for details on the update function.
48 | 
49 | ## Automatically stopping recommenders and deleting idle campaigns (optional)
50 | 
51 | If the `AutoDeleteOrStopIdleResources` deployment parameter is `Yes`, this function will perform additional checks once per hour for each monitored recommender/campaign to see if it has been idle for more than `IdleThresholdHours` hours. The purpose of this feature is to prevent abandoned recommenders/campaigns from continuing to incur inference costs when they are no longer being used. Recommender/campaign checks are distributed across each hour in 10-minute blocks to spread out the API calls needed to check and update recommenders/campaigns.
52 | 
53 | To avoid stopping recommenders or deleting campaigns too aggressively, new recommenders/campaigns that are not more than `IdleThresholdHours` hours old are exempt from being stopped/deleted. Similarly, if a recommender/campaign has been updated within `IdleThresholdHours` hours, it will also be exempt from being automatically stopped/deleted. The idea is that new or actively updated recommenders/campaigns are likely not safe to stop or delete.
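
The exemptions above amount to a small predicate. The sketch below is illustrative only: the function and parameter names are hypothetical, and treating an age or update age exactly equal to `IdleThresholdHours` as exempt is an assumption about the boundary.

```python
from typing import Optional


def is_idle_cleanup_candidate(total_requests: float, age_hours: int,
                              last_update_age_hours: Optional[int],
                              idle_threshold_hours: int) -> bool:
    """True when a resource is idle and not exempt from stop/delete."""
    if total_requests > 0:
        return False  # not idle: requests were made during the threshold window
    if age_hours <= idle_threshold_hours:
        return False  # too new to judge
    if last_update_age_hours is not None and last_update_age_hours <= idle_threshold_hours:
        return False  # recently updated, so likely still in active use
    return True
```

With an `IdleThresholdHours` of 24, a campaign created 100 hours ago, last updated 3 hours ago, and with no requests would still be exempt.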
54 | 


--------------------------------------------------------------------------------
/src/personalize_monitor_function/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/src/personalize_monitor_function/__init__.py


--------------------------------------------------------------------------------
/src/personalize_monitor_function/personalize_monitor.py:
--------------------------------------------------------------------------------
  1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
  2 | # SPDX-License-Identifier: MIT-0
  3 | 
  4 | """Lambda function that records Personalize resource metrics
  5 | 
  6 | Lambda function designed to be called every five minutes to record campaign TPS
  7 | utilization metrics and recommender RRPS in CloudWatch. The metrics are used for
  8 | alarms and on the CloudWatch dashboard created by this application.
  9 | """
 10 | 
 11 | import json
 12 | import os
 13 | import datetime
 14 | import sys
 15 | import math
 16 | import logging
 17 | 
 18 | from typing import Dict
 19 | from aws_lambda_powertools import Logger
 20 | 
 21 | from common import (
 22 |     PROJECT_NAME,
 23 |     ALARM_NAME_PREFIX,
 24 |     SNS_TOPIC_NAME,
 25 |     NOTIFICATIONS_RULE,
 26 |     NOTIFICATIONS_RULE_TARGET_ID,
 27 |     extract_region,
 28 |     get_client,
 29 |     get_configured_active_campaigns,
 30 |     get_configured_active_recommenders,
 31 |     put_event
 32 | )
 33 | 
 34 | logger = Logger()
 35 | 
 36 | MAX_METRICS_PER_CALL = 20
 37 | MIN_IDLE_THRESHOLD_HOURS = 1
 38 | 
 39 | ALARM_PERIOD_SECONDS = 300
 40 | ALARM_NAME_PREFIX_LOW_CAMPAIGN_UTILIZATION = ALARM_NAME_PREFIX + 'LowCampaignUtilization-'
 41 | ALARM_NAME_PREFIX_LOW_RECOMMENDER_UTILIZATION = ALARM_NAME_PREFIX + 'LowRecommenderUtilization-'
 42 | ALARM_NAME_PREFIX_IDLE_CAMPAIGN = ALARM_NAME_PREFIX + 'IdleCampaign-'
 43 | ALARM_NAME_PREFIX_IDLE_RECOMMENDER = ALARM_NAME_PREFIX + 'IdleRecommender-'
 44 | 
 45 | _topic_arn_by_region = {}
 46 | 
 47 | def get_recipe_arn(resource: Dict):
 48 |     recipe_arn = resource.get('recipeArn')
 49 |     if not recipe_arn and 'campaignArn' in resource:
 50 |         campaign_region = extract_region(resource['campaignArn'])
 51 |         personalize = get_client('personalize', campaign_region)
 52 | 
 53 |         response = personalize.describe_solution_version(solutionVersionArn = resource['solutionVersionArn'])
 54 | 
 55 |         recipe_arn = response['solutionVersion']['recipeArn']
 56 |         resource['recipeArn'] = recipe_arn
 57 | 
 58 |     return recipe_arn
 59 | 
 60 | def get_inference_metric_name(resource):
 61 |     metric_name = 'GetRecommendations'
 62 |     if 'campaignArn' in resource and get_recipe_arn(resource) == 'arn:aws:personalize:::recipe/aws-personalized-ranking':
 63 |         metric_name = 'GetPersonalizedRanking'
 64 | 
 65 |     return metric_name
 66 | 
 67 | def get_sum_requests_datapoints(resource, start_time, end_time, period):
 68 |     if 'campaignArn' in resource:
 69 |         arn_key = 'campaignArn'
 70 |         dim_name = 'CampaignArn'
 71 |     else:
 72 |         arn_key = 'recommenderArn'
 73 |         dim_name = 'RecommenderArn'
 74 | 
 75 |     resource_region = extract_region(resource[arn_key])
 76 |     cw = get_client(service_name = 'cloudwatch', region_name = resource_region)
 77 | 
 78 |     metric_name = get_inference_metric_name(resource)
 79 | 
 80 |     response = cw.get_metric_data(
 81 |         MetricDataQueries = [
 82 |             {
 83 |                 'Id': 'm1',
 84 |                 'MetricStat': {
 85 |                     'Metric': {
 86 |                         'Namespace': 'AWS/Personalize',
 87 |                         'MetricName': metric_name,
 88 |                         'Dimensions': [
 89 |                             {
 90 |                                 'Name': dim_name,
 91 |                                 'Value': resource[arn_key]
 92 |                             }
 93 |                         ]
 94 |                     },
 95 |                     'Period': period,
 96 |                     'Stat': 'Sum'
 97 |                 },
 98 |                 'ReturnData': True
 99 |             }
100 |         ],
101 |         StartTime = start_time,
102 |         EndTime = end_time,
103 |         ScanBy = 'TimestampDescending'
104 |     )
105 | 
106 |     datapoints = []
107 | 
108 |     if response.get('MetricDataResults') and len(response['MetricDataResults']) > 0:
109 |         results = response['MetricDataResults'][0]
110 | 
111 |         for idx, ts in enumerate(results['Timestamps']):
112 |             datapoints.append({
113 |                 'Timestamp': ts,
114 |                 'Value': results['Values'][idx]
115 |             })
116 | 
117 |     return datapoints
118 | 
119 | def get_sum_requests_by_hour(resource, start_time, end_time):
120 |     datapoints = get_sum_requests_datapoints(resource, start_time, end_time, 3600)
121 |     return datapoints
122 | 
123 | def get_total_requests(resource, start_time, end_time, period):
124 |     datapoints = get_sum_requests_datapoints(resource, start_time, end_time, period)
125 | 
126 |     sum_requests = 0
127 |     if datapoints:
128 |         for datapoint in datapoints:
129 |             sum_requests += datapoint['Value']
130 | 
131 |     return sum_requests
132 | 
133 | def get_average_tps(resource, start_time, end_time, period = ALARM_PERIOD_SECONDS):
134 |     sum_requests = get_total_requests(resource, start_time, end_time, period)
135 |     return sum_requests / period
136 | 
137 | def get_age_hours(resource):
138 |     diff = datetime.datetime.now(datetime.timezone.utc) - resource['creationDateTime']
139 |     days, seconds = diff.days, diff.seconds
140 | 
141 |     hours_age = days * 24 + seconds // 3600
142 |     return hours_age
143 | 
144 | def get_last_update_age_hours(resource):
145 |     hours_age = None
146 |     if resource.get('lastUpdatedDateTime'):
147 |         diff = datetime.datetime.now(datetime.timezone.utc) - resource['lastUpdatedDateTime']
148 |         days, seconds = diff.days, diff.seconds
149 | 
150 |         hours_age = days * 24 + seconds // 3600
151 |     return hours_age
152 | 
153 | def is_resource_updatable(resource):
154 |     status = resource['status']
155 |     updatable = status == 'ACTIVE' or status == 'CREATE FAILED'
156 | 
157 |     if updatable:
158 |         if resource.get('latestCampaignUpdate'):
159 |             status = resource['latestCampaignUpdate']['status']
160 |             updatable = status == 'ACTIVE' or status == 'CREATE FAILED'
161 |         elif resource.get('latestRecommenderUpdate'):
162 |             status = resource['latestRecommenderUpdate']['status']
163 |             updatable = status == 'ACTIVE' or status == 'CREATE FAILED'
164 | 
165 |     return updatable
166 | 
167 | def put_metrics(client, metric_datas):
168 |     metric = {
169 |         'Namespace': PROJECT_NAME,
170 |         'MetricData': metric_datas
171 |     }
172 | 
173 |     client.put_metric_data(**metric)
174 |     logger.debug('Put data for %d metrics', len(metric_datas))
175 | 
176 | def append_metric(metric_datas_by_region, region, metric):
177 |     metric_datas = metric_datas_by_region.get(region)
178 | 
179 |     if not metric_datas:
180 |         metric_datas = []
181 |         metric_datas_by_region[region] = metric_datas
182 | 
183 |     metric_datas.append(metric)
184 | 
185 | def notifications_rule_exists(events_client) -> bool:
186 |     try:
187 |         events_client.describe_rule(Name = NOTIFICATIONS_RULE)
188 |         return True
189 |     except events_client.exceptions.ResourceNotFoundException:
190 |         return False
191 | 
192 | def get_notification_subscription(sns_client, topic_arn, endpoint: str) -> Dict:
193 |     subs_paginator = sns_client.get_paginator('list_subscriptions_by_topic')
194 |     for subs_page in subs_paginator.paginate(TopicArn = topic_arn):
195 |         if subs_page.get('Subscriptions'):
196 |             for sub in subs_page['Subscriptions']:
197 |                 if endpoint == sub.get('Endpoint'):
198 |                     return sns_client.get_subscription_attributes(SubscriptionArn=sub['SubscriptionArn'])['Attributes']
199 |     return None
200 | 
201 | def get_topic_arn(resource_region: str) -> str:
202 |     # If the ARN has already been created/fetched, return it from cache.
203 |     if resource_region in _topic_arn_by_region:
204 |         logger.debug('Returning cached SNS topic ARN for region %s', resource_region)
205 |         return _topic_arn_by_region[resource_region]
206 | 
207 |     sns = get_client(service_name = 'sns', region_name = resource_region)
208 | 
209 |     logger.info('Creating/fetching SNS topic ARN for topic %s in region %s', SNS_TOPIC_NAME, resource_region)
210 |     response = sns.create_topic(Name = SNS_TOPIC_NAME)
211 |     topic_arn = response['TopicArn']
212 | 
213 |     logger.info('Setting topic policy for SNS topic %s', topic_arn)
214 |     sns.set_topic_attributes(
215 |         TopicArn = topic_arn,
216 |         AttributeName = 'Policy',
217 |         AttributeValue = '''{
218 |             "Version": "2008-10-17",
219 |             "Id": "PublishPolicy",
220 |             "Statement": [{
221 |                 "Effect": "Allow",
222 |                 "Principal": {
223 |                 "Service": [
224 |                     "cloudwatch.amazonaws.com",
225 |                     "events.amazonaws.com"
226 |                 ]
227 |                 },
228 |                 "Action": [ "sns:Publish" ],
229 |                 "Resource": "%s"
230 |             }]
231 |         }''' % (topic_arn)
232 |     )
233 | 
234 |     # Cache it so we avoid repeat calls while function is resident.
235 |     _topic_arn_by_region[resource_region] = topic_arn
236 | 
237 |     events = get_client(service_name = 'events', region_name = resource_region)
238 | 
239 |     if not notifications_rule_exists(events):
240 |         logger.info('EventBridge notifications rule %s does not exist; creating', NOTIFICATIONS_RULE)
241 | 
242 |         response = events.put_rule(
243 |             Name = NOTIFICATIONS_RULE,
244 |             EventPattern = '''{
245 |                 "detail-type": ["PersonalizeCampaignMinProvisionedTPSUpdated", "PersonalizeCampaignDeleted", "PersonalizeRecommenderMinRecommendationRPSUpdated", "PersonalizeRecommenderStopped"],
246 |                 "source": ["personalize.monitor"]
247 |             }''',
248 |             State = 'ENABLED',
249 |             Description = 'Routes Personalize Monitor notifications to notification SNS topic'
250 |         )
251 | 
252 |         logger.info('Setting target on notification rule')
253 |         events.put_targets(
254 |             Rule = NOTIFICATIONS_RULE,
255 |             Targets = [{
256 |                 'Id': NOTIFICATIONS_RULE_TARGET_ID,
257 |                 'Arn': topic_arn
258 |             }]
259 |         )
260 |     else:
261 |         logger.info('EventBridge notification rule %s already exists', NOTIFICATIONS_RULE)
262 | 
263 |     notification_endpoint = os.environ.get('NotificationEndpoint')
264 | 
265 |     if notification_endpoint:
266 |         logger.info('Verifying SNS topic subscription for %s', notification_endpoint)
267 |         subscription = get_notification_subscription(sns, topic_arn, notification_endpoint)
268 |         if subscription is None:
269 |             logger.info('Subscribing endpoint %s to SNS topic %s', notification_endpoint, topic_arn)
270 |             sns.subscribe(
271 |                 TopicArn = topic_arn,
272 |                 Protocol = 'email',
273 |                 Endpoint = notification_endpoint
274 |             )
275 |         elif subscription['PendingConfirmation'] == 'true':
276 |             logger.warning('SNS topic subscription is still pending confirmation')
277 |         else:
278 |             logger.info('Endpoint is subscribed and confirmed for SNS topic')
279 |     else:
280 |         logger.warning('No notification endpoint specified at deployment so not adding subscriber')
281 | 
282 |     return topic_arn
283 | 
284 | def create_utilization_alarm(resource_region, resource, utilization_threshold_lower_bound):
285 |     cw = get_client(service_name = 'cloudwatch', region_name = resource_region)
286 | 
287 |     if 'campaignArn' in resource:
288 |         metric_name = 'campaignUtilization'
289 |         arn_key = 'campaignArn'
290 |         dim_name = 'CampaignArn'
291 |         alarm_prefix = ALARM_NAME_PREFIX_LOW_CAMPAIGN_UTILIZATION
292 |         # Only enable alarm actions when minTPS > 1 since we can't really do
293 |         # anything to impact utilization by dropping minTPS. Let the idle
294 |         # alarm handle abandoned campaigns/recommenders.
295 |         enable_actions = resource['minProvisionedTPS'] > 1
296 |     else:
297 |         metric_name = 'recommenderUtilization'
298 |         arn_key = 'recommenderArn'
299 |         dim_name = 'RecommenderArn'
300 |         alarm_prefix = ALARM_NAME_PREFIX_LOW_RECOMMENDER_UTILIZATION
301 |         # Only enable alarm actions when minRPS > 1 since we can't really do
302 |         # anything to impact utilization by dropping minRPS. Let the idle
303 |         # alarm handle abandoned campaigns/recommenders.
304 |         enable_actions = resource['recommenderConfig']['minRecommendationRequestsPerSecond'] > 1
305 | 
306 |     response = cw.describe_alarms_for_metric(
307 |         MetricName = metric_name,
308 |         Namespace = PROJECT_NAME,
309 |         Dimensions=[
310 |             {
311 |                 'Name': dim_name,
312 |                 'Value': resource[arn_key]
313 |             },
314 |         ]
315 |     )
316 | 
317 |     alarm_name = alarm_prefix + resource['name']
318 | 
319 |     low_utilization_alarm_exists = False
320 |     actions_currently_enabled = False
321 | 
322 |     for alarm in response['MetricAlarms']:
323 |         if (alarm['AlarmName'].startswith(alarm_prefix) and
324 |                 alarm['ComparisonOperator'] in [ 'LessThanThreshold', 'LessThanOrEqualToThreshold' ]):
325 |             alarm_name = alarm['AlarmName']
326 |             low_utilization_alarm_exists = True
327 |             actions_currently_enabled = alarm['ActionsEnabled']
328 |             break
329 | 
330 |     alarm_created = False
331 | 
332 |     if not low_utilization_alarm_exists:
333 |         logger.info('Creating lower bound utilization alarm for %s', resource[arn_key])
334 | 
335 |         topic_arn = get_topic_arn(resource_region)
336 | 
337 |         cw.put_metric_alarm(
338 |             AlarmName = alarm_name,
339 |             AlarmDescription = 'Alarms when utilization falls below threshold indicating possible over provisioning condition',
340 |             ActionsEnabled = enable_actions,
341 |             OKActions = [ topic_arn ],
342 |             AlarmActions = [ topic_arn ],
343 |             MetricName = metric_name,
344 |             Namespace = PROJECT_NAME,
345 |             Statistic = 'Average',
346 |             Dimensions = [
347 |                 {
348 |                     'Name': dim_name,
349 |                     'Value': resource[arn_key]
350 |                 }
351 |             ],
352 |             Period = ALARM_PERIOD_SECONDS,
353 |             EvaluationPeriods = 12, # last 60 minutes
354 |             DatapointsToAlarm = 9,  # alarm state for 45 of last 60 minutes
355 |             Threshold = utilization_threshold_lower_bound,
356 |             ComparisonOperator = 'LessThanThreshold',
357 |             TreatMissingData = 'missing',
358 |             Tags=[
359 |                 {
360 |                     'Key': 'CreatedBy',
361 |                     'Value': PROJECT_NAME
362 |                 }
363 |             ]
364 |         )
365 | 
366 |         alarm_created = True
367 |     elif enable_actions != actions_currently_enabled:
368 |         # Toggle enable/disable actions for existing alarm.
369 |         if enable_actions:
370 |             cw.enable_alarm_actions(AlarmNames = [ alarm_name ])
371 |         else:
372 |             cw.disable_alarm_actions(AlarmNames = [ alarm_name ])
373 | 
374 |     return alarm_created
375 | 
376 | def create_idle_resource_alarm(resource_region, resource, idle_threshold_hours):
377 |     cw = get_client(service_name = 'cloudwatch', region_name = resource_region)
378 |     topic_arn = get_topic_arn(resource_region)
379 | 
380 |     metric_name = get_inference_metric_name(resource)
381 | 
382 |     if 'campaignArn' in resource:
383 |         arn_key = 'campaignArn'
384 |         dim_name = 'CampaignArn'
385 |         alarm_prefix = ALARM_NAME_PREFIX_IDLE_CAMPAIGN
386 |     else:
387 |         arn_key = 'recommenderArn'
388 |         dim_name = 'RecommenderArn'
389 |         alarm_prefix = ALARM_NAME_PREFIX_IDLE_RECOMMENDER
390 | 
391 |     response = cw.describe_alarms_for_metric(
392 |         MetricName = metric_name,
393 |         Namespace = 'AWS/Personalize',
394 |         Dimensions=[
395 |             {
396 |                 'Name': dim_name,
397 |                 'Value': resource[arn_key]
398 |             },
399 |         ]
400 |     )
401 | 
402 |     alarm_name = alarm_prefix + resource['name']
403 | 
404 |     idle_alarm_exists = False
405 |     # Only enable actions when the campaign/recommender has existed at least as
406 |     # long as the idle threshold. This is necessary since the alarm treats missing
407 |     # data as breaching.
408 |     enable_actions = get_age_hours(resource) >= idle_threshold_hours
409 |     actions_currently_enabled = False
410 | 
411 |     for alarm in response['MetricAlarms']:
412 |         if (alarm['AlarmName'].startswith(alarm_prefix) and
413 |                 alarm['ComparisonOperator'] == 'LessThanOrEqualToThreshold' and
414 |                 int(alarm['Threshold']) == 0):
415 |             alarm_name = alarm['AlarmName']
416 |             idle_alarm_exists = True
417 |             actions_currently_enabled = alarm['ActionsEnabled']
418 |             break
419 | 
420 |     alarm_created = False
421 | 
422 |     if not idle_alarm_exists:
423 |         logger.info('Creating idle utilization alarm for %s', resource[arn_key])
424 | 
425 |         cw.put_metric_alarm(
426 |             AlarmName = alarm_name,
427 |             AlarmDescription = 'Alarms when utilization is idle for contiguous length of time indicating potential abandoned campaign/recommender',
428 |             ActionsEnabled = enable_actions,
429 |             OKActions = [ topic_arn ],
430 |             AlarmActions = [ topic_arn ],
431 |             MetricName = metric_name,
432 |             Namespace = 'AWS/Personalize',
433 |             Statistic = 'Sum',
434 |             Dimensions = [
435 |                 {
436 |                     'Name': dim_name,
437 |                     'Value': resource[arn_key]
438 |                 }
439 |             ],
440 |             Period = ALARM_PERIOD_SECONDS,
441 |             EvaluationPeriods = int(((60 * 60) / ALARM_PERIOD_SECONDS) * idle_threshold_hours),
442 |             Threshold = 0,
443 |             ComparisonOperator = 'LessThanOrEqualToThreshold',
444 |             TreatMissingData = 'breaching', # Won't get metric data for idle campaigns
445 |             Tags=[
446 |                 {
447 |                     'Key': 'CreatedBy',
448 |                     'Value': PROJECT_NAME
449 |                 }
450 |             ]
451 |         )
452 | 
453 |         alarm_created = True
454 |     elif enable_actions != actions_currently_enabled:
455 |         # Toggle enable/disable actions for existing alarm.
456 |         if enable_actions:
457 |             cw.enable_alarm_actions(AlarmNames = [ alarm_name ])
458 |         else:
459 |             cw.disable_alarm_actions(AlarmNames = [ alarm_name ])
460 | 
461 |     return alarm_created
462 | 
463 | def divide_chunks(l, n):
464 |     for i in range(0, len(l), n):
465 |         yield l[i:i + n]
466 | 
467 | def perform_hourly_checks(resource_arn):
468 |     ''' Hashes resource_arn across 10 minute intervals of the current hour so we spread out hourly checks '''
469 |     num_slots = 6  # 60 mins / 10
470 |     slot = sum(bytearray(resource_arn.encode('utf-8'))) % num_slots
471 |     # Allow for match on first two minutes of 10 minute slot to account for CW event lag (assumes current schedule of every 5 mins).
472 |     return datetime.datetime.now().minute in range(slot * 10, slot * 10 + 2)
473 | 
474 | @logger.inject_lambda_context(log_event=True)
475 | def lambda_handler(event, _):
476 |     auto_create_utilization_alarms = event.get('AutoCreateUtilizationAlarms')
477 |     if not auto_create_utilization_alarms:
478 |         auto_create_utilization_alarms = os.environ.get('AutoCreateUtilizationAlarms', 'yes').lower() in [ 'true', 'yes', '1' ]
479 | 
480 |     utilization_threshold_lower_bound = event.get('UtilizationThresholdAlarmLowerBound')
481 |     if not utilization_threshold_lower_bound:
482 |         utilization_threshold_lower_bound = float(os.environ.get('UtilizationThresholdAlarmLowerBound', '100.0'))
483 | 
484 |     auto_create_idle_alarms = event.get('AutoCreateIdleAlarms')
485 |     if not auto_create_idle_alarms:
486 |         auto_create_idle_alarms = os.environ.get('AutoCreateIdleAlarms', 'yes').lower() in [ 'true', 'yes', '1' ]
487 | 
488 |     auto_delete_idle_resources = event.get('AutoDeleteOrStopIdleResources')
489 |     if not auto_delete_idle_resources:
490 |         auto_delete_idle_resources = os.environ.get('AutoDeleteOrStopIdleResources', 'false').lower() in [ 'true', 'yes', '1' ]
491 | 
492 |     idle_resource_threshold_hours = event.get('IdleThresholdHours')
493 |     if not idle_resource_threshold_hours:
494 |         idle_resource_threshold_hours = int(os.environ.get('IdleThresholdHours', '24'))
495 | 
496 |     if idle_resource_threshold_hours < MIN_IDLE_THRESHOLD_HOURS:
497 |         raise ValueError(f'"IdleThresholdHours" must be >= {MIN_IDLE_THRESHOLD_HOURS} hours')
498 | 
499 |     auto_adjust_min_tps = event.get('AutoAdjustMinTPS')
500 |     if not auto_adjust_min_tps:
501 |         auto_adjust_min_tps = os.environ.get('AutoAdjustMinTPS', 'yes').lower() in [ 'true', 'yes', '1' ]
502 | 
503 |     campaigns = get_configured_active_campaigns(event)
504 |     recommenders = get_configured_active_recommenders(event)
505 | 
506 |     current_region = os.environ['AWS_REGION']
507 | 
508 |     metric_datas_by_region = {}
509 | 
510 |     append_metric(metric_datas_by_region, current_region, {
511 |         'MetricName': 'monitoredResourceCount',
512 |         'Value': len(campaigns) + len(recommenders),
513 |         'Unit': 'Count'
514 |     })
515 | 
516 |     resource_metrics_written = 0
517 |     all_metrics_written = 0
518 |     alarms_created = 0
519 | 
520 |     # Define our 5 minute window, ensuring it's on prior 5 minute boundary.
521 |     end_time = datetime.datetime.now(datetime.timezone.utc)
522 |     end_time = end_time.replace(microsecond=0,second=0, minute=end_time.minute - end_time.minute % 5)
523 |     start_time = end_time - datetime.timedelta(minutes=5)
524 | 
525 |     logger.info('Retrieving minProvisionedTPS for %d active campaigns', len(campaigns))
526 |     logger.info('Retrieving minRecommendationRequestsPerSecond for %d active recommenders', len(recommenders))
527 | 
528 |     for resource in campaigns + recommenders:
529 |         if logger.isEnabledFor(logging.DEBUG):
530 |             logger.debug('Resource: %s', json.dumps(resource, indent = 2, default = str))
531 | 
532 |         is_campaign = 'campaignArn' in resource
533 | 
534 |         resource_arn = resource['campaignArn'] if is_campaign else resource['recommenderArn']
535 |         resource_region = extract_region(resource_arn)
536 | 
537 |         min_tps = resource['minProvisionedTPS'] if is_campaign else resource['recommenderConfig']['minRecommendationRequestsPerSecond']
538 | 
539 |         append_metric(metric_datas_by_region, resource_region, {
540 |             'MetricName': 'minProvisionedTPS' if is_campaign else 'minRecommendationRequestsPerSecond',
541 |             'Dimensions': [
542 |                 {
543 |                     'Name': 'CampaignArn' if is_campaign else 'RecommenderArn',
544 |                     'Value': resource_arn
545 |                 }
546 |             ],
547 |             'Value': min_tps,
548 |             'Unit': 'Count/Second'
549 |         })
550 | 
551 |         tps = get_average_tps(resource, start_time, end_time)
552 |         utilization = 0
553 | 
554 |         if tps:
555 |             append_metric(metric_datas_by_region, resource_region, {
556 |                 'MetricName': 'averageTPS' if is_campaign else 'averageRPS',
557 |                 'Dimensions': [
558 |                     {
559 |                         'Name': 'CampaignArn' if is_campaign else 'RecommenderArn',
560 |                         'Value': resource_arn
561 |                     }
562 |                 ],
563 |                 'Value': tps,
564 |                 'Unit': 'Count/Second'
565 |             })
566 | 
567 |             utilization = tps / min_tps * 100
568 | 
569 |         append_metric(metric_datas_by_region, resource_region, {
570 |             'MetricName': 'campaignUtilization' if is_campaign else 'recommenderUtilization',
571 |             'Dimensions': [
572 |                 {
573 |                     'Name': 'CampaignArn' if is_campaign else 'RecommenderArn',
574 |                     'Value': resource_arn
575 |                 }
576 |             ],
577 |             'Value': utilization,
578 |             'Unit': 'Percent'
579 |         })
580 | 
581 |         logger.debug(
582 |             'Resource %s has current minTPS of %d and actual TPS of %s yielding %.2f%% utilization',
583 |             resource_arn, min_tps, tps, utilization
584 |         )
585 |         resource_metrics_written += 1
586 | 
587 |         # Only do idle resource and minTPS adjustment checks once per hour for each campaign/recommender.
588 |         perform_hourly_checks_this_run = perform_hourly_checks(resource_arn)
589 | 
590 |         # Determine how old the resource is and time since last update.
591 |         resource_age_hours = get_age_hours(resource)
592 |         resource_update_age_hours = get_last_update_age_hours(resource)
593 | 
594 |         resource_delete_stop_event_fired = False
595 | 
596 |         if utilization == 0 and perform_hourly_checks_this_run and auto_delete_idle_resources:
597 |             # Resource is currently idle. Let's see if it's old enough and not being updated recently.
598 |             logger.info(
599 |                 'Performing idle stop/delete check for %s; resource is %d hours old; last updated %s hours ago',
600 |                 resource_arn, resource_age_hours, resource_update_age_hours
601 |             )
602 | 
603 |             if (resource_age_hours >= idle_resource_threshold_hours):
604 | 
605 |                 # Resource has been around long enough. Let's see how long it's been idle.
606 |                 end_time_idle_check = datetime.datetime.now(datetime.timezone.utc)
607 |                 start_time_idle_check = end_time_idle_check - datetime.timedelta(hours = idle_resource_threshold_hours)
608 |                 period_idle_check = idle_resource_threshold_hours * 60 * 60
609 | 
610 |                 total_requests = get_total_requests(resource, start_time_idle_check, end_time_idle_check, period_idle_check)
611 | 
612 |                 if total_requests == 0:
613 |                     if is_resource_updatable(resource):
614 |                         if is_campaign:
615 |                             detail_type = 'DeletePersonalizeCampaign'
616 |                             reason = f'Campaign {resource_arn} has been idle for at least {idle_resource_threshold_hours} hours so initiating delete according to configuration.'
617 |                         else:
618 |                             detail_type = 'StopPersonalizeRecommender'
619 |                             reason = f'Recommender {resource_arn} has been idle for at least {idle_resource_threshold_hours} hours so initiating stop according to configuration.'
620 | 
621 |                         logger.info(reason)
622 | 
623 |                         put_event(
624 |                             detail_type = detail_type,
625 |                             detail = json.dumps({
626 |                                 'ARN': resource_arn,
627 |                                 'Utilization': utilization,
628 |                                 'AgeHours': resource_age_hours,
629 |                                 'IdleThresholdHours': idle_resource_threshold_hours,
630 |                                 'TotalRequestsDuringIdleThresholdHours': total_requests,
631 |                                 'Reason': reason
632 |                             }),
633 |                             resources = [ resource_arn ]
634 |                         )
635 | 
636 |                         resource_delete_stop_event_fired = True
637 |                     else:
638 |                         logger.warning(
639 |                             'Resource %s has been idle for at least %d hours but its status will not allow it to be deleted/stopped on this run',
640 |                             resource_arn, idle_resource_threshold_hours
641 |                         )
642 |                 else:
643 |                     logger.warning(
644 |                         'Resource %s is currently idle but has had %d requests within the last %d hours so does not meet idle criteria for auto-deletion/auto-stop',
645 |                         resource_arn, total_requests, idle_resource_threshold_hours
646 |                     )
647 |             else:
648 |                 logger.info(
649 |                     'Resource %s is only %d hours old and last update %s hours old; too new to consider for auto-deletion/auto-stop',
650 |                     resource_arn, resource_age_hours, resource_update_age_hours
651 |                 )
652 | 
653 |         if (not resource_delete_stop_event_fired and
654 |                 perform_hourly_checks_this_run and
655 |                 auto_adjust_min_tps and
656 |                 min_tps > 1):
657 | 
658 |             days_back = 14
659 |             end_time_tps_check = datetime.datetime.now(datetime.timezone.utc).replace(minute=0, second=0, microsecond=0)
660 |             start_time_tps_check = end_time_tps_check - datetime.timedelta(days = days_back)
661 | 
662 |             datapoints = get_sum_requests_by_hour(resource, start_time_tps_check, end_time_tps_check)
663 |             min_reqs = sys.maxsize
664 |             max_reqs = total_reqs = total_avg_tps = min_avg_tps = max_avg_tps = 0
665 | 
666 |             for datapoint in datapoints:
667 |                 total_reqs += datapoint['Value']
668 |                 min_reqs = min(min_reqs, datapoint['Value'])
669 |                 max_reqs = max(max_reqs, datapoint['Value'])
670 | 
671 |             if len(datapoints) > 0:
672 |                 total_avg_tps = int(total_reqs / (len(datapoints) * 3600))
673 |                 min_avg_tps = int(min_reqs / 3600)
674 |                 max_avg_tps = int(max_reqs / 3600)
675 | 
676 |             logger.info(
677 |                 'Performing minTPS/minRPS adjustment check for %s; min/max/avg hourly TPS over last %d days for %d datapoints: %d/%d/%.2f',
678 |                 resource_arn, days_back, len(datapoints), min_avg_tps, max_avg_tps, total_avg_tps
679 |             )
680 | 
681 |             min_age_to_update_hours = 24
682 | 
683 |             age_eligible = True
684 | 
685 |             if resource_age_hours < min_age_to_update_hours:
686 |                 logger.info(
687 |                     'Resource %s is less than %d hours old so not eligible for minTPS/minRPS adjustment yet',
688 |                     resource_arn, min_age_to_update_hours
689 |                 )
690 |                 age_eligible = False
691 | 
692 |             if age_eligible and min_avg_tps < min_tps:
693 |                 # Incrementally drop minTPS/minRPS.
694 |                 new_min_tps = max(1, int(math.floor(min_tps * .75)))
695 | 
696 |                 if is_resource_updatable(resource):
697 |                     reason = f'Step down adjustment of minTPS/minRPS for {resource_arn} down from {min_tps} to {new_min_tps} based on average hourly TPS low watermark of {min_avg_tps} over last {days_back} days'
698 |                     logger.info(reason)
699 | 
700 |                     put_event(
701 |                         detail_type = 'UpdatePersonalizeCampaignMinProvisionedTPS' if is_campaign else 'UpdatePersonalizeRecommenderMinRecommendationRPS',
702 |                         detail = json.dumps({
703 |                             'ARN': resource_arn,
704 |                             'Utilization': utilization,
705 |                             'AgeHours': resource_age_hours,
706 |                             'CurrentMinTPS': min_tps,
707 |                             'NewMinTPS': new_min_tps,
708 |                             'MinAverageTPS': min_avg_tps,
709 |                             'MaxAverageTPS': max_avg_tps,
710 |                             'Datapoints': datapoints,
711 |                             'Reason': reason
712 |                         }, default = str),
713 |                         resources = [ resource_arn ]
714 |                     )
715 |                 else:
716 |                     logger.warning(
717 |                         'Resource %s could have its minTPS/minRPS adjusted down from %d to %d based on average hourly TPS low watermark over last %d days but its status will not allow it to be updated on this run',
718 |                         resource_arn, min_tps, new_min_tps, days_back
719 |                     )
720 | 
721 |         if not resource_delete_stop_event_fired:
722 |             if auto_create_utilization_alarms:
723 |                 if create_utilization_alarm(resource_region, resource, utilization_threshold_lower_bound):
724 |                     alarms_created += 1
725 | 
726 |             if auto_create_idle_alarms:
727 |                 if create_idle_resource_alarm(resource_region, resource, idle_resource_threshold_hours):
728 |                     alarms_created += 1
729 | 
730 |     for region, metric_datas in metric_datas_by_region.items():
731 |         cw = get_client(service_name = 'cloudwatch', region_name = region)
732 | 
733 |         metric_datas_chunks = divide_chunks(metric_datas, MAX_METRICS_PER_CALL)
734 | 
735 |         for metrics_datas_chunk in metric_datas_chunks:
736 |             put_metrics(cw, metrics_datas_chunk)
737 |             all_metrics_written += len(metrics_datas_chunk)
738 | 
739 |     outcome = f'Logged {all_metrics_written} TPS utilization metrics for {resource_metrics_written} active campaigns and recommenders; {alarms_created} alarms created'
740 |     logger.info(outcome)
741 | 
742 |     if alarms_created > 0:
743 |         # At least one new alarm was created so that likely means new campaigns were created too. Let's trigger the dashboard to be rebuilt.
744 |         logger.info('Triggering rebuild of the CloudWatch dashboard since %d new alarm(s) were created', alarms_created)
745 |         put_event(
746 |             detail_type = 'BuildPersonalizeMonitorDashboard',
747 |             detail = json.dumps({
748 |                 'Reason': f'Triggered rebuild due to {alarms_created} new alarm(s) being created'
749 |             })
750 |         )
751 | 
752 |     return outcome
753 | 


--------------------------------------------------------------------------------
/src/personalize_monitor_function/requirements.txt:
--------------------------------------------------------------------------------
1 | # Note: AWS Lambda Power Tools dependency is satisfied by Lambda layer at runtime (part of deployment).
2 | 


--------------------------------------------------------------------------------
/src/personalize_stop_recommender_function/README.md:
--------------------------------------------------------------------------------
 1 | # Amazon Personalize Monitor - Stop Recommender Function
 2 | 
 3 | This Lambda function stops a Personalize recommender. It is called as the target of an EventBridge rule that matches events with the `StopPersonalizeRecommender` detail-type. The [personalize-monitor](../personalize_monitor_function/) function publishes this event when the `AutoDeleteOrStopIdleResources` deployment parameter is `Yes` AND a monitored recommender has been idle for more than `IdleThresholdHours` hours. In other words, an idle recommender is one that has not had any `GetRecommendations` calls in the last `IdleThresholdHours` hours.
 4 | 
 5 | This function will also delete any CloudWatch alarms that were dynamically created by this application for the stopped recommender. Alarms can be created for idle recommenders and low utilization recommenders via the `AutoCreateIdleAlarms` and `AutoCreateUtilizationAlarms` deployment parameters.
 6 | 
 7 | > Note that Personalize campaigns are deleted and not stopped by this application. Since model artifacts are associated with a solution version, deleting a campaign does **not** delete the actual model artifacts. See the [personalize_delete_campaign](../personalize_delete_campaign_function/) function for details.
 8 | 
 9 | ## How it works
10 | 
11 | The EventBridge event structure that triggers this function looks something like this:
12 | 
13 | ```javascript
14 | {
15 |     "source": "personalize.monitor",
16 |     "detail-type": "StopPersonalizeRecommender",
17 |     "resources": [ RECOMMENDER_ARN_TO_STOP ],
18 |     "detail": {
19 |         "ARN": RECOMMENDER_ARN_TO_STOP,
20 |         "Utilization": CURRENT_UTILIZATION,
21 |         "AgeHours": RECOMMENDER_AGE_IN_HOURS,
22 |         "IdleThresholdHours": RECOMMENDER_IDLE_HOURS,
23 |         "TotalRequestsDuringIdleThresholdHours": 0,
24 |         "Reason": DESCRIPTIVE_REASON_FOR_STOP
25 |     }
26 | }
27 | ```
28 | 
29 | This function can also be invoked directly as part of your own operational process. The event you pass to the function only requires the recommender ARN as follows.
30 | 
31 | ```javascript
32 | {
33 |     "ARN": RECOMMENDER_ARN_TO_STOP,
34 |     "Reason": OPTIONAL_DESCRIPTIVE_REASON_FOR_STOP
35 | }
36 | ```
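
As a rough illustration of direct invocation, the sketch below builds that minimal event and sends it through the Lambda `Invoke` API. The helper names and the `function_name` argument are hypothetical; the deployed function's name depends on your stack, and the client is assumed to be a boto3 Lambda client (for example `boto3.client("lambda")`).

```python
import json


def build_stop_event(recommender_arn, reason=None):
    """Build the minimal event for direct invocation (hypothetical helper)."""
    event = {"ARN": recommender_arn}
    if reason:
        # "Reason" is optional, so only include it when one is supplied.
        event["Reason"] = reason
    return event


def invoke_stop_recommender(lambda_client, function_name, recommender_arn, reason=None):
    """Invoke the stop function asynchronously via the Lambda Invoke API.

    `function_name` is an assumption: use the name or ARN of the function
    as deployed in your account.
    """
    payload = json.dumps(build_stop_event(recommender_arn, reason)).encode("utf-8")
    return lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous invocation
        Payload=payload,
    )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real account.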
37 | 
38 | The Personalize [StopRecommender](https://docs.aws.amazon.com/personalize/latest/dg/API_StopRecommender.html) API is used to stop the recommender.
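
A minimal sketch of that API call with boto3 follows. It is not this function's actual code: `region_from_arn` and `stop_recommender` are hypothetical helpers, and the client is assumed to be a boto3 Personalize client created in the recommender's region.

```python
def region_from_arn(arn):
    """Return the region field of an ARN (arn:partition:service:region:account:resource)."""
    return arn.split(":")[3]


def stop_recommender(personalize_client, recommender_arn):
    """Request a stop for the recommender.

    Returns True when the request was accepted and False when the
    recommender no longer exists. Other errors are left to propagate.
    """
    try:
        personalize_client.stop_recommender(recommenderArn=recommender_arn)
        return True
    except personalize_client.exceptions.ResourceNotFoundException:
        return False
```

With a real client this could be called as `stop_recommender(boto3.client("personalize", region_name=region_from_arn(arn)), arn)`.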
39 | 
40 | ## Published events
41 | 
42 | When this function has successfully initiated the recommender stop request and the deletion of any dynamically created CloudWatch alarms for the recommender, it publishes two events to EventBridge. One event triggers a notification to the SNS topic for this application and the other triggers a rebuild of the CloudWatch dashboard.
43 | 
44 | ### Stop notification
45 | 
46 | The following event is published to EventBridge to signal that a recommender has been stopped.
47 | 
48 | ```javascript
49 | {
50 |     "source": "personalize.monitor",
51 |     "detail-type": "PersonalizeRecommenderStopped",
52 |     "resources": [ RECOMMENDER_ARN_STOPPED ],
53 |     "detail": {
54 |         "ARN": RECOMMENDER_ARN_STOPPED,
55 |         "Reason": DESCRIPTIVE_REASON_FOR_STOP
56 |     }
57 | }
58 | ```
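
For illustration, an event of this shape could be published with the EventBridge `PutEvents` API roughly as follows. This is a sketch rather than the project's `put_event` helper: the function names are hypothetical and the client is assumed to be a boto3 EventBridge client (`boto3.client("events")`). Note that the API request field is `DetailType`, which EventBridge delivers to rules and targets as `detail-type`.

```python
import json


def build_stopped_entry(recommender_arn, reason):
    """Build a PutEvents entry for the stop notification (hypothetical helper)."""
    return {
        "Source": "personalize.monitor",
        "DetailType": "PersonalizeRecommenderStopped",
        "Resources": [recommender_arn],
        # Detail must be passed as a JSON string, not a dict.
        "Detail": json.dumps({"ARN": recommender_arn, "Reason": reason}),
    }


def publish_stopped(events_client, recommender_arn, reason):
    """Publish the notification event to the default event bus."""
    return events_client.put_events(
        Entries=[build_stopped_entry(recommender_arn, reason)]
    )
```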
59 | 
60 | An EventBridge rule is set up that targets an SNS topic with `NotificationEndpoint` as the subscriber. This is the email address you provided at deployment time. If you'd like, you can customize how these notification events are handled in the EventBridge and SNS consoles.
61 | 
62 | ### Rebuild CloudWatch dashboard
63 | 
64 | Since a monitored recommender has been stopped, the CloudWatch dashboard needs to be rebuilt so that the recommender is removed from the widgets. This is accomplished by publishing a `BuildPersonalizeMonitorDashboard` event that is processed by the [dashboard_mgmt](../dashboard_mgmt_function/) function.
65 | 
66 | ```javascript
67 | {
68 |     "source": "personalize.monitor",
69 |     "detail-type": "BuildPersonalizeMonitorDashboard",
70 |     "resources": [ RECOMMENDER_ARN_STOPPED ],
71 |     "detail": {
72 |         "ARN": RECOMMENDER_ARN_STOPPED,
73 |         "Reason": DESCRIPTIVE_REASON_FOR_REBUILD
74 |     }
75 | }
76 | ```
77 | 


--------------------------------------------------------------------------------
/src/personalize_stop_recommender_function/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/src/personalize_stop_recommender_function/__init__.py


--------------------------------------------------------------------------------
/src/personalize_stop_recommender_function/personalize_stop_recommender.py:
--------------------------------------------------------------------------------
 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
 2 | # SPDX-License-Identifier: MIT-0
 3 | 
 4 | """
 5 | Lambda function that is used to stop a Personalize recommender based on prolonged idle time
 6 | and according to configuration to automatically stop recommenders under these conditions.
 7 | Note that this function just stops the recommender; it does NOT delete the recommender. The
 8 | idea is to stop ongoing charges for an idle recommender.
 9 | """
10 | 
11 | import json
12 | import logging
13 | 
14 | from aws_lambda_powertools import Logger
15 | 
16 | from common import (
17 |     PROJECT_NAME,
18 |     ALARM_NAME_PREFIX,
19 |     extract_region,
20 |     get_client,
21 |     put_event
22 | )
23 | 
24 | logger = Logger()
25 | 
26 | def delete_alarms_for_recommender(recommender_arn):
27 |     cw = get_client(service_name = 'cloudwatch', region_name = extract_region(recommender_arn))
28 | 
29 |     alarm_names_to_delete = set()
30 | 
31 |     alarms_paginator = cw.get_paginator('describe_alarms')
32 |     for alarms_page in alarms_paginator.paginate(AlarmNamePrefix = ALARM_NAME_PREFIX, AlarmTypes=['MetricAlarm']):
33 |         for alarm in alarms_page['MetricAlarms']:
34 |             for dim in alarm['Dimensions']:
35 |                 if dim['Name'] == 'RecommenderArn' and dim['Value'] == recommender_arn:
36 |                     tags_response = cw.list_tags_for_resource(ResourceARN = alarm['AlarmArn'])
37 | 
38 |                     for tag in tags_response['Tags']:
39 |                         if tag['Key'] == 'CreatedBy' and tag['Value'] == PROJECT_NAME:
40 |                             alarm_names_to_delete.add(alarm['AlarmName'])
41 |                             break
42 | 
43 |     if alarm_names_to_delete:
44 |         # FUTURE: max check of 100
45 |         logger.info('Deleting CloudWatch alarms for recommender %s: %s', recommender_arn, alarm_names_to_delete)
46 |         cw.delete_alarms(AlarmNames=list(alarm_names_to_delete))
48 |     else:
49 |         logger.info('No CloudWatch alarms to delete for recommender %s', recommender_arn)
50 | 
51 | @logger.inject_lambda_context(log_event=True)
52 | def lambda_handler(event, _):
53 |     ''' Initiates stopping a Personalize recommender '''
54 |     if event.get('detail'):
55 |         recommender_arn = event['detail']['ARN']
56 |         reason = event['detail'].get('Reason')
57 |     else:
58 |         recommender_arn = event['ARN']
59 |         reason = event.get('Reason')
60 | 
61 |     region = extract_region(recommender_arn)
62 |     if not region:
63 |         raise Exception('Region could not be extracted from ARN')
64 | 
65 |     personalize = get_client(service_name = 'personalize', region_name = region)
66 | 
67 |     response = personalize.stop_recommender(recommenderArn = recommender_arn)
68 | 
69 |     if logger.isEnabledFor(logging.DEBUG):
70 |         logger.debug(json.dumps(response, indent = 2, default = str))
71 | 
72 |     if not reason:
73 |         reason = f'Amazon Personalize recommender {recommender_arn} stop initiated (reason unspecified)'
74 | 
75 |     put_event(
76 |         detail_type = 'PersonalizeRecommenderStopped',
77 |         detail = json.dumps({
78 |             'ARN': recommender_arn,
79 |             'Reason': reason
80 |         }),
81 |         resources = [ recommender_arn ]
82 |     )
83 | 
84 |     put_event(
85 |         detail_type = 'BuildPersonalizeMonitorDashboard',
86 |         detail = json.dumps({
87 |             'ARN': recommender_arn,
88 |             'Reason': reason
89 |         }),
90 |         resources = [ recommender_arn ]
91 |     )
92 | 
93 |     logger.info({
94 |         'recommenderArn': recommender_arn
95 |     })
96 | 
97 |     delete_alarms_for_recommender(recommender_arn)
98 | 
99 |     return f'Successfully initiated stop of recommender {recommender_arn}'


--------------------------------------------------------------------------------
/src/personalize_stop_recommender_function/requirements.txt:
--------------------------------------------------------------------------------
1 | # Note: AWS Lambda Power Tools dependency is satisfied by Lambda layer at runtime (part of deployment).


--------------------------------------------------------------------------------
/src/personalize_update_tps_function/README.md:
--------------------------------------------------------------------------------
  1 | # Amazon Personalize Monitor - Campaign Provisioned TPS Update Function
  2 | 
  3 | This Lambda function adjusts the `minProvisionedTPS` value for a Personalize campaign or the `minRecommendationRequestsPerSecond` value for a Personalize recommender. It is called as the target of EventBridge rules for events emitted by the [personalize_monitor](../personalize_monitor_function/) function when configured to update campaigns and recommenders based on actual TPS activity. You can also incorporate this function into your own operations to scale campaigns and recommenders up and down. For example, if you know your campaign or recommender will experience a massive spike in requests at a certain time (e.g., a flash sale) and you want to pre-warm its capacity, you can create a [CloudWatch event](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html) to call this function 30 minutes before the expected spike to increase endpoint capacity, and then again after the traffic event to lower the capacity. Alternatively, if certain events in your application will generate a predictably higher or lower volume of requests than the current `minProvisionedTPS`/`minRecommendationRequestsPerSecond` **AND** Personalize's auto-scaling will not suffice, you can use this function as a trigger to adjust `minProvisionedTPS`/`minRecommendationRequestsPerSecond` accordingly.
  4 | 
  5 | ## How it works
  6 | 
  7 | The EventBridge event structure that triggers this function for a campaign looks something like this:
  8 | 
  9 | ```javascript
 10 | {
 11 |     "source": "personalize.monitor",
 12 |     "detail-type": "UpdatePersonalizeCampaignMinProvisionedTPS",
 13 |     "resources": [ CAMPAIGN_ARN_TO_UPDATE ],
 14 |     "detail": {
 15 |         "ARN": CAMPAIGN_ARN_TO_UPDATE,
 16 |         "Utilization": CURRENT_UTILIZATION,
 17 |         "AgeHours": CAMPAIGN_AGE_IN_HOURS,
 18 |         "CurrentMinTPS": CURRENT_MIN_PROVISIONED_TPS,
 19 |         "NewMinTPS": NEW_MIN_PROVISIONED_TPS,
 20 |         "MinAverageTPS": MIN_AVERAGE_TPS_LAST_24_HOURS,
 21 |         "MaxAverageTPS": MAX_AVERAGE_TPS_LAST_24_HOURS,
 22 |         "Datapoints": [ CW_METRIC_DATAPOINTS_LAST_24_HOURS ],
 23 |         "Reason": DESCRIPTIVE_REASON_FOR_UPDATE
 24 |     }
 25 | }
 26 | ```
 27 | 
 28 | Similarly, the EventBridge event structure that triggers this function for a recommender looks something like this:
 29 | 
 30 | ```javascript
 31 | {
 32 |     "source": "personalize.monitor",
 33 |     "detail-type": "UpdatePersonalizeRecommenderMinRecommendationRPS",
 34 |     "resources": [ RECOMMENDER_ARN_TO_UPDATE ],
 35 |     "detail": {
 36 |         "ARN": RECOMMENDER_ARN_TO_UPDATE,
 37 |         "Utilization": CURRENT_UTILIZATION,
 38 |         "AgeHours": RECOMMENDER_AGE_IN_HOURS,
 39 |         "CurrentMinTPS": CURRENT_MIN_RECOMMENDATION_RPS,
 40 |         "NewMinTPS": NEW_MIN_RECOMMENDATION_RPS,
 41 |         "MinAverageTPS": MIN_AVERAGE_TPS_LAST_24_HOURS,
 42 |         "MaxAverageTPS": MAX_AVERAGE_TPS_LAST_24_HOURS,
 43 |         "Datapoints": [ CW_METRIC_DATAPOINTS_LAST_24_HOURS ],
 44 |         "Reason": DESCRIPTIVE_REASON_FOR_UPDATE
 45 |     }
 46 | }
 47 | ```
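
To trigger this function from your own code through EventBridge rather than waiting for the monitor, you could publish an event with the same `source` and `detail-type`. The sketch below only sends the `detail` fields that the handler reads (`ARN`, `NewMinTPS`, `Reason`). `build_update_entry` and `publish_update` are hypothetical helper names, and publishing to the default event bus is an assumption.

```python
import json


def build_update_entry(arn, new_min_tps, reason):
    """Build a PutEvents entry that matches this function's EventBridge rule."""
    if new_min_tps < 1:
        raise ValueError('new_min_tps must be >= 1')

    # ARN layout: arn:partition:personalize:region:account:<type>/<name>
    parts = arn.split(':', 5)
    if len(parts) != 6:
        raise ValueError('Malformed ARN')
    resource_type = parts[5].split('/')[0]

    detail_types = {
        'campaign': 'UpdatePersonalizeCampaignMinProvisionedTPS',
        'recommender': 'UpdatePersonalizeRecommenderMinRecommendationRPS',
    }
    if resource_type not in detail_types:
        raise ValueError('ARN must identify a campaign or recommender')

    return {
        'Source': 'personalize.monitor',
        'DetailType': detail_types[resource_type],
        'Resources': [arn],
        # EventBridge expects Detail as a JSON string.
        'Detail': json.dumps({'ARN': arn, 'NewMinTPS': new_min_tps, 'Reason': reason}),
    }


def publish_update(entry, events_client=None):
    """Publish the entry; assumes the default event bus."""
    if events_client is None:
        import boto3  # imported lazily so the builder can be used without boto3
        events_client = boto3.client('events')
    return events_client.put_events(Entries=[entry])
```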
 48 | 
 49 | This function can also be invoked directly as part of your own operational process. The event you pass to the function only requires the campaign or recommender ARN and the new `minProvisionedTPS`/`minRecommendationRequestsPerSecond` value (`NewMinTPS`), as follows. `Reason` is optional.
 50 | 
 51 | ```javascript
 52 | {
 53 |     "ARN": "CAMPAIGN_OR_RECOMMENDER_ARN_HERE",
 54 |     "NewMinTPS": NEW_MIN_TPS_HERE,
 55 |     "Reason": DESCRIPTIVE_REASON_FOR_UPDATE
 56 | }
 57 | ```
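
A direct invocation from Python might look like the sketch below. The deployed function name depends on your stack, so `function_name` is a placeholder you must supply; `build_direct_event` and `invoke_update_tps` are hypothetical helper names.

```python
import json


def build_direct_event(arn, new_min_tps, reason=None):
    """Build the minimal payload for a direct invocation."""
    if new_min_tps < 1:
        raise ValueError('new_min_tps must be >= 1')
    event = {'ARN': arn, 'NewMinTPS': new_min_tps}
    if reason:
        event['Reason'] = reason
    return event


def invoke_update_tps(function_name, event, lambda_client=None):
    """Synchronously invoke the function and return its decoded result."""
    if lambda_client is None:
        import boto3  # imported lazily so the builder can be used without boto3
        lambda_client = boto3.client('lambda')
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType='RequestResponse',
        Payload=json.dumps(event).encode('utf-8'),
    )
    # The handler returns a string, which Lambda serializes as a JSON string.
    return json.loads(response['Payload'].read())
```

Validating `NewMinTPS` on the client side mirrors the handler's own `>= 1` check and avoids a round trip for an obviously invalid value.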
 58 | 
 59 | For Personalize campaigns, the [UpdateCampaign](https://docs.aws.amazon.com/personalize/latest/dg/API_UpdateCampaign.html) API is used to update the `minProvisionedTPS` value. For Personalize recommenders, the [UpdateRecommender](https://docs.aws.amazon.com/personalize/latest/dg/API_UpdateRecommender.html) API is used to update the `minRecommendationRequestsPerSecond` value.
 60 | 
 61 | ## Published events
 62 | 
 63 | When an update of a campaign's `minProvisionedTPS` or recommender's `minRecommendationRequestsPerSecond` has been successfully initiated by this function, an event is published to EventBridge to trigger a notification.
 64 | 
 65 | > Since it can take several minutes for a campaign or recommender to redeploy after updating its `minProvisionedTPS` or `minRecommendationRequestsPerSecond`, you will receive the notification when the redeploy starts. The campaign/recommender will continue to respond to `GetRecommendations`/`GetPersonalizedRanking` API requests while it is redeploying. **Therefore, there will be no interruption of service while it's redeploying.**
 66 | 
 67 | ### Update minProvisionedTPS notification
 68 | 
 69 | The following event is published to EventBridge to signal that an update to a campaign has been initiated.
 70 | 
 71 | ```javascript
 72 | {
 73 |     "source": "personalize.monitor",
 74 |     "detail-type": "PersonalizeCampaignMinProvisionedTPSUpdated",
 75 |     "resources": [ CAMPAIGN_ARN_UPDATED ],
 76 |     "detail": {
 77 |         "ARN": CAMPAIGN_ARN_UPDATED,
 78 |         "NewMinTPS": NEW_TPS,
 79 |         "Reason": DESCRIPTIVE_REASON_FOR_UPDATE
 80 |     }
 81 | }
 82 | ```
 83 | 
 84 | ### Update minRecommendationRequestsPerSecond notification
 85 | 
 86 | The following event is published to EventBridge to signal that an update to a recommender has been initiated.
 87 | 
 88 | ```javascript
 89 | {
 90 |     "source": "personalize.monitor",
 91 |     "detail-type": "PersonalizeRecommenderMinRecommendationRPSUpdated",
 92 |     "resources": [ RECOMMENDER_ARN_UPDATED ],
 93 |     "detail": {
 94 |         "ARN": RECOMMENDER_ARN_UPDATED,
 95 |         "NewMinTPS": NEW_TPS,
 96 |         "Reason": DESCRIPTIVE_REASON_FOR_UPDATE
 97 |     }
 98 | }
 99 | ```
100 | 
101 | An EventBridge rule is set up that targets an SNS topic with `NotificationEndpoint` as the subscriber. This is the email address you provided at deployment time. If you'd like, you can customize how these notification events are handled or add your own targets in the EventBridge and SNS consoles.
102 | 


--------------------------------------------------------------------------------
/src/personalize_update_tps_function/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-personalize-monitor/24b85287123ac09369a22d61cfca7d179a789f33/src/personalize_update_tps_function/__init__.py


--------------------------------------------------------------------------------
/src/personalize_update_tps_function/personalize_update_tps.py:
--------------------------------------------------------------------------------
 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
 2 | # SPDX-License-Identifier: MIT-0
 3 | 
 4 | """
 5 | Utility Lambda function that can be used to update a Personalize campaign's minProvisionedTPS or a recommender's
 6 | minRecommendationRequestsPerSecond based on triggers such as CloudWatch event rules (e.g. cron) or application events.
 7 | """
 8 | 
 9 | import json
11 | import logging
12 | 
13 | from aws_lambda_powertools import Logger
14 | 
15 | from common import (
16 |     extract_region,
17 |     extract_resource_type,
18 |     get_client,
19 |     put_event
20 | )
21 | 
22 | logger = Logger()
23 | 
24 | @logger.inject_lambda_context(log_event=True)
25 | def lambda_handler(event, _):
26 |     ''' Updates minProvisionedTPS for a Personalize campaign or minRecommendationRequestsPerSecond for a recommender '''
27 | 
28 |     if event.get('detail'):
29 |         arn = event['detail']['ARN']
30 |         min_tps = event['detail']['NewMinTPS']
31 |         reason = event['detail'].get('Reason')
32 |     else:
33 |         arn = event['ARN']
34 |         min_tps = event['NewMinTPS']
35 |         reason = event.get('Reason')
36 | 
37 |     region = extract_region(arn)
38 |     if not region:
39 |         raise Exception('Region could not be extracted from ARN in event')
40 | 
41 |     resource_type = extract_resource_type(arn)
42 |     if not resource_type:
43 |         raise Exception('Resource type could not be extracted from ARN in event')
44 | 
45 |     if resource_type not in ['campaign', 'recommender']:
46 |         raise Exception('Resource type represented by ARN in event is not "campaign" or "recommender"')
47 | 
48 |     if min_tps < 1:
49 |         raise ValueError('"NewMinTPS" must be >= 1')
50 | 
51 |     personalize = get_client(service_name = 'personalize', region_name = region)
52 | 
53 |     if resource_type == 'campaign':
54 |         response = personalize.update_campaign(campaignArn = arn, minProvisionedTPS = min_tps)
55 |         notification_detail_type = 'PersonalizeCampaignMinProvisionedTPSUpdated'
56 |     else:
57 |         response = personalize.describe_recommender(recommenderArn = arn)
58 | 
59 |         config = response['recommender']['recommenderConfig']
60 |         config['minRecommendationRequestsPerSecond'] = min_tps
61 | 
62 |         response = personalize.update_recommender(recommenderArn = arn, recommenderConfig = config)
63 |         notification_detail_type = 'PersonalizeRecommenderMinRecommendationRPSUpdated'
64 | 
65 |     if logger.isEnabledFor(logging.DEBUG):
66 |         logger.debug(json.dumps(response, indent = 2, default = str))
67 | 
68 |     if not reason:
69 |         reason = f'Amazon Personalize {resource_type} {arn} min TPS update initiated (reason unspecified)'
70 | 
71 |     put_event(
72 |         detail_type = notification_detail_type,
73 |         detail = json.dumps({
74 |             'ARN': arn,
75 |             'NewMinTPS': min_tps,
76 |             'Reason': reason
77 |         }),
78 |         resources = [ arn ]
79 |     )
80 | 
81 |     logger.info({
82 |         'arn': arn,
83 |         'newMinTPS': min_tps
84 |     })
85 | 
86 |     return f'Successfully initiated update of min TPS to {min_tps} for {resource_type} {arn}'


--------------------------------------------------------------------------------
/src/personalize_update_tps_function/requirements.txt:
--------------------------------------------------------------------------------
1 | # Note: AWS Lambda Power Tools dependency is satisfied by Lambda layer at runtime (part of deployment).


--------------------------------------------------------------------------------
/template.yaml:
--------------------------------------------------------------------------------
  1 | AWSTemplateFormatVersion: '2010-09-09'
  2 | Transform: AWS::Serverless-2016-10-31
  3 | Description: >
  4 |   (P9E-MONITOR) - Personalize monitoring tools including CloudWatch metrics, alarms, and dashboard; optional automated cost optimization
  5 | 
  6 | Metadata:
  7 |   AWS::ServerlessRepo::Application:
  8 |     Name: Amazon-Personalize-Monitor
  9 |     Description: >
 10 |       Creates a CloudWatch dashboard for monitoring the utilization of Amazon Personalize
 11 |       campaigns and recommenders; creates CloudWatch alarms based on a user-defined threshold; and
 12 |       includes automated cost optimization actions.
 13 |     Author: AWS Applied AI - Personalize
 14 |     SpdxLicenseId: MIT-0
 15 |     LicenseUrl: LICENSE
 16 |     ReadmeUrl: README-SAR.md
 17 |     Labels: ['Personalize', 'CloudWatch', 'Monitoring']
 18 |     HomePageUrl: https://github.com/aws-samples/amazon-personalize-monitor
 19 |     SemanticVersion: 1.2.1
 20 |     SourceCodeUrl: https://github.com/aws-samples/amazon-personalize-monitor
 21 | 
 22 |   AWS::CloudFormation::Interface:
 23 |     ParameterGroups:
 24 |       - Label:
 25 |           default: "Amazon Personalize inference resources to monitor"
 26 |         Parameters:
 27 |           - CampaignARNs
 28 |           - RecommenderARNs
 29 |           - Regions
 30 |       - Label:
 31 |           default: "CloudWatch alarm configuration"
 32 |         Parameters:
 33 |           - AutoCreateUtilizationAlarms
 34 |           - UtilizationThresholdAlarmLowerBound
 35 |           - AutoCreateIdleAlarms
 36 |           - IdleThresholdHours
 37 |       - Label:
 38 |           default: "Cost optimization actions"
 39 |         Parameters:
 40 |           - AutoAdjustMinTPS
 41 |           - AutoDeleteOrStopIdleResources
 42 |       - Label:
 43 |           default: "Notifications"
 44 |         Parameters:
 45 |           - NotificationEndpoint
 46 |     ParameterLabels:
 47 |       CampaignARNs:
 48 |         default: "Personalize campaign ARNs to monitor"
 49 |       RecommenderARNs:
 50 |         default: "Personalize recommender ARNs to monitor"
 51 |       Regions:
 52 |         default: "AWS regions to monitor"
 53 |       AutoCreateUtilizationAlarms:
 54 |         default: "Automatically create campaign/recommender utilization CloudWatch alarms?"
 55 |       UtilizationThresholdAlarmLowerBound:
 56 |         default: "Campaign/recommender utilization alarm lower bound threshold"
 57 |       AutoCreateIdleAlarms:
 58 |         default: "Automatically create idle campaign/recommender CloudWatch alarms?"
 59 |       IdleThresholdHours:
 60 |         default: "Number of hours without requests to be considered idle"
 61 |       AutoDeleteOrStopIdleResources:
 62 |         default: "Automatically delete idle campaigns and stop idle recommenders in idle alarm state?"
 63 |       AutoAdjustMinTPS:
 64 |         default: "Automatically adjust/lower minProvisionedTPS/minRecommendationRequestsPerSecond for campaigns/recommenders in utilization alarm state?"
 65 |       NotificationEndpoint:
 66 |         default: "Email address to receive notifications"
 67 | 
 68 | Parameters:
 69 |   CampaignARNs:
 70 |     Type: String
 71 |     Description: >
 72 |       Comma separated list of Amazon Personalize campaign ARNs to monitor or 'all' to dynamically monitor all active campaigns.
 73 |     Default: 'all'
 74 | 
 75 |   RecommenderARNs:
 76 |     Type: String
 77 |     Description: >
 78 |       Comma separated list of Amazon Personalize recommender ARNs to monitor or 'all' to dynamically monitor all active recommenders.
 79 |     Default: 'all'
 80 | 
 81 |   Regions:
 82 |     Type: String
 83 |     Description: >
 84 |       Comma separated list of AWS region names. When using 'all' for CampaignARNs or RecommenderARNs, this parameter can be used
 85 |       to control the region(s) where the Personalize Monitor looks for active Personalize campaigns and recommenders. When not specified,
 86 |       the region where you deploy this application will be used.
 87 | 
 88 |   AutoCreateUtilizationAlarms:
 89 |     Type: String
 90 |     Description: >
 91 |       Whether to automatically create CloudWatch alarms for campaign/recommender utilization for monitored campaigns/recommenders. Valid values: Yes/No.
 92 |     AllowedValues:
 93 |       - 'Yes'
 94 |       - 'No'
 95 |     Default: 'Yes'
 96 | 
 97 |   UtilizationThresholdAlarmLowerBound:
 98 |     Type: Number
 99 |     Description: >
100 |       Utilization alarm threshold value (in percent). When a monitored campaign's or recommender's utilization falls below this value,
101 |       the alarm state will be set to ALARM. Valid values: 0-1000 (integer).
102 |     MinValue: 0
103 |     MaxValue: 1000
104 |     Default: 100
105 | 
106 |   AutoAdjustMinTPS:
107 |     Type: String
108 |     Description: >
109 |       Whether to automatically adjust minProvisionedTPS (campaigns) or minRecommendationRequestsPerSecond (recommenders) down to lowest average TPS over
110 |       rolling 24 hour window. The minProvisionedTPS/minRecommendationRequestsPerSecond will never be increased. Valid values: Yes/No.
111 |     AllowedValues:
112 |       - 'Yes'
113 |       - 'No'
114 |     Default: 'Yes'
115 | 
116 |   AutoCreateIdleAlarms:
117 |     Type: String
118 |     Description: >
119 |       Whether to automatically create CloudWatch alarms for detecting idle campaigns and recommenders. Valid values: Yes/No.
120 |     AllowedValues:
121 |       - 'Yes'
122 |       - 'No'
123 |     Default: 'Yes'
124 | 
125 |   IdleThresholdHours:
126 |     Type: Number
127 |     Description: >
128 |       Number of consecutive hours without requests before a campaign or recommender is considered idle. Idle campaigns are automatically deleted and
129 |       idle recommenders are automatically stopped only if AutoDeleteOrStopIdleResources is Yes. Valid values: 2-48 (integer).
130 |     MinValue: 2
131 |     MaxValue: 48
132 |     Default: 24
133 | 
134 |   AutoDeleteOrStopIdleResources:
135 |     Type: String
136 |     Description: >
137 |       Whether to automatically delete campaigns and stop recommenders that have been idle for IdleThresholdHours consecutive hours. Valid values: Yes/No.
138 |     AllowedValues:
139 |       - 'Yes'
140 |       - 'No'
141 |     Default: 'No'
142 | 
143 |   NotificationEndpoint:
144 |     Type: String
145 |     Description: >
146 |       Email address to receive CloudWatch alarm and other monitoring notifications.
147 | 
148 | Globals:
149 |   Function:
150 |     Timeout: 5
151 |     Runtime: python3.9
152 | 
153 | Resources:
154 |   CommonLayer:
155 |     Type: AWS::Serverless::LayerVersion
156 |     Properties:
157 |       ContentUri: src/layer
158 |       CompatibleRuntimes:
159 |         - python3.9
160 |     Metadata:
161 |       BuildMethod: python3.9
162 | 
163 |   MonitorFunction:
164 |     Type: AWS::Serverless::Function
165 |     Properties:
166 |       Description: Amazon Personalize monitor function that updates custom CloudWatch metrics and monitors campaign utilization every 5 minutes
167 |       Timeout: 30
168 |       CodeUri: src/personalize_monitor_function
169 |       Handler: personalize_monitor.lambda_handler
170 |       Layers:
171 |         - !Sub 'arn:${AWS::Partition}:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV2:24'
172 |         - !Ref CommonLayer
173 |       Policies:
174 |         - Statement:
175 |           - Sid: PersonalizePolicy
176 |             Effect: Allow
177 |             Action:
178 |               - personalize:DescribeCampaign
179 |               - personalize:DescribeRecommender
180 |               - personalize:DescribeSolutionVersion
181 |               - personalize:ListCampaigns
182 |               - personalize:ListRecommenders
183 |             Resource: '*'
184 |           - Sid: CloudWatchPolicy
185 |             Effect: Allow
186 |             Action:
187 |               - cloudwatch:DescribeAlarmsForMetric
188 |               - cloudwatch:DisableAlarmActions
189 |               - cloudwatch:EnableAlarmActions
190 |               - cloudwatch:GetMetricData
191 |               - cloudwatch:PutMetricAlarm
192 |               - cloudwatch:PutMetricData
193 |             Resource: '*'
194 |           - Sid: EventBridgePolicy
195 |             Effect: Allow
196 |             Action:
197 |               - events:DescribeRule
198 |               - events:PutEvents
199 |               - events:PutRule
200 |               - events:PutTargets
201 |             Resource: '*'
202 |           - Sid: SnsPolicy
203 |             Effect: Allow
204 |             Action:
205 |               - sns:CreateTopic
206 |               - sns:ListSubscriptionsByTopic
207 |               - sns:SetTopicAttributes
208 |               - sns:Subscribe
209 |             Resource: !Sub 'arn:${AWS::Partition}:sns:*:${AWS::AccountId}:PersonalizeMonitorNotifications'
210 |           - Sid: SnsSubPolicy
211 |             Effect: Allow
212 |             Action:
213 |               - sns:GetSubscriptionAttributes
214 |             Resource: '*'
215 |       Events:
216 |         ScheduledEvent:
217 |           Type: Schedule
218 |           Properties:
219 |             Description: Triggers primary Personalize Monitor monitoring logic
220 |             Schedule: cron(0/5 * * * ? *)
221 |             Enabled: True
222 |       Environment:
223 |         Variables:
224 |           CampaignARNs: !Ref CampaignARNs
225 |           RecommenderARNs: !Ref RecommenderARNs
226 |           Regions: !Ref Regions
227 |           NotificationEndpoint: !Ref NotificationEndpoint
228 |           AutoCreateUtilizationAlarms: !Ref AutoCreateUtilizationAlarms
229 |           UtilizationThresholdAlarmLowerBound: !Ref UtilizationThresholdAlarmLowerBound
230 |           AutoCreateIdleAlarms: !Ref AutoCreateIdleAlarms
231 |           IdleThresholdHours: !Ref IdleThresholdHours
232 |           AutoDeleteOrStopIdleResources: !Ref AutoDeleteOrStopIdleResources
233 |           AutoAdjustMinTPS: !Ref AutoAdjustMinTPS
234 | 
235 |   DashboardManagementFunction:
236 |     Type: AWS::Serverless::Function
237 |     Properties:
238 |       Description: Amazon Personalize monitor function that updates the CloudWatch dashboard hourly and when campaigns are added/deleted
239 |       Timeout: 15
240 |       CodeUri: src/dashboard_mgmt_function
241 |       Handler: dashboard_mgmt.lambda_handler
242 |       AutoPublishAlias: live
243 |       Layers:
244 |         - !Sub 'arn:${AWS::Partition}:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV2:24'
245 |         - !Ref CommonLayer
246 |       Policies:
247 |         - Statement:
248 |           - Sid: PersonalizePolicy
249 |             Effect: Allow
250 |             Action:
251 |               - personalize:DescribeCampaign
252 |               - personalize:DescribeDatasetGroup
253 |               - personalize:DescribeRecommender
254 |               - personalize:DescribeSolutionVersion
255 |               - personalize:ListCampaigns
256 |               - personalize:ListRecommenders
257 |             Resource: '*'
258 |           - Sid: DashboardPolicy
259 |             Effect: Allow
260 |             Action:
261 |               - cloudwatch:DeleteDashboards
262 |               - cloudwatch:PutDashboard
263 |             Resource: '*'
264 |       Environment:
265 |         Variables:
266 |           CampaignARNs: !Ref CampaignARNs
267 |           RecommenderARNs: !Ref RecommenderARNs
268 |           Regions: !Ref Regions
269 |       Events:
270 |         EBRule:
271 |           Type: EventBridgeRule
272 |           Properties:
273 |             Pattern:
274 |               source:
275 |                 - personalize.monitor
276 |               detail-type:
277 |                 - BuildPersonalizeMonitorDashboard
278 |         ScheduledEvent:
279 |           Type: Schedule
280 |           Properties:
281 |             Description: Hourly rebuild of Personalize Monitor CloudWatch dashboard
282 |             Schedule: cron(3 * * * ? *)
283 |             Enabled: True
284 | 
285 |   DeployDashboardCustomResource:
286 |     Type: Custom::DashboardCreate
287 |     Properties:
288 |       ServiceToken: !GetAtt DashboardManagementFunction.Arn
289 |       CampaignARNs: !Ref CampaignARNs
290 |       RecommenderARNs: !Ref RecommenderARNs
291 |       Regions: !Ref Regions
292 |       AutoCreateUtilizationAlarms: !Ref AutoCreateUtilizationAlarms
293 |       UtilizationThresholdAlarmLowerBound: !Ref UtilizationThresholdAlarmLowerBound
294 |       AutoCreateIdleAlarms: !Ref AutoCreateIdleAlarms
295 |       IdleThresholdHours: !Ref IdleThresholdHours
296 |       AutoDeleteOrStopIdleResources: !Ref AutoDeleteOrStopIdleResources
297 |       AutoAdjustMinTPS: !Ref AutoAdjustMinTPS
298 | 
299 |   UpdateTPSFunction:
300 |     Type: AWS::Serverless::Function
301 |     Properties:
302 |       Description: Amazon Personalize monitor function that updates the minProvisionedTPS for a campaign or the minRecommendationRequestsPerSecond for a recommender in response to an event
303 |       CodeUri: src/personalize_update_tps_function
304 |       Handler: personalize_update_tps.lambda_handler
305 |       Layers:
306 |         - !Sub 'arn:${AWS::Partition}:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV2:24'
307 |         - !Ref CommonLayer
308 |       Policies:
309 |         - Statement:
310 |           - Sid: PersonalizePolicy
311 |             Effect: Allow
312 |             Action:
313 |               - personalize:DescribeRecommender
314 |               - personalize:UpdateCampaign
315 |               - personalize:UpdateRecommender
316 |             Resource: '*'
317 |           - Sid: EventBridgePolicy
318 |             Effect: Allow
319 |             Action:
320 |               - events:PutEvents
321 |             Resource: '*'
322 |       Events:
323 |         EBRule:
324 |           Type: EventBridgeRule
325 |           Properties:
326 |             Pattern:
327 |               source:
328 |                 - personalize.monitor
329 |               detail-type:
330 |                 - UpdatePersonalizeCampaignMinProvisionedTPS
331 |                 - UpdatePersonalizeRecommenderMinRecommendationRPS
332 | 
333 |   DeleteCampaignFunction:
334 |     Type: AWS::Serverless::Function
335 |     Properties:
336 |       Description: Amazon Personalize monitor function that deletes a campaign in response to an event
337 |       CodeUri: src/personalize_delete_campaign_function
338 |       Handler: personalize_delete_campaign.lambda_handler
339 |       Layers:
340 |         - !Sub 'arn:${AWS::Partition}:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV2:24'
341 |         - !Ref CommonLayer
342 |       Policies:
343 |         - Statement:
344 |           - Sid: PersonalizePolicy
345 |             Effect: Allow
346 |             Action:
347 |               - personalize:DeleteCampaign
348 |             Resource: '*'
349 |           - Sid: EventBridgePolicy
350 |             Effect: Allow
351 |             Action:
352 |               - events:PutEvents
353 |             Resource: '*'
354 |           - Sid: CloudWatchFindAlarmsPolicy
355 |             Effect: Allow
356 |             Action:
357 |               - cloudwatch:DescribeAlarms
358 |               - cloudwatch:ListTagsForResource
359 |             Resource: '*'
360 |           - Sid: CloudWatchDeletePolicy
361 |             Effect: Allow
362 |             Action:
363 |               - cloudwatch:DeleteAlarms
364 |             Resource: !Sub 'arn:${AWS::Partition}:cloudwatch:*:${AWS::AccountId}:alarm:PersonalizeMonitor-*'
365 |       Events:
366 |         EBCustomRule:
367 |           Type: EventBridgeRule
368 |           Properties:
369 |             Pattern:
370 |               source:
371 |                 - personalize.monitor
372 |               detail-type:
373 |                 - DeletePersonalizeCampaign
374 | 
375 |   StopRecommenderFunction:
376 |     Type: AWS::Serverless::Function
377 |     Properties:
378 |       Description: Amazon Personalize monitor function that stops a recommender in response to an event
379 |       CodeUri: src/personalize_stop_recommender_function
380 |       Handler: personalize_stop_recommender.lambda_handler
381 |       Layers:
382 |         - !Sub 'arn:${AWS::Partition}:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV2:24'
383 |         - !Ref CommonLayer
384 |       Policies:
385 |         - Statement:
386 |           - Sid: PersonalizePolicy
387 |             Effect: Allow
388 |             Action:
389 |               - personalize:StopRecommender
390 |             Resource: '*'
391 |           - Sid: EventBridgePolicy
392 |             Effect: Allow
393 |             Action:
394 |               - events:PutEvents
395 |             Resource: '*'
396 |           - Sid: CloudWatchFindAlarmsPolicy
397 |             Effect: Allow
398 |             Action:
399 |               - cloudwatch:DescribeAlarms
400 |               - cloudwatch:ListTagsForResource
401 |             Resource: '*'
402 |           - Sid: CloudWatchDeletePolicy
403 |             Effect: Allow
404 |             Action:
405 |               - cloudwatch:DeleteAlarms
406 |             Resource: !Sub 'arn:${AWS::Partition}:cloudwatch:*:${AWS::AccountId}:alarm:PersonalizeMonitor-*'
407 |       Events:
408 |         EBCustomRule:
409 |           Type: EventBridgeRule
410 |           Properties:
411 |             Pattern:
412 |               source:
413 |                 - personalize.monitor
414 |               detail-type:
415 |                 - StopPersonalizeRecommender
416 | 
417 |   CleanupFunction:
418 |     Type: AWS::Serverless::Function
419 |     Properties:
420 |       Description: Amazon Personalize monitor custom resource function that cleans up resources created directly by the application when the application is deleted
421 |       Timeout: 15  # seconds
422 |       CodeUri: src/cleanup_resources_function
423 |       Handler: cleanup_resources.lambda_handler
424 |       AutoPublishAlias: live
425 |       Layers:
426 |         - !Sub 'arn:${AWS::Partition}:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV2:24'
427 |         - !Ref CommonLayer
428 |       Policies:
429 |         - Statement:
430 |           - Sid: PersonalizePolicy
431 |             Effect: Allow
432 |             Action:
433 |               - personalize:ListCampaigns
434 |               - personalize:ListRecommenders
435 |             Resource: '*'
436 |           - Sid: CloudWatchFindAlarmsPolicy
437 |             Effect: Allow
438 |             Action:
439 |               - cloudwatch:DescribeAlarms
440 |               - cloudwatch:ListTagsForResource
441 |             Resource: '*'
442 |           - Sid: CloudWatchDeletePolicy
443 |             Effect: Allow
444 |             Action:
445 |               - cloudwatch:DeleteAlarms
446 |             Resource: !Sub 'arn:${AWS::Partition}:cloudwatch:*:${AWS::AccountId}:alarm:PersonalizeMonitor-*'
447 |           - Sid: EventBridgePolicy
448 |             Effect: Allow
449 |             Action:
450 |               - events:DeleteRule
451 |               - events:RemoveTargets
452 |             Resource: !Sub 'arn:${AWS::Partition}:events:*:${AWS::AccountId}:rule/PersonalizeMonitor-NotificationsRule'
453 |           - Sid: SnsPolicy
454 |             Effect: Allow
455 |             Action:
456 |               - sns:DeleteTopic
457 |             Resource: !Sub 'arn:${AWS::Partition}:sns:*:${AWS::AccountId}:PersonalizeMonitorNotifications'
458 |       Environment:
459 |         Variables:
460 |           CampaignARNs: !Ref CampaignARNs
461 |           RecommenderARNs: !Ref RecommenderARNs
462 |           Regions: !Ref Regions
463 | 
464 |   CleanupCustomResource:
465 |     Type: Custom::Cleanup
466 |     Properties:
467 |       ServiceToken: !GetAtt CleanupFunction.Arn
468 |       CampaignARNs: !Ref CampaignARNs
469 |       RecommenderARNs: !Ref RecommenderARNs
470 |       Regions: !Ref Regions
471 |       AutoCreateUtilizationAlarms: !Ref AutoCreateUtilizationAlarms
472 |       UtilizationThresholdAlarmLowerBound: !Ref UtilizationThresholdAlarmLowerBound
473 |       AutoCreateIdleAlarms: !Ref AutoCreateIdleAlarms
474 |       IdleThresholdHours: !Ref IdleThresholdHours
475 |       AutoDeleteOrStopIdleResources: !Ref AutoDeleteOrStopIdleResources
476 |       AutoAdjustMinTPS: !Ref AutoAdjustMinTPS
477 | 
478 | Outputs:
479 |   MonitorFunction:
480 |     Description: "Personalize Monitor Function ARN"
481 |     Value: !GetAtt MonitorFunction.Arn
482 | 
483 |   DashboardManagementFunction:
484 |     Description: "CloudWatch Dashboard Management Function ARN"
485 |     Value: !GetAtt DashboardManagementFunction.Arn
486 | 
487 |   UpdateTPSFunction:
488 |     Description: "Update Personalize Campaign/Recommender TPS Function ARN"
489 |     Value: !GetAtt UpdateTPSFunction.Arn
490 | 
491 |   DeleteCampaignFunction:
492 |     Description: "Delete Personalize Campaign Function ARN"
493 |     Value: !GetAtt DeleteCampaignFunction.Arn
494 | 
495 |   StopRecommenderFunction:
496 |     Description: "Stop Personalize Recommender Function ARN"
497 |     Value: !GetAtt StopRecommenderFunction.Arn
498 | 


--------------------------------------------------------------------------------