├── VERSION ├── libraries ├── RW │ ├── AWS │ │ ├── __init__.py │ │ ├── mixins │ │ │ └── __init__.py │ │ ├── strategies │ │ │ ├── __init__.py │ │ │ ├── GetClientStrategy.py │ │ │ ├── UserGetClientStrategy.py │ │ │ └── RoleGetClientStrategy.py │ │ └── robot_tests │ │ │ ├── test_queries.py │ │ │ └── cloudwatchlinks.robot │ ├── HashiCorp │ │ ├── __init__.py │ │ ├── robot_tests │ │ │ └── health.robot │ │ └── Vault.py │ ├── Chat │ │ ├── strategies │ │ │ ├── __init__.py │ │ │ ├── ChatProviderStrategy.py │ │ │ ├── GoogleChatProviderStrategy.py │ │ │ ├── DiscordChatProviderStrategy.py │ │ │ ├── SlackChatProviderStrategy.py │ │ │ └── RocketChatProviderStrategy.py │ │ └── robot_tests │ │ │ └── notify_multi.robot │ ├── Curl │ │ ├── __init__.py │ │ └── Curl.py │ ├── Rest │ │ └── __init__.py │ ├── Sysdig │ │ ├── __init__.py │ │ └── robot_tests │ │ │ └── get.robot │ ├── Utils │ │ ├── __init__.py │ │ └── Check.py │ ├── ArgoCD │ │ ├── __init__.py │ │ └── argocd.py │ ├── Discord │ │ ├── __init__.py │ │ ├── robot_tests │ │ │ └── send.robot │ │ └── Discord.py │ ├── Patroni │ │ └── __init__.py │ ├── Postgres │ │ └── __init__.py │ ├── RunWhen │ │ └── __init__.py │ ├── gRPC │ │ ├── __init__.py │ │ └── grpcurl.py │ ├── Datadog │ │ └── __init__.py │ ├── Artifactory │ │ ├── __init__.py │ │ ├── Artifactory.py │ │ └── robot_tests │ │ │ └── health.robot │ ├── CertManager │ │ ├── __init__.py │ │ ├── robot_tests │ │ │ └── certs.robot │ │ └── cert_manager.py │ ├── Prometheus │ │ └── __init__.py │ ├── Uptime │ │ ├── __init__.py │ │ ├── robot_tests │ │ │ └── component_status.robot │ │ └── StatusPage.py │ ├── SocialScrape │ │ ├── __init__.py │ │ └── SocialScrape.py │ ├── K8s │ │ ├── __init__.py │ │ ├── pdb_tasks_mixin.py │ │ ├── statefulset_tasks_mixin.py │ │ ├── robot_tests │ │ │ └── exec.robot │ │ ├── k8s.py │ │ ├── job_tasks_mixin.py │ │ └── daemonset_tasks_mixin.py │ ├── GCP │ │ ├── __init__.py │ │ ├── robot_tests │ │ │ ├── chat.robot │ │ │ └── servicehealth.robot │ │ ├── Chat.py │ │ └── 
GCloudCLI.py │ ├── __init__.py │ ├── MyTest.py │ ├── Slack.py │ ├── MSTeams.py │ ├── GitHub │ │ └── robot_tests │ │ │ ├── status.robot │ │ │ └── actions.robot │ ├── GitLab.py │ ├── DNS.py │ └── Pingdom.py ├── __init__.py ├── pyproject.toml └── README.md ├── .gitbook.yaml ├── MANIFEST.in ├── setup.cfg ├── .gitignore ├── docs └── GitHub_Banner.jpg ├── codebundles ├── grafana-health │ ├── README.md │ └── sli.robot ├── dns-latency │ ├── README.md │ └── sli.robot ├── opsgenie-alert │ ├── README.md │ └── runbook.robot ├── pingdom-health │ ├── README.md │ └── sli.robot ├── http-latency │ ├── README.md │ └── sli.robot ├── slo-default │ ├── README.md │ └── queries.yaml ├── elasticsearch-health │ ├── README.md │ └── sli.robot ├── datadog-system-load │ ├── README.md │ └── sli.robot ├── k8s-kubectl-top │ └── README.md ├── k8s-triage-patroni │ └── README.md ├── msteams-send-message │ ├── README.md │ └── runbook.robot ├── ping-host-availability │ ├── README.md │ └── sli.robot ├── sysdig-monitor-metric │ ├── README.md │ └── sli.robot ├── http-ok │ ├── README.md │ └── sli.robot ├── cert-manager-healthcheck │ ├── README.md │ └── sli.robot ├── artifactory-ok │ ├── README.md │ └── sli.robot ├── aws-billing-tagcosts │ └── README.md ├── aws-vm-triage │ └── README.md ├── gitlab-get-repos-latency │ ├── README.md │ └── sli.robot ├── uptimecom-component-ok │ ├── README.md │ └── sli.robot ├── aws-cloudwatch-logquery │ └── README.md ├── aws-s3-stalecheck │ ├── README.md │ └── runbook.robot ├── gcp-opssuite-logquery │ ├── README.md │ └── sli.robot ├── k8s-triage-deploymentreplicas │ └── README.md ├── aws-cloudformation-triage │ └── README.md ├── aws-cloudwatch-metricquery │ └── README.md ├── aws-ec2-securitycheck │ └── README.md ├── gcp-opssuite-logquery-dashboard │ ├── README.md │ └── runbook.robot ├── k8s-kubectl-apiserverhealth │ └── README.md ├── sysdig-monitor-promqlmetric │ └── README.md ├── github-status-maintenances │ ├── README.md │ └── sli.robot ├── aws-billing-costsacrosstags │ └── 
README.md ├── cert-manager-expirations │ ├── README.md │ └── sli.robot ├── k8s-triage-statefulset │ └── README.md ├── aws-cloudwatch-logquery-rowcount-zeroerror │ └── README.md ├── web-triage │ └── README.md ├── aws-cloudwatch-metricquery-dashboard │ ├── README.md │ └── runbook.robot ├── k8s-decommission-workloads │ └── README.md ├── k8s-patroni-healthcheck │ └── README.md ├── aws-cloudformation-stackevents-count │ └── README.md ├── gitlab-availability │ ├── README.md │ ├── sli.robot │ └── runbook.robot ├── aws-account-limit │ ├── README.md │ ├── sli.robot │ └── runbook.robot ├── github-status-incidents │ ├── README.md │ └── sli.robot ├── jira-search-issues-latency │ ├── README.md │ ├── sli.robot │ └── runbook.robot ├── prometheus-queryrange-transform │ └── README.md ├── vault-ok │ ├── README.md │ └── sli.robot ├── aws-cloudwatch-tagmetricquery │ └── README.md ├── github-get-repos-latency │ ├── README.md │ ├── sli.robot │ └── runbook.robot ├── github-actions-workflowtiming │ ├── README.md │ └── sli.robot ├── k8s-kubectl-canaryvolumemount │ ├── canary_pvc.yaml │ ├── README.md │ └── canary_job.yaml ├── k8s-troubleshoot-deployment │ └── README.md ├── rest-basicauth │ └── README.md ├── gcp-opssuite-metricquery │ ├── .runwhen │ │ ├── generation-rules │ │ │ └── gcp-quota-generation-rule.yaml │ │ └── templates │ │ │ ├── gcp-quota-slo.yaml │ │ │ ├── gcp-quota-slx.yaml │ │ │ └── gcp-quota-sli.yaml │ └── README.md ├── rest-explicitoauth2-basicauth │ └── README.md ├── rest-explicitoauth2-tokenheader │ └── README.md ├── kong-ingress-health-gcp-promql │ └── .runwhen │ │ ├── templates │ │ ├── kong-ingress-health-gcp-promql-slo.yaml │ │ ├── kong-ingress-health-gcp-promql-slx.yaml │ │ ├── kong-ingress-health-gcp-promql-taskset.yaml │ │ └── kong-ingress-health-gcp-promql-sli.yaml │ │ └── generation-rules │ │ └── kong-ingress-health-gcp-promql.yaml ├── slack-sendmessage │ └── README.md ├── discord-sendmessage │ └── README.md ├── googlechat-sendmessage │ └── README.md ├── 
rocketchat-sendmessage │ └── README.md ├── README.md ├── gcp-serviceshealth │ ├── README.md │ └── sli.robot ├── k8s-daemonset-healthcheck │ └── README.md ├── prometheus-queryinstant-transform │ └── README.md ├── k8s-kubectl-eventquery │ └── README.md ├── twitter-query-tweets │ ├── README.md │ ├── runbook.robot │ └── sli.robot ├── k8s-kubectl-run │ ├── README.md │ └── runbook.robot ├── github-status-components │ ├── sli.robot │ └── README.md ├── curl-generic │ ├── README.md │ ├── runbook.robot │ └── sli.robot ├── grpc-grpcurl-unary │ ├── README.md │ ├── runbook.robot │ └── sli.robot ├── gcp-gcloudcli-generic │ ├── README.md │ ├── runbook.robot │ └── sli.robot ├── k8s-cortexmetrics-ingestor-health │ └── README.md └── sli-alert-threshold │ └── README.md ├── .github ├── scripts │ ├── index-config.yaml │ └── index.sh ├── workflows │ ├── pypi.yaml │ ├── release.yml │ └── slack-notify-readme-updates.yml ├── release.yaml └── CODEOWNERS ├── .sourceignore ├── requirements.txt ├── CHANGELOG.md ├── setup.py ├── README.md └── README_HOWTO.md /VERSION: -------------------------------------------------------------------------------- 1 | 0.0.20 2 | -------------------------------------------------------------------------------- /libraries/RW/AWS/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /.gitbook.yaml: -------------------------------------------------------------------------------- 1 | root: ./codebundles/ -------------------------------------------------------------------------------- /libraries/RW/HashiCorp/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /libraries/RW/AWS/mixins/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | 
-------------------------------------------------------------------------------- /libraries/RW/AWS/strategies/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /libraries/RW/Chat/strategies/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include LICENSE 2 | recursive-include RW * -------------------------------------------------------------------------------- /libraries/RW/Curl/__init__.py: -------------------------------------------------------------------------------- 1 | from .Curl import Curl -------------------------------------------------------------------------------- /libraries/RW/Rest/__init__.py: -------------------------------------------------------------------------------- 1 | from .rest import Rest -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [metadata] 2 | description-file = README.md -------------------------------------------------------------------------------- /libraries/RW/Sysdig/__init__.py: -------------------------------------------------------------------------------- 1 | from .Sysdig import Sysdig -------------------------------------------------------------------------------- /libraries/RW/Utils/__init__.py: -------------------------------------------------------------------------------- 1 | from .utils import * 2 | -------------------------------------------------------------------------------- /libraries/RW/ArgoCD/__init__.py: -------------------------------------------------------------------------------- 1 | from .argocd import ArgoCD 2 | 
-------------------------------------------------------------------------------- /libraries/RW/Discord/__init__.py: -------------------------------------------------------------------------------- 1 | from .Discord import Discord -------------------------------------------------------------------------------- /libraries/RW/Patroni/__init__.py: -------------------------------------------------------------------------------- 1 | from .patroni import Patroni -------------------------------------------------------------------------------- /libraries/RW/Postgres/__init__.py: -------------------------------------------------------------------------------- 1 | from .postgres import Postgres -------------------------------------------------------------------------------- /libraries/RW/RunWhen/__init__.py: -------------------------------------------------------------------------------- 1 | from .papi import Papi 2 | -------------------------------------------------------------------------------- /libraries/RW/gRPC/__init__.py: -------------------------------------------------------------------------------- 1 | from .grpcurl import gRPCurl 2 | -------------------------------------------------------------------------------- /libraries/RW/Datadog/__init__.py: -------------------------------------------------------------------------------- 1 | from .datadog import Datadog 2 | -------------------------------------------------------------------------------- /libraries/RW/Artifactory/__init__.py: -------------------------------------------------------------------------------- 1 | from .Artifactory import Artifactory -------------------------------------------------------------------------------- /libraries/RW/CertManager/__init__.py: -------------------------------------------------------------------------------- 1 | from .cert_manager import CertManager -------------------------------------------------------------------------------- /libraries/RW/Prometheus/__init__.py: 
-------------------------------------------------------------------------------- 1 | from .Prometheus import Prometheus -------------------------------------------------------------------------------- /libraries/RW/Uptime/__init__.py: -------------------------------------------------------------------------------- 1 | from .StatusPage import StatusPage 2 | -------------------------------------------------------------------------------- /libraries/RW/SocialScrape/__init__.py: -------------------------------------------------------------------------------- 1 | from .SocialScrape import SocialScrape -------------------------------------------------------------------------------- /libraries/RW/K8s/__init__.py: -------------------------------------------------------------------------------- 1 | from .k8s import K8s 2 | from .k8sutils import K8sUtils -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *codecollection.yaml 2 | __pycache__ 3 | output.xml 4 | log.html 5 | report.html -------------------------------------------------------------------------------- /docs/GitHub_Banner.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/runwhen-contrib/rw-public-codecollection/HEAD/docs/GitHub_Banner.jpg -------------------------------------------------------------------------------- /libraries/RW/GCP/__init__.py: -------------------------------------------------------------------------------- 1 | from RW.GCP import OpsSuite 2 | from RW.GCP import Chat 3 | from RW.GCP import ServiceHealth 4 | from RW.GCP.GCloudCLI import * 5 | -------------------------------------------------------------------------------- /codebundles/grafana-health/README.md: -------------------------------------------------------------------------------- 1 | # Grafana Health 2 | 3 | ## SLI 4 | Check Grafana server 
health. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/dns-latency/README.md: -------------------------------------------------------------------------------- 1 | # DNS Latency 2 | 3 | ## SLI 4 | Check DNS latency for Google Resolver. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/opsgenie-alert/README.md: -------------------------------------------------------------------------------- 1 | # Opsgenie Alert 2 | 3 | ## TaskSet 4 | Create an alert in Opsgenie. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/pingdom-health/README.md: -------------------------------------------------------------------------------- 1 | # Pingdom Health 2 | 3 | ## SLI 4 | Check health of Pingdom platform. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/http-latency/README.md: -------------------------------------------------------------------------------- 1 | # HTTP Latency 2 | 3 | ## SLI 4 | Measure HTTP latency against a given URL. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/slo-default/README.md: -------------------------------------------------------------------------------- 1 | # SLO Default 2 | 3 | ## SLO 4 | Default SLO query used for multi-window multi-burn. 
5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/elasticsearch-health/README.md: -------------------------------------------------------------------------------- 1 | # Elasticsearch Health 2 | 3 | ## SLI 4 | Check Elasticsearch cluster health 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /libraries/RW/AWS/robot_tests/test_queries.py: -------------------------------------------------------------------------------- 1 | SAMPLE_METRIC_QUERY = { 2 | "view": "timeseries", 3 | "region": "us-west-1", 4 | "metrics": [ 5 | {"expression": "SELECT MAX(CPUUtilization) FROM \"AWS/EC2\""}, 6 | ] 7 | } -------------------------------------------------------------------------------- /codebundles/datadog-system-load/README.md: -------------------------------------------------------------------------------- 1 | # Datadog System Load 2 | 3 | ## SLI 4 | Retrieve a Datadog instance's "System Load" metric 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/k8s-kubectl-top/README.md: -------------------------------------------------------------------------------- 1 | # Kubernetes kubectl Top 2 | 3 | ## SLI 4 | Retrieve aggregate data via the kubectl top command. 
5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/slo-default/queries.yaml: -------------------------------------------------------------------------------- 1 | errorQuery: "sum_over_time((count({metric_name} unless {metric_name} {operand} {threshold}))[{window}:]) OR on() vector(0)" 2 | totalQuery: "sum_over_time((count({metric_name}))[{window}:]) OR on() vector(1)" -------------------------------------------------------------------------------- /codebundles/k8s-triage-patroni/README.md: -------------------------------------------------------------------------------- 1 | # Kubernetes Triage Patroni 2 | 3 | ## TaskSet 4 | Taskset to triage issues related to patroni. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/msteams-send-message/README.md: -------------------------------------------------------------------------------- 1 | # Microsoft Teams Send Message 2 | 3 | ## TaskSet 4 | Send a message to an MS Teams channel. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/ping-host-availability/README.md: -------------------------------------------------------------------------------- 1 | # Ping Host Availability 2 | 3 | ## SLI 4 | Ping a host and retrieve packet loss percentage. 
5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/sysdig-monitor-metric/README.md: -------------------------------------------------------------------------------- 1 | # Sysdig Monitor Metric 2 | 3 | ## SLI 4 | Queries the Sysdig data API to fetch metric data. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/http-ok/README.md: -------------------------------------------------------------------------------- 1 | # HTTP OK 2 | 3 | ## SLI 4 | Check whether an HTTP request against a URL fails or does not complete within a given latency window. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /.github/scripts/index-config.yaml: -------------------------------------------------------------------------------- 1 | # in the repos, the key specifies the name of the temp directory 2 | repos: 3 | rw-public-codecollection: https://github.com/runwhen-contrib/rw-public-codecollection.git 4 | robot_file_pattern: 5 | codebundles: .robot -------------------------------------------------------------------------------- /codebundles/cert-manager-healthcheck/README.md: -------------------------------------------------------------------------------- 1 | # Cert-Manager Health Check 2 | 3 | ## SLI 4 | Check the health of pods deployed by cert-manager. 
5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/artifactory-ok/README.md: -------------------------------------------------------------------------------- 1 | # Artifactory OK 2 | ## SLI 3 | Checks an Artifactory instance health endpoint to determine its operational status. 4 | 5 | ## Use Cases 6 | 7 | ## Requirements 8 | 9 | ## TODO 10 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/aws-billing-tagcosts/README.md: -------------------------------------------------------------------------------- 1 | # AWS Billing Tag Costs 2 | 3 | ## SLI 4 | Monitors AWS cost and usage data for the latest billing period. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/aws-vm-triage/README.md: -------------------------------------------------------------------------------- 1 | # AWS Instance Triage 2 | 3 | ## TaskSet 4 | Triage and troubleshoot performance and usage of an AWS EC2 instance 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/gitlab-get-repos-latency/README.md: -------------------------------------------------------------------------------- 1 | # GitLab Get Repository Latency 2 | 3 | ## SLI 4 | Check GitLab latency by getting a list of repo names. 
5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/uptimecom-component-ok/README.md: -------------------------------------------------------------------------------- 1 | # Uptime.com Component OK 2 | 3 | ## SLI 4 | Check the status of an Uptime.com component for a given site. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/aws-cloudwatch-logquery/README.md: -------------------------------------------------------------------------------- 1 | # AWS CloudWatch Log Query 2 | 3 | ## SLI 4 | Retrieve number of results from an AWS CloudWatch Insights query. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/aws-s3-stalecheck/README.md: -------------------------------------------------------------------------------- 1 | # AWS S3 Stale Check 2 | 3 | ## TaskSet 4 | Identify stale AWS S3 buckets, based on last modified object timestamp. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/gcp-opssuite-logquery/README.md: -------------------------------------------------------------------------------- 1 | # GCP Operations Suite Log Query 2 | 3 | ## SLI 4 | Retrieve the number of results of a GCP Log Explorer query. 
5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/k8s-triage-deploymentreplicas/README.md: -------------------------------------------------------------------------------- 1 | # Kubernetes Triage Deployments 2 | 3 | ## TaskSet 4 | Triages issues related to a deployment's replicas. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/aws-cloudformation-triage/README.md: -------------------------------------------------------------------------------- 1 | # AWS CloudFormation Triage 2 | 3 | ## TaskSet 4 | Triage and troubleshoot various issues with AWS CloudFormation 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/aws-cloudwatch-metricquery/README.md: -------------------------------------------------------------------------------- 1 | # AWS CloudWatch Metric Query 2 | 3 | ## SLI 4 | Retrieve the result of an AWS CloudWatch Metrics Insights query. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/aws-ec2-securitycheck/README.md: -------------------------------------------------------------------------------- 1 | # AWS EC2 Security Check 2 | 3 | ## TaskSet 4 | Performs a suite of security checks against a set of AWS EC2 instances. 
5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/gcp-opssuite-logquery-dashboard/README.md: -------------------------------------------------------------------------------- 1 | # GCP Operations Suite Log Query Dashboard 2 | 3 | ## TaskSet 4 | Generate a link to the GCP Log Explorer. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/k8s-kubectl-apiserverhealth/README.md: -------------------------------------------------------------------------------- 1 | # Kubernetes kubectl API Server Health 2 | 3 | ## SLI 4 | Check the health of a Kubernetes API server using kubectl. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/sysdig-monitor-promqlmetric/README.md: -------------------------------------------------------------------------------- 1 | # Sysdig Monitor PromQL Metric 2 | 3 | ## SLI 4 | Queries the Sysdig data API with a PromQL query to fetch metric data. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/github-status-maintenances/README.md: -------------------------------------------------------------------------------- 1 | # GitHub Status - Maintenance 2 | 3 | ## SLI 4 | Retrieve the number of upcoming GitHub platform maintenances over a given window. 
5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/aws-billing-costsacrosstags/README.md: -------------------------------------------------------------------------------- 1 | # AWS Billing Costs Across Tags 2 | 3 | ## TaskSet 4 | Creates a report of AWS line item costs filtered to a list of tagged resources 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/cert-manager-expirations/README.md: -------------------------------------------------------------------------------- 1 | # Cert-Manager Expirations 2 | 3 | ## SLI 4 | Retrieve number of expired TLS certificates managed by cert-manager within a given window. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/k8s-triage-statefulset/README.md: -------------------------------------------------------------------------------- 1 | # Kubernetes Triage StatefulSet 2 | 3 | ## TaskSet 4 | A taskset for troubleshooting issues for StatefulSets and their related resources. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/aws-cloudwatch-logquery-rowcount-zeroerror/README.md: -------------------------------------------------------------------------------- 1 | # AWS CloudWatch Log Query Row Count Zero Error 2 | 3 | ## SLI 4 | Retrieve binary result from an AWS CloudWatch Insights query. 
5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/web-triage/README.md: -------------------------------------------------------------------------------- 1 | # Web Triage 2 | 3 | ## TaskSet 4 | Troubleshoot and triage a URL to inspect it for common issues such as an expired certificate, missing DNS records, etc. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/aws-cloudwatch-metricquery-dashboard/README.md: -------------------------------------------------------------------------------- 1 | # AWS CloudWatch Metric Query Dashboard 2 | 3 | ## TaskSet 4 | Creates a URL to an AWS CloudWatch metrics dashboard with a running query. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/k8s-decommission-workloads/README.md: -------------------------------------------------------------------------------- 1 | # Kubernetes Decommission Workload 2 | 3 | ## TaskSet 4 | Searches a namespace for matching objects and provides the commands to decommission them. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/k8s-patroni-healthcheck/README.md: -------------------------------------------------------------------------------- 1 | # Kubernetes Patroni Health Check 2 | 3 | ## SLI 4 | Uses kubectl (or equivalent) to query the state of a Patroni cluster and determine if it's healthy. 
5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /libraries/RW/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | This line is required so that we can have RW.Core in one directory 3 | and the other RW libs in other directories 4 | See - https://packaging.python.org/en/latest/guides/packaging-namespace-packages/ 5 | """ 6 | __path__ = __import__("pkgutil").extend_path(__path__, __name__) 7 | -------------------------------------------------------------------------------- /libraries/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | This line is required so that we can have RW.Core in one directory 3 | and the other RW libs in other directories 4 | See - https://packaging.python.org/en/latest/guides/packaging-namespace-packages/ 5 | """ 6 | __path__ = __import__("pkgutil").extend_path(__path__, __name__) 7 | -------------------------------------------------------------------------------- /codebundles/aws-cloudformation-stackevents-count/README.md: -------------------------------------------------------------------------------- 1 | # AWS CloudFormation Stack Events Count 2 | 3 | ## SLI 4 | Retrieve the number of detected AWS CloudFormation stack events over a given history 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/gitlab-availability/README.md: -------------------------------------------------------------------------------- 1 | # GitLab Availability 2 | 3 | ## SLI 4 | Check availability of a GitLab server. 5 | 6 | ## TaskSet 7 | Troubleshoot issues with GitLab server availability. 
8 | 9 | ## Use Cases 10 | 11 | ## Requirements 12 | 13 | ## TODO 14 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/aws-account-limit/README.md: -------------------------------------------------------------------------------- 1 | # AWS Account Limit 2 | 3 | ## SLI 4 | Retrieve all recently created AWS accounts. 5 | 6 | ## TaskSet 7 | Retrieve the count of all AWS accounts in an organization. 8 | 9 | ## Use Cases 10 | 11 | ## Requirements 12 | 13 | ## TODO 14 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/github-status-incidents/README.md: -------------------------------------------------------------------------------- 1 | # GitHub Status - Incidents 2 | 3 | ## SLI 4 | Checks for unresolved incidents related to GitHub services and provides a count of ongoing incidents as a metric. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/jira-search-issues-latency/README.md: -------------------------------------------------------------------------------- 1 | # Jira Search Issues Latency 2 | 3 | ## SLI 4 | Check Jira latency when searching issues by the current user. 5 | 6 | ## TaskSet 7 | Create an issue in Jira. 8 | 9 | ## Use Cases 10 | 11 | ## Requirements 12 | 13 | ## TODO 14 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/prometheus-queryrange-transform/README.md: -------------------------------------------------------------------------------- 1 | # Prometheus Range Query 2 | 3 | ## SLI 4 | Run a PromQL query against the Prometheus range query API, perform a provided transform, and return the result.
5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/vault-ok/README.md: -------------------------------------------------------------------------------- 1 | # Vault OK 2 | 3 | ## SLI 4 | Check the health of a Vault server. The response code is used to determine if the service is healthy, resulting in a metric of 1 if it is, or 0 if not. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/aws-cloudwatch-tagmetricquery/README.md: -------------------------------------------------------------------------------- 1 | # AWS CloudWatch Tag Metric Query 2 | 3 | ## SLI 4 | Retrieve aggregate results from multiple AWS CloudWatch Metrics Insights queries run against tagged resources. 5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/github-get-repos-latency/README.md: -------------------------------------------------------------------------------- 1 | # GitHub Get Repository Latency 2 | 3 | ## SLI 4 | Check GitHub latency by getting a list of repo names. 5 | 6 | ## TaskSet 7 | Create a new issue in GitHub Issues. 8 | 9 | ## Use Cases 10 | 11 | ## Requirements 12 | 13 | ## TODO 14 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/github-actions-workflowtiming/README.md: -------------------------------------------------------------------------------- 1 | # GitHub Actions Workflow Timing 2 | 3 | ## SLI 4 | Monitors the timing of a GitHub Actions workflow file within a repo and returns the average runtime in minutes.
5 | 6 | ## Use Cases 7 | 8 | ## Requirements 9 | 10 | ## TODO 11 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /.sourceignore: -------------------------------------------------------------------------------- 1 | # This file is used by the flux GitRepository controller to 2 | # ignore .yaml files that would otherwise be queued for 3 | # parsing by the Kustomization operator 4 | # 5 | # In the case of codecollections, we want to ignore all files 6 | # that are not codecollection.yaml 7 | ** 8 | !codecollection.yaml 9 | !.sourceignore 10 | -------------------------------------------------------------------------------- /codebundles/k8s-kubectl-canaryvolumemount/canary_pvc.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: PersistentVolumeClaim 3 | metadata: 4 | labels: 5 | service: canary 6 | name: canary 7 | spec: 8 | accessModes: 9 | - ReadWriteOnce 10 | resources: 11 | requests: 12 | storage: 128Mi 13 | storageClassName: standard 14 | volumeMode: Filesystem -------------------------------------------------------------------------------- /libraries/RW/Chat/strategies/ChatProviderStrategy.py: -------------------------------------------------------------------------------- 1 | from abc import ABC, abstractmethod 2 | 3 | class ChatProviderStrategy(ABC): 4 | def __init__(self, **kwargs): 5 | for k, v in kwargs.items(): 6 | setattr(self, k, v) 7 | self.client = None 8 | 9 | @abstractmethod 10 | def send_message(self, message: str, **kwargs): 11 | pass 12 | -------------------------------------------------------------------------------- /codebundles/k8s-troubleshoot-deployment/README.md: -------------------------------------------------------------------------------- 1 | # Kubernetes Troubleshoot Deployment 2 | 3 | ## TaskSet 4 | A taskset for troubleshooting general issues associated with typical kubernetes deployment resources. 
5 | Supports API interactions via both the API client and Kubectl binary through RunWhen Shell Services. 6 | 7 | ## Use Cases 8 | 9 | ## Requirements 10 | 11 | ## TODO 12 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /libraries/RW/AWS/strategies/GetClientStrategy.py: -------------------------------------------------------------------------------- 1 | from abc import ABC, abstractmethod 2 | 3 | class GetClientStrategy(ABC): 4 | def __init__(self, **kwargs): 5 | self.client = None 6 | self.client_config_cache = {} 7 | for k, v in kwargs.items(): 8 | setattr(self, k, v) 9 | 10 | @abstractmethod 11 | def get_client(self, service_name: str, **kwargs): 12 | pass 13 | -------------------------------------------------------------------------------- /libraries/RW/Chat/strategies/GoogleChatProviderStrategy.py: -------------------------------------------------------------------------------- 1 | from abc import ABC, abstractmethod 2 | 3 | from RW.Chat.strategies.ChatProviderStrategy import ChatProviderStrategy 4 | from RW.GCP.Chat import Chat 5 | 6 | class GoogleChatProviderStrategy(ChatProviderStrategy): 7 | def send_message(self, message: str, **kwargs): 8 | self.client = Chat() 9 | rsp = self.client.send_message(self.webhook_url, message) 10 | return rsp -------------------------------------------------------------------------------- /libraries/RW/Chat/strategies/DiscordChatProviderStrategy.py: -------------------------------------------------------------------------------- 1 | from abc import ABC, abstractmethod 2 | 3 | from RW.Chat.strategies.ChatProviderStrategy import ChatProviderStrategy 4 | from RW.Discord import Discord 5 | 6 | class DiscordChatProviderStrategy(ChatProviderStrategy): 7 | def send_message(self, message: str, **kwargs): 8 | self.client = Discord() 9 | rsp = self.client.send_message(self.webhook_url, message) 10 | return rsp 
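The chat provider classes above follow the strategy pattern: each subclass of `ChatProviderStrategy` stores its keyword arguments as attributes and implements `send_message`. A minimal sketch of how a caller might use them is shown here. The base class is restated so the snippet runs on its own; `RecordingChatProviderStrategy` and `notify` are hypothetical names introduced only for illustration and are not part of the library.

```python
from abc import ABC, abstractmethod


class ChatProviderStrategy(ABC):
    """Restated from the base class above so the sketch is self-contained."""

    def __init__(self, **kwargs):
        # Every keyword argument (webhook_url, token, channel, ...) becomes an attribute.
        for k, v in kwargs.items():
            setattr(self, k, v)
        self.client = None

    @abstractmethod
    def send_message(self, message: str, **kwargs):
        pass


class RecordingChatProviderStrategy(ChatProviderStrategy):
    """Hypothetical provider: records messages instead of posting to a webhook."""

    def send_message(self, message: str, **kwargs):
        self.sent = getattr(self, "sent", [])
        self.sent.append((self.webhook_url, message))
        return {"ok": True, "count": len(self.sent)}


def notify(strategy: ChatProviderStrategy, message: str):
    """Hypothetical caller: depends only on the abstract interface."""
    return strategy.send_message(message)


strategy = RecordingChatProviderStrategy(webhook_url="https://example.invalid/hook")
response = notify(strategy, "SLO budget is burning")
print(response)
```

Because `notify` only relies on `send_message`, swapping in the Google Chat, Discord, Slack, or Rocketchat strategy would presumably require no change to the caller, only different constructor arguments.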
-------------------------------------------------------------------------------- /libraries/RW/Chat/strategies/SlackChatProviderStrategy.py: -------------------------------------------------------------------------------- 1 | from abc import ABC, abstractmethod 2 | 3 | from RW.Chat.strategies.ChatProviderStrategy import ChatProviderStrategy 4 | from RW.Slack import Slack 5 | 6 | 7 | class SlackChatProviderStrategy(ChatProviderStrategy): 8 | def send_message(self, message: str, **kwargs): 9 | self.client = Slack() 10 | rsp = self.client.post_message(self.token, self.channel, message) 11 | return rsp 12 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | robotframework>=4.1.2 2 | prometheus-client>=0.11.0 3 | ruamel.base>=1.0.0 4 | ruamel.yaml>=0.17.20 5 | kubernetes>=18.20.0 6 | google-cloud-monitoring>=2.0.0 7 | google-cloud-logging>=3.0.0 8 | protobuf>=3.20.0 9 | rocketchat_API>=1.16.0 10 | boto3>=1.20.0 11 | dnspython>=2.0.0 12 | pyopenssl>=21.0.0 13 | slack_sdk>=3.19.0 14 | python-benedict>=0.25.0 15 | sdcclient>=0.16 16 | snscrape>=0.4.3.20220106 17 | pandas>=1.5.2 18 | jmespath>=1.0.1 19 | datadog-api-client==2.8.0 -------------------------------------------------------------------------------- /codebundles/rest-basicauth/README.md: -------------------------------------------------------------------------------- 1 | # Rest Basic Authentication 2 | 3 | ## SLI 4 | A general purpose rest codebundle for extracting data from a rest endpoint. See the [generic](https://docs.runwhen.com/public/v/codebundles/rest-generic) codebundle variant for more details. 5 | 6 | ## Use Cases 7 | Refer to the generic rest codebundle [here](https://docs.runwhen.com/public/v/codebundles/rest-generic) for a suite of setups and use cases. 
8 | 9 | ## Requirements 10 | -- 11 | 12 | ## TODO 13 | -- -------------------------------------------------------------------------------- /libraries/RW/Discord/robot_tests/send.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Library RW.Discord 3 | Suite Setup Suite Initialization 4 | 5 | *** Tasks *** 6 | Send Discord Message 7 | ${rsp}= RW.Discord.Send Message ${DISCORD_WEBHOOK_URL} ${DISCORD_MESSAGE} 8 | Log ${rsp} 9 | 10 | *** Keywords *** 11 | Suite Initialization 12 | Set Suite Variable ${DISCORD_WEBHOOK_URL} %{DISCORD_WEBHOOK_URL} 13 | Set Suite Variable ${DISCORD_MESSAGE} %{DISCORD_MESSAGE} 14 | -------------------------------------------------------------------------------- /codebundles/gcp-opssuite-metricquery/.runwhen/generation-rules/gcp-quota-generation-rule.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: runwhen.com/v1 2 | kind: GenerationRules 3 | spec: 4 | generationRules: 5 | - resourceTypes: 6 | - namespace 7 | matchRules: 8 | - type: pattern 9 | pattern: "gmp-system" 10 | properties: [name, labels] 11 | mode: substring 12 | slxs: 13 | - baseName: gcp-quota 14 | levelOfDetail: detailed 15 | outputItems: 16 | - type: slx 17 | - type: sli 18 | -------------------------------------------------------------------------------- /codebundles/rest-explicitoauth2-basicauth/README.md: -------------------------------------------------------------------------------- 1 | # Rest OAuth2 With Basic Authentication 2 | 3 | ## SLI 4 | A general purpose rest codebundle for extracting data from a rest endpoint. See the [generic](https://docs.runwhen.com/public/v/codebundles/rest-generic) codebundle variant for more details. 5 | 6 | ## Use Cases 7 | Refer to the generic rest codebundle [here](https://docs.runwhen.com/public/v/codebundles/rest-generic) for a suite of setups and use cases. 
8 | 9 | ## Requirements 10 | -- 11 | 12 | ## TODO 13 | -- -------------------------------------------------------------------------------- /libraries/RW/Chat/strategies/RocketChatProviderStrategy.py: -------------------------------------------------------------------------------- 1 | from abc import ABC, abstractmethod 2 | 3 | from RW.Chat.strategies.ChatProviderStrategy import ChatProviderStrategy 4 | from RW.Rocketchat import Rocketchat 5 | 6 | 7 | class RocketChatProviderStrategy(ChatProviderStrategy): 8 | def send_message(self, message: str, **kwargs): 9 | self.client = Rocketchat() 10 | rsp = self.client.incoming_webhook( 11 | webhook_url=self.webhook_url, message=message 12 | ) 13 | return rsp 14 | -------------------------------------------------------------------------------- /codebundles/rest-explicitoauth2-tokenheader/README.md: -------------------------------------------------------------------------------- 1 | # Rest OAuth2 with Bearer-to-Access Authentication 2 | 3 | ## SLI 4 | A general purpose rest codebundle for extracting data from a rest endpoint. See the [generic](https://docs.runwhen.com/public/v/codebundles/rest-generic) codebundle variant for more details. 5 | 6 | ## Use Cases 7 | Refer to the generic rest codebundle [here](https://docs.runwhen.com/public/v/codebundles/rest-generic) for a suite of setups and use cases. 
8 | 9 | ## Requirements 10 | -- 11 | 12 | ## TODO 13 | -- -------------------------------------------------------------------------------- /codebundles/gcp-opssuite-metricquery/.runwhen/templates/gcp-quota-slo.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: runwhen.com/v1 2 | kind: ServiceLevelObjective 3 | metadata: 4 | name: {{slx_name}} 5 | labels: 6 | {% include "common-labels.yaml" %} 7 | annotations: 8 | {% include "common-annotations.yaml" %} 9 | spec: 10 | codeBundle: 11 | repoUrl: https://github.com/runwhen-contrib/rw-public-codecollection.git 12 | pathToYaml: codebundles/slo-default/queries.yaml 13 | ref: main 14 | sloSpecType: simple-mwmb 15 | objective: 99.5 16 | threshold: 0 17 | operand: eq 18 | 19 | -------------------------------------------------------------------------------- /codebundles/kong-ingress-health-gcp-promql/.runwhen/templates/kong-ingress-health-gcp-promql-slo.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: runwhen.com/v1 2 | kind: ServiceLevelObjective 3 | metadata: 4 | name: {{slx_name}} 5 | labels: 6 | {% include "common-labels.yaml" %} 7 | annotations: 8 | {% include "common-annotations.yaml" %} 9 | spec: 10 | codeBundle: 11 | repoUrl: https://github.com/runwhen-contrib/rw-public-codecollection.git 12 | pathToYaml: codebundles/slo-default/queries.yaml 13 | ref: main 14 | sloSpecType: simple-mwmb 15 | objective: 99 16 | threshold: 1 17 | operand: eq -------------------------------------------------------------------------------- /codebundles/msteams-send-message/runbook.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Send a message to an MS Teams channel. 
3 | Metadata Display Name Microsoft Teams Send Message 4 | Metadata Supports Microsoft,MS-TEAMS 5 | Metadata Author Vui Lee 6 | Library RW.Core 7 | Library RW.MSTeams 8 | #TODO: Refactor for new platform use 9 | 10 | *** Tasks *** 11 | Send a Message to an MS Teams Channel 12 | Import User Variable MSTEAMS_ALERTS_CHANNEL_URL 13 | RW.MSTeams.Send Message Red alert!!! url=${MSTEAMS_ALERTS_CHANNEL_URL} 14 | -------------------------------------------------------------------------------- /libraries/RW/MyTest.py: -------------------------------------------------------------------------------- 1 | """ 2 | MyTest keyword library 3 | 4 | Scope: Global 5 | """ 6 | from RW.Utils import utils 7 | from RW import platform 8 | 9 | 10 | class MyTest: 11 | """MyTest keyword library is used for internal testing.""" 12 | 13 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 14 | 15 | def __init__(self, counter: int = 0) -> None: 16 | self.counter = counter 17 | 18 | def my_test_kw(self) -> bool: 19 | """ 20 | TBD 21 | """ 22 | self.counter += 1 23 | platform.debug_log(f"In my_test_kw: counter={self.counter}") 24 | return True 25 | -------------------------------------------------------------------------------- /codebundles/pingdom-health/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Check health of Pingdom platform. 
3 | Metadata Display Name Pingdom Health 4 | Metadata Supports Pingdom 5 | Metadata Type SLI 6 | Metadata Author Vui Le 7 | Force Tags Pingdom health 8 | Library RW.Core 9 | Library RW.Pingdom 10 | #TODO: Refactor for new platform use 11 | 12 | *** Tasks *** 13 | Check Pingdom Health 14 | ${res} = RW.Pingdom.Get Health Status 15 | Info Log ${res} 16 | Console Log ${res.status_code} 17 | Console Log ${res.content} 18 | -------------------------------------------------------------------------------- /.github/workflows/pypi.yaml: -------------------------------------------------------------------------------- 1 | name: Publish Pypi Package 2 | 3 | on: 4 | push: 5 | branches: 6 | - main 7 | paths: 8 | - VERSION 9 | workflow_dispatch: 10 | 11 | jobs: 12 | publish-pypi: 13 | runs-on: ubuntu-latest 14 | steps: 15 | - name: Checkout 16 | uses: actions/checkout@v2 17 | - name: Build Package 18 | run: |- 19 | pip install setuptools wheel twine 20 | ln -s libraries/RW RW 21 | python setup.py sdist bdist_wheel 22 | - name: Publish 23 | run: |- 24 | twine upload dist/* -u __token__ -p ${{ secrets.PYPI_TOKEN }} -------------------------------------------------------------------------------- /codebundles/grafana-health/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Check Grafana server health. 
3 | Metadata Display Name Grafana Health 4 | Metadata Supports Grafana 5 | Metadata Type SLI 6 | Metadata Author Vui Le 7 | Force Tags Grafana health 8 | Library RW.Core 9 | Library RW.Grafana 10 | #TODO: Refactor for new platform use 11 | 12 | *** Tasks *** 13 | Check Grafana Server Health 14 | ${res} = RW.Grafana.Get Health Status 15 | Info Log ${res} 16 | Console Log ${res.status_code} 17 | Console Log ${res.content} 18 | -------------------------------------------------------------------------------- /libraries/RW/HashiCorp/robot_tests/health.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Library RW.HashiCorp.Vault 3 | Library RW.platform 4 | Library RW.Core 5 | Library OperatingSystem 6 | Suite Setup Suite Initialization 7 | 8 | *** Keywords *** 9 | Suite Initialization 10 | Set Suite Variable ${VAULT_URL} %{VAULT_URL} 11 | 12 | *** Tasks *** 13 | Health Check Vault 14 | ${rsp}= RW.HashiCorp.Vault.Get Health url=${VAULT_URL} 15 | ${vault_health}= Set Variable ${rsp} 16 | ${rsp}= RW.HashiCorp.Vault.Check Health url=${VAULT_URL} 17 | ${status}= Set Variable ${rsp} 18 | -------------------------------------------------------------------------------- /codebundles/k8s-kubectl-canaryvolumemount/README.md: -------------------------------------------------------------------------------- 1 | # Kubernetes kubectl Canary Volume Mount 2 | 3 | ## SLI 4 | An SLI that periodically creates a job to list the contents of a directory on a PVC. If the list command succeeds, the SLI 5 | returns a score of 1; otherwise it returns 0. 6 | 7 | ## Use Cases 8 | - Validate that system storage is working and can be provisioned on the cluster. 9 | 10 | ## Requirements 11 | - A kubeconfig with get/list access on deployment, pod, and PVC objects in the chosen namespace. 12 | - A chosen `namespace` and `context` to use from the kubeconfig.
13 | 14 | ## TODO 15 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /.github/release.yaml: -------------------------------------------------------------------------------- 1 | # .github/release.yml 2 | # see https://docs.github.com/en/repositories/releasing-projects-on-github/automatically-generated-release-notes#configuring-automatically-generated-release-notes 3 | 4 | changelog: 5 | exclude: 6 | labels: 7 | - ignore-for-release 8 | authors: 9 | - octocat 10 | - github-actions 11 | categories: 12 | - title: Breaking Changes 🛠 13 | labels: 14 | - Semver-Major 15 | - breaking-change 16 | - title: Exciting New Features 🎉 17 | labels: 18 | - Semver-Minor 19 | - enhancement 20 | - title: Other Changes 21 | labels: 22 | - "*" -------------------------------------------------------------------------------- /codebundles/gcp-opssuite-metricquery/.runwhen/templates/gcp-quota-slx.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: runwhen.com/v1 2 | kind: ServiceLevelX 3 | metadata: 4 | name: {{slx_name}} 5 | labels: 6 | {% include "common-labels.yaml" %} 7 | annotations: 8 | {% include "common-annotations.yaml" %} 9 | spec: 10 | imageURL: https://storage.googleapis.com/runwhen-nonprod-shared-images/icons/google-cloud.svg 11 | alias: GCP Quota Errors 12 | asMeasuredBy: Quota Issues reported in Google Managed Prometheus 13 | configProvided: 14 | - name: SLX_PLACEHOLDER 15 | value: gcp-quota 16 | owners: 17 | - {{workspace.owner_email}} 18 | statement: GCP Quota usage should not be approaching 100% for any object. -------------------------------------------------------------------------------- /codebundles/slack-sendmessage/README.md: -------------------------------------------------------------------------------- 1 | # Slack Send Message 2 | 3 | ## TaskSet 4 | Sends a static message to a Slack chat channel via webhook. 
There is optional configuration for including live runsession info and links 5 | for team members to quickly access running sessions. 6 | 7 | ## Use Cases 8 | - Send an alert when an SLO is burning too much budget which contains a link to the active runsession. 9 | - Let your team members know you're in a live runsession and provide them with a link to join you. 10 | 11 | ## Requirements 12 | - A `webhook_url` secret which allows the codebundle to perform an incoming webhook post request against the service API. 13 | 14 | ## TODO 15 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/discord-sendmessage/README.md: -------------------------------------------------------------------------------- 1 | # Discord Send Message 2 | 3 | ## TaskSet 4 | Sends a static message to a Discord chat channel via webhook. There is optional configuration for including live runsession info and links 5 | for team members to quickly access running sessions. 6 | 7 | ## Use Cases 8 | - Send an alert when an SLO is burning too much budget which contains a link to the active runsession. 9 | - Let your team members know you're in a live runsession and provide them with a link to join you. 10 | 11 | ## Requirements 12 | - A `webhook_url` secret which allows the codebundle to perform an incoming webhook post request against the service API. 13 | 14 | ## TODO 15 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/ping-host-availability/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Ping a host and retrieve packet loss percentage. 
3 | Metadata Display Name Ping Host Availability 4 | Metadata Supports ping 5 | Metadata Type SLI 6 | Metadata Author Vui Le 7 | Force Tags ping availability 8 | Library RW.Core 9 | #TODO: Refactor for new platform use 10 | 11 | *** Tasks *** 12 | Ping host and collect packet lost percentage 13 | RW.Core.Import User Variable HOST_NAME 14 | ${result} = RW.Core.Ping ${HOST_NAME} count=10 15 | RW.Core.Info Log ${result["stdout"]} 16 | RW.Core.Push Metric ${result["packet_loss_percentage"]} 17 | -------------------------------------------------------------------------------- /codebundles/googlechat-sendmessage/README.md: -------------------------------------------------------------------------------- 1 | # Google Chat Send Message 2 | 3 | ## TaskSet 4 | Sends a static message to a Google chat channel via webhook. There is optional configuration for including live runsession info and links 5 | for team members to quickly access running sessions. 6 | 7 | ## Use Cases 8 | - Send an alert when an SLO is burning too much budget which contains a link to the active runsession. 9 | - Let your team members know you're in a live runsession and provide them with a link to join you. 10 | 11 | ## Requirements 12 | - A `webhook_url` secret which allows the codebundle to perform an incoming webhook post request against the service API. 13 | 14 | ## TODO 15 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/rocketchat-sendmessage/README.md: -------------------------------------------------------------------------------- 1 | # Rocketchat Send Message 2 | 3 | ## TaskSet 4 | Sends a static message to a Rocketchat chat channel via webhook. There is optional configuration for including live runsession info and links 5 | for team members to quickly access running sessions. 6 | 7 | ## Use Cases 8 | - Send an alert when an SLO is burning too much budget which contains a link to the active runsession. 
9 | - Let your team members know you're in a live runsession and provide them with a link to join you. 10 | 11 | ## Requirements 12 | - A `webhook_url` secret which allows the codebundle to perform an incoming webhook post request against the service API. 13 | 14 | ## TODO 15 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /libraries/RW/GCP/robot_tests/chat.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Library RW.GCP.Chat 3 | 4 | Suite Setup Suite Initialization 5 | Suite Teardown Suite Teardown 6 | 7 | 8 | *** Variables *** 9 | ${CHAT_MESSAGE} 10 | 11 | 12 | *** Tasks *** 13 | Send Hello World1 14 | Set Suite Variable ${CHAT_MESSAGE} ${CHAT_MESSAGE}Hello World1 15 | 16 | Send Hello World2 17 | Set Suite Variable ${CHAT_MESSAGE} ${CHAT_MESSAGE}Hello World2 18 | 19 | 20 | *** Keywords *** 21 | Suite Initialization 22 | Set Suite Variable ${GCP_CHAT_WEBHOOK} %{GCP_CHAT_WEBHOOK} 23 | 24 | Suite Teardown 25 | ${rsp}= RW.GCP.Chat.Send Message ${GCP_CHAT_WEBHOOK} ${CHAT_MESSAGE} 26 | Log ${rsp} 27 | -------------------------------------------------------------------------------- /.github/CODEOWNERS: -------------------------------------------------------------------------------- 1 | # This is a comment. 2 | # Each line is a file pattern followed by one or more owners. 3 | # Read more: https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners 4 | 5 | # These owners will be the default owners for everything in 6 | # the repo. Unless a later match takes precedence, 7 | # these owners will be requested for 8 | # review when someone opens a pull request. 9 | * @runwhen-contrib/runwhen-team 10 | 11 | # Order is important; the last matching pattern takes the most 12 | # precedence. 
13 | 14 | # When someone opens a pull request that only 15 | # modifies JS files, only @js-owner and not the global 16 | # owner(s) will be requested for a review. 17 | -------------------------------------------------------------------------------- /codebundles/datadog-system-load/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Retrieve a DataDog instance's "System Load" metric 3 | Metadata Display Name Datadog System Load 4 | Metadata Supports datadog 5 | Metadata Type SLI 6 | Metadata Author Vui Le 7 | Force Tags datadog load system 8 | Library RW.Core 9 | Library RW.Datadog 10 | #TODO: Refactor for new platform use 11 | 12 | *** Tasks *** 13 | Check Datadog System Load 14 | Import User Variable DATADOG_API_KEY 15 | Import User Variable DATADOG_APP_KEY 16 | Import User Variable SERVICE_DESCR 17 | ${result} = RW.Datadog.Get Metrics avg:system.load.1{host:my-minion1} 60 verbose=true 18 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | ## 0.0.14 2024-01-15 2 | - Add pypi publish config and workflow (#135) 1c6634b 3 | 4 | ## 0.0.13 2023-09-18 5 | 6 | ## 0.0.12 2023-04-18 7 | - Add new index v2 generation (#112) 4e537e9 8 | 9 | ## 0.0.11 2023-04-11 10 | - Add semver & changelog automation (#105) 822fcf3 11 | - Feature/k8s mongodb health (#103) 49f9575 12 | - Add codebundles for gcloud cli (#102) 85c8983 13 | - Add unary grpcurl codebundles (#100) a3b82a0 14 | - Add incident sli (#90) a71a18b 15 | - Add Codebundles for handling Laggy Patroni Replicas (#74) 0571d68 16 | - Add datadog codebundle (#61) 2d99c0a 17 | - Add argocd healthcheck codebundle (#39) d55f682 18 | - Add SLI and Runbook for K8s based Cortex Metrics Ingester Health (#34) 6a34183 19 | - Add Raw option to promql codebundles (#23) e7c2005 20 | 21 | 
-------------------------------------------------------------------------------- /codebundles/README.md: -------------------------------------------------------------------------------- 1 | # RunWhen Public Codecollection 2 | This content contains all of the open source codebundles that are published as the [RunWhen public codecollection](https://github.com/runwhen-contrib/rw-public-codecollection). Each codebundle folder contains code and documentation to build an SLI, SLO, or TaskSet. Each folder can contain a dedicated README.md file, which can detail usage of the codebundle, requirements, and use cases. This content is automatically published on docs.runwhen.com 3 | 4 | Updates to this content can be submitted as a Pull Request against the main [RunWhen Public Codecollection repo](https://github.com/runwhen-contrib/rw-public-codecollection) or by creating an [issue](https://github.com/runwhen-contrib/rw-public-codecollection/issues) with the `documentation` label. -------------------------------------------------------------------------------- /codebundles/dns-latency/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Check DNS latency for Google Resolver. 3 | Metadata Display Name DNS Latency 4 | Metadata Supports dns 5 | Metadata Type SLI 6 | Metadata Author Vui Le 7 | Force Tags dns latency 8 | Library RW.Core 9 | Library RW.DNS 10 | #TODO: Refactor for new platform use 11 | 12 | *** Tasks *** 13 | Check DNS latency for Google Resolver 14 | [Documentation] Get DNS latency for Google resolver 15 | RW.Core.Import User Variable HOSTNAME_TO_RESOLVE 16 | ${latency_ms} = RW.DNS.Lookup Latency In Milliseconds 17 | ... 
host=${HOSTNAME_TO_RESOLVE} nameservers=8.8.8.8 18 | RW.Core.Debug Log Latency in milliseconds: ${latency_ms} 19 | RW.Core.Push Metric ${latency_ms} 20 | -------------------------------------------------------------------------------- /codebundles/gcp-serviceshealth/README.md: -------------------------------------------------------------------------------- 1 | # GCP Service Health 2 | 3 | 4 | ## SLI 5 | This codebundle sets up a monitor for a specific region and GCP Product, which is then periodically checked for ongoing incidents based on the history available at https://status.cloud.google.com/incidents.json, filtered by severity level. 6 | 7 | ## Use Cases 8 | ### Use Case: SLI: Monitor for GCP Incidents with Google Kubernetes Engine & Google Compute Engine in 3 Regions 9 | This sample configuration demonstrates how to monitor incidents for multiple GCP products in multiple regions within the last 15m: 10 | 11 | ``` 12 | WITHIN_TIME: 15m 13 | PRODUCTS: Google Kubernetes Engine,Google Compute Engine 14 | REGIONS: us-central1,us-west2,us-west1 15 | SEVERITY: low 16 | ``` 17 | 18 | ## Requirements 19 | 20 | ## TODO 21 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/gitlab-get-repos-latency/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Check GitLab latency by getting a list of repo names.
3 | Metadata Display Name GitLab Get Repo Latency 4 | Metadata Supports GitLab 5 | Metadata Type SLI 6 | Metadata Author Vui Le 7 | Force Tags gitlab latency 8 | Library RW.Core 9 | Library RW.GitLab 10 | #TODO: Refactor for new platform use 11 | 12 | *** Tasks *** 13 | Check GitLab Latency With Get Repos 14 | Import User Variable GITLAB_TOKEN 15 | Import User Variable GITLAB_URL 16 | Import User Variable SERVICE_DESCR 17 | RW.GitLab.Create Session ${GITLAB_URL} ${GITLAB_TOKEN} 18 | ${res} = RW.GitLab.Get Projects 19 | Info Log ${res.names} 20 | Push Metric ${res.latency} descr=${SERVICE_DESCR} 21 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | 3 | with open("requirements.txt") as f: 4 | required = f.read().splitlines() 5 | 6 | setup( 7 | name="runwhen-public-keywords", 8 | version=open("VERSION").read(), 9 | packages=["RW"], 10 | package_dir={"RW": "RW"}, 11 | license="Apache License 2.0", 12 | description="A set of RunWhen published keywords for interacting with various APIs.", 13 | long_description=open("README.md").read(), 14 | long_description_content_type="text/markdown", 15 | author="Kyle Forster", 16 | author_email="kyle.forster@runwhen.com", 17 | url="https://github.com/runwhen-contrib/rw-public-codecollection", 18 | install_requires=required, 19 | include_package_data=True, 20 | classifiers=["Programming Language :: Python :: 3", "License :: OSI Approved :: Apache Software License"], 21 | ) 22 | -------------------------------------------------------------------------------- /codebundles/k8s-daemonset-healthcheck/README.md: -------------------------------------------------------------------------------- 1 | # Kubernetes Daemonset Healthcheck 2 | 3 | ## SLI 4 | Periodically checks the state of a daemonset and returns a score of 1 (healthy) or 0 (unhealthy). 
For a daemonset to be considered healthy it must: 5 | 6 | - Not exceed the allowed max unavailable count 7 | - Have 0 misscheduled pods 8 | - Have at least the minimum allowed pods 9 | - Have all scheduled pods ready and available, indicating successful rollouts 10 | 11 | ## Use Cases 12 | - Check that your Vault CSI driver is healthy and properly deployed across your nodes. 13 | 14 | ## Requirements 15 | - A kubeconfig with get/list access on daemonset objects in the chosen namespace. 16 | - A chosen `namespace` and `context` to use from the kubeconfig. 17 | - A `daemonset name` to monitor within the chosen `namespace`. 18 | 19 | ## TODO 20 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/prometheus-queryinstant-transform/README.md: -------------------------------------------------------------------------------- 1 | # Prometheus Instant Query 2 | 3 | ## SLI 4 | Run a PromQL query against the Prometheus instant query API, apply a provided transform, and return the result. 
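
As a rough illustration of the flow described above, the sketch below queries the instant query endpoint (`/api/v1/query`) and reduces the returned instant vector to a single number. It is not the codebundle's implementation: the function names and the transform names (`sum`, `max`, `min`, `first`) are assumptions made for this example, and it only handles responses whose result type is a vector.

```python
import json
import urllib.parse
import urllib.request


def build_instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for the Prometheus instant query endpoint."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def transform_vector(payload: dict, transform: str = "sum") -> float:
    """Reduce an instant-vector response to one number.

    The transform names are hypothetical, chosen only for this sketch.
    """
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    # Each sample looks like {"metric": {...}, "value": [<timestamp>, "<value>"]}.
    values = [float(sample["value"][1]) for sample in payload["data"]["result"]]
    if not values:
        raise ValueError("query returned no samples")
    reducers = {"sum": sum, "max": max, "min": min, "first": lambda v: v[0]}
    return float(reducers[transform](values))


def query_instant(base_url: str, promql: str, transform: str = "sum", timeout: int = 30) -> float:
    """Run the query over HTTP and return the transformed value."""
    url = build_instant_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as rsp:
        return transform_vector(json.load(rsp), transform)
```

Keeping the URL construction and the transform separate from the HTTP call means the transform can be checked against a canned response without a running Prometheus instance.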
5 | 6 | ## Use Cases 7 | 8 | ### Use Case: SLI: Kubernetes Node Heartbeats with Kube State Metrics 9 | If you want to monitor the number of heartbeats failing across nodes, provided your kube_state metrics are submitted to the prometheus instance, then you can enter this query, which will give you a count of failing heartbeats across the node fleet: 10 | 11 | ```((max(sum by(condition) (kube_node_status_condition{condition!="Ready", status="false"}))+min(kube_node_status_condition{condition="Ready", status="true"}))*-1) + count( sum( kube_node_status_condition ) by (condition) )``` 12 | 13 | 14 | ## Requirements 15 | 16 | ## TODO 17 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/github-get-repos-latency/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Check GitHub latency by getting a list of repo names. 3 | Metadata Display Name GitHub API Latency 4 | Metadata Supports GitHub 5 | Metadata Type SLI 6 | Metadata Author Vui Le 7 | Force Tags github latency 8 | Library RW.Core 9 | Library RW.GitHub 10 | #TODO: Refactor for new platform use 11 | 12 | *** Tasks *** 13 | Check GitHub Latency With Get Repos 14 | Import User Variable GITHUB_TOKEN 15 | Import User Variable GITHUB_USER 16 | Import User Variable GITHUB_REPO_NAME 17 | Import User Variable SERVICE_DESCR 18 | ${res} = RW.GitHub.Get Repo user=${GITHUB_USER} name=${GITHUB_REPO_NAME} token=${GITHUB_TOKEN} 19 | Log ${res} 20 | Push Metric ${res.latency} descr=${SERVICE_DESCR} 21 | -------------------------------------------------------------------------------- /codebundles/jira-search-issues-latency/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Check Jira latency when searching issues by current user. 
 3 | Metadata Display Name Jira Search Issue Latency 4 | Metadata Supports Jira 5 | Metadata Type SLI 6 | Metadata Author Vui Le 7 | Force Tags jira latency 8 | Library RW.Core 9 | Library RW.Jira 10 | #TODO: Refactor for new platform use 11 | 12 | *** Tasks *** 13 | Search Jira Issues By Current User 14 | Import User Variable SERVICE_DESCR 15 | Import User Variable JIRA_URL 16 | Import User Variable JIRA_USER 17 | Import User Variable JIRA_USER_TOKEN 18 | Connect to Jira server=${JIRA_URL} user=${JIRA_USER} token=${JIRA_USER_TOKEN} 19 | ${res} = Search Issues 20 | Log ${res} 21 | Push Metric ${10} descr=${SERVICE_DESCR} 22 | -------------------------------------------------------------------------------- /codebundles/k8s-kubectl-eventquery/README.md: -------------------------------------------------------------------------------- 1 | # Kubernetes Kubectl Event Query 2 | 3 | ## SLI 4 | This codebundle returns the number of events in a Kubernetes namespace whose messages match a regex pattern. 5 | Note that this does not sum up the message occurrence count, only the Kubernetes object count. 6 | 7 | Pattern examples: 8 | - Return results which contain a string: `mystring` 9 | - Return results which match either of two strings: `(Search1|Search2)` 10 | 11 | ## Use Cases 12 | - Measure the number of failed volume mounts occurring by setting the pattern to "FailedMount" 13 | 14 | ## Requirements 15 | - A kubeconfig with get/list access on event objects in the chosen namespace. 16 | - A chosen `namespace` and `context` to use from the kubeconfig. 17 | - An `event pattern` to use for selecting the event objects; refer to extended grep patterns for details on how to write these (run `man grep`). 
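
The counting behaviour described in the SLI section can be sketched as follows. This is only an illustration, assuming the input is the parsed output of `kubectl get events -o json`; the function name is hypothetical, and Python's `re` module stands in for extended grep, whose syntax differs in places. Each matching event object counts once, regardless of its `count` field.

```python
import re


def count_matching_events(event_list: dict, pattern: str) -> int:
    """Count event objects whose message matches the pattern.

    The per-event `count` field (how often the event recurred) is
    deliberately ignored, so each object contributes at most 1.
    """
    regex = re.compile(pattern)
    return sum(
        1
        for event in event_list.get("items", [])
        if regex.search(event.get("message") or "")
    )


# Example input shaped like `kubectl get events -o json` output (trimmed).
sample = {
    "items": [
        {"reason": "FailedMount", "message": "FailedMount: unable to attach volume", "count": 7},
        {"reason": "Pulled", "message": "Successfully pulled image", "count": 1},
        {"reason": "FailedMount", "message": "FailedMount: timed out waiting", "count": 3},
    ]
}
matches = count_matching_events(sample, "FailedMount")
```

Here `matches` is the number of event objects (2), not the sum of their occurrence counts (10).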
18 | 19 | ## TODO 20 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /libraries/RW/Uptime/robot_tests/component_status.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Library RW.Uptime.StatusPage 3 | Library RW.platform 4 | Library RW.Core 5 | Library OperatingSystem 6 | Suite Setup Suite Initialization 7 | 8 | *** Keywords *** 9 | Suite Initialization 10 | Set Suite Variable ${UPTIME_COMPONENT_URL} %{UPTIME_COMPONENT_URL} 11 | ${UPTIME_TOKEN}= Evaluate RW.platform.Secret("uptime_token", """%{UPTIME_TOKEN}""") 12 | Set Suite Variable ${UPTIME_TOKEN} ${UPTIME_TOKEN} 13 | 14 | *** Tasks *** 15 | Check Component Status 16 | ${rsp}= RW.Uptime.StatusPage.Get Component Status auth_token=${UPTIME_TOKEN} url=${UPTIME_COMPONENT_URL} 17 | ${component_status}= Set Variable ${rsp} 18 | ${status}= RW.Uptime.StatusPage.Validate Component Status status_data=${component_status} allowed_status=operational,under-maintenance 19 | Log ${status} 20 | -------------------------------------------------------------------------------- /codebundles/http-latency/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Jonathan Funk 3 | Metadata Display Name HTTP Latency 4 | Metadata Supports HTTP 5 | Documentation Measure HTTP latency against a given URL. 6 | ... The returned metric is the number of seconds the request took as a float value. 7 | Force Tags Url HTTP Latency Metric 8 | Library RW.Core 9 | Library RW.HTTP 10 | 11 | *** Tasks *** 12 | Check HTTP Latency to Well Known URL 13 | ${URL}= RW.Core.Import User Variable URL 14 | ... type=string 15 | ... description=What URL to perform requests against. 16 | ... pattern=\w* 17 | ... default=https://www.runwhen.com 18 | ... 
example=https://www.runwhen.com 19 | ${rsp}= RW.HTTP.Get ${URL} expected_status=200 20 | RW.Core.Debug Log Latency in seconds: ${rsp.latency} 21 | RW.Core.Push Metric ${rsp.latency} 22 | -------------------------------------------------------------------------------- /codebundles/twitter-query-tweets/README.md: -------------------------------------------------------------------------------- 1 | # Twitter Query Tweets 2 | 3 | ## SLI 4 | Queries Twitter to count the number of tweets within a specified time range for a specific user handle. 5 | 6 | ## TaskSet 7 | Queries Twitter to fetch tweets within a specified time range for a specific user handle and adds them to a report. 8 | 9 | 10 | ## Use Cases 11 | ### SLI & TaskSet: Count and fetch tweets within the last day 12 | In our use case, the Twitter handle [gitbookstatus](https://twitter.com/gitbookstatus) uses Twitter to post updates about their service. The SLI can be configured to fetch and count any tweets within the last day, and the Runbook can be configured in the same way to deliver the tweet content instead. 
13 | 14 | Example configuration parameters for both the SLI and TaskSet: 15 | ``` 16 | Handle: gitbookstatus 17 | Max Tweets: 5 18 | Max Tweet Age: 1 19 | Min Tweet Age: 0 20 | ``` 21 | 22 | 23 | ## Requirements 24 | 25 | ## TODO 26 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /codebundles/k8s-kubectl-canaryvolumemount/canary_job.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: batch/v1 2 | kind: Job 3 | metadata: 4 | name: canary 5 | labels: 6 | job-name: canary 7 | spec: 8 | backoffLimit: 1 9 | completionMode: NonIndexed 10 | completions: 1 11 | parallelism: 1 12 | suspend: false 13 | template: 14 | metadata: 15 | labels: 16 | job-name: canary 17 | spec: 18 | containers: 19 | - command: 20 | - ls 21 | - /canary_pvc 22 | image: busybox 23 | imagePullPolicy: Always 24 | name: canary 25 | volumeMounts: 26 | - mountPath: /canary_pvc 27 | name: canary-pvc 28 | - mountPath: /tmp 29 | name: cache-volume 30 | terminationGracePeriodSeconds: 30 31 | restartPolicy: Never 32 | volumes: 33 | - emptyDir: {} 34 | name: cache-volume 35 | - name: canary-pvc 36 | persistentVolumeClaim: 37 | claimName: canary -------------------------------------------------------------------------------- /libraries/RW/Discord/Discord.py: -------------------------------------------------------------------------------- 1 | import json 2 | import requests 3 | 4 | class Discord: 5 | """ 6 | Discord integration to send messages via webhook to channels. 7 | 8 | """ 9 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 10 | 11 | def send_message(self, webhook_url, message, timeout=30): 12 | """ 13 | Send a message to a webhook-enabled Discord channel using the webhook URL. 14 | 15 | Examples: 16 | | RW.Discord.Send Message | https://discord.com/api/webhooks/...example... | Hello World! 
| 17 | | RW.Discord.Send Message | ${DISCORD_WEBHOOK_URL} | ${CHAT_MESSAGE} | 18 | 19 | Return Value: 20 | | response: requests.response | 21 | """ 22 | message = { 23 | "content": f"{message}"} 24 | headers = {'Content-Type': 'application/json; charset=UTF-8'} 25 | rsp = requests.post(webhook_url, headers=headers, json=message, timeout=timeout) 26 | return rsp -------------------------------------------------------------------------------- /codebundles/aws-account-limit/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Retrieve the count of all AWS accounts in an organization. 3 | Metadata Display Name AWS Organization Accounts 4 | Metadata Type SLI 5 | Metadata Author Vui Le 6 | Force Tags aws accounts limit 7 | Library RW.Core 8 | Library RW.AWS 9 | #TODO: Refactor for new platform use 10 | 11 | *** Tasks *** 12 | Get Count Of AWS Accounts In Organization 13 | Import User Variable SERVICE_DESCR 14 | Import User Variable AWS_ACCESS_KEY_ID 15 | Import User Variable AWS_SECRET_ACCESS_KEY 16 | Import User Variable REGION_NAME 17 | Set Credentials ${AWS_ACCESS_KEY_ID} ${AWS_SECRET_ACCESS_KEY} ${REGION_NAME} 18 | ${res} = Get Accounts verbose=True 19 | Push Metric ${res.count} descr=${SERVICE_DESCR} 20 | ... status_code=${res.status_code} 21 | ... ok=${res.ok} 22 | ... 
ok_status=${res.ok_status} 23 | -------------------------------------------------------------------------------- /libraries/RW/HashiCorp/Vault.py: -------------------------------------------------------------------------------- 1 | """ 2 | HashiCorp Vault keyword library 3 | 4 | Scope: Global 5 | """ 6 | 7 | import requests 8 | 9 | 10 | class Vault: 11 | #TODO: update docstrings 12 | """ 13 | HashiCorp Vault keyword library 14 | """ 15 | 16 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 17 | 18 | def get_health(self, url: str) -> dict: 19 | """Fetch the health document from a Vault health endpoint. 20 | 21 | Args: 22 | url (str): URL of the Vault health endpoint, e.g. https://my-vault/v1/sys/health 23 | 24 | Returns: 25 | dict: The parsed JSON body of the response. 26 | """ 27 | rsp: requests.Response = requests.get(url=url, timeout=30) 28 | return rsp.json() 29 | 30 | def check_health(self, url: str) -> bool: 31 | """Check whether a Vault health endpoint reports a healthy server. 32 | 33 | Args: 34 | url (str): URL of the Vault health endpoint, e.g. https://my-vault/v1/sys/health 35 | 36 | Returns: 37 | bool: True if the response status code is 200 or 429, otherwise False. 38 | """ 39 | rsp: requests.Response = requests.get(url=url, timeout=30) 40 | if rsp.status_code in [200, 429]: 41 | return True 42 | return False 43 | -------------------------------------------------------------------------------- /codebundles/vault-ok/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Jonathan Funk 3 | Metadata Display Name HashiCorp Vault Health 4 | Metadata Supports vault 5 | Documentation Check the health of a Vault server. 6 | ... The response code is used to determine if the service is healthy, resulting in a metric of 1 if it is, or 0 if not. 7 | Force Tags HashiCorp Vault health HTTP 8 | Library RW.Core 9 | Library RW.HashiCorp.Vault 10 | 11 | *** Tasks *** 12 | Check If Vault Endpoint Is Healthy 13 | ${VAULT_HEALTH_URL}= RW.Core.Import User Variable VAULT_HEALTH_URL 14 | ... type=string 15 | ... description=What URL to retrieve health data from. 16 | ... pattern=\w* 17 | ... default=https://my-vault/v1/sys/health 18 | ... 
example=https://my-vault/v1/sys/health 19 | ${rsp}= RW.HashiCorp.Vault.Check Health url=${VAULT_HEALTH_URL} 20 | ${score}= Evaluate 1 if ${rsp} is True else 0 21 | RW.Core.Push Metric ${score} 22 | -------------------------------------------------------------------------------- /codebundles/gitlab-availability/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Check availability of a GitLab server. 3 | Metadata Display Name GitLab Availability 4 | Metadata Supports GitLab 5 | Metadata Type SLI 6 | Metadata Author Vui Le 7 | Force Tags gitlab availability 8 | Library RW.Core 9 | Library RW.HTTP 10 | #TODO: Refactor for new platform use 11 | 12 | *** Tasks *** 13 | Check GitLab Server Status 14 | Import User Variable SERVICE_DESCR 15 | Import User Variable GITLAB_URL 16 | Import User Variable GITLAB_ACCESS_TOKEN 17 | ${session} = Create Authenticated Session url=${GITLAB_URL} headers={"PRIVATE-TOKEN": "${GITLAB_ACCESS_TOKEN}"} verbose=true 18 | ${res} = GET ${GITLAB_URL} session=${session} verbose=true 19 | Debug Log ${res} 20 | Push Metric ${res.status_code} descr=${SERVICE_DESCR} 21 | # ... status_code=${res.status_code} 22 | # ... ok=${res.ok} 23 | # ... ok_status=${res.ok_status} 24 | -------------------------------------------------------------------------------- /codebundles/kong-ingress-health-gcp-promql/.runwhen/generation-rules/kong-ingress-health-gcp-promql.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: runwhen.com/v1 2 | kind: GenerationRules 3 | spec: 4 | generationRules: 5 | - resourceTypes: 6 | - ingress 7 | matchRules: 8 | - type: and 9 | matches: 10 | - type: pattern 11 | pattern: "." 
 12 | properties: [name] 13 | mode: substring 14 | - type: pattern 15 | pattern: "kong" 16 | properties: [spec/ingressClassName] 17 | mode: substring 18 | - resourceType: variables 19 | type: pattern 20 | pattern: "gcp" 21 | properties: [custom/cloud_provider] 22 | mode: substring 23 | slxs: 24 | - baseName: kong-ing-health 25 | qualifiers: ["resource", "namespace", "cluster"] 26 | baseTemplateName: kong-ingress-health-gcp-promql 27 | levelOfDetail: detailed 28 | outputItems: 29 | - type: slx 30 | - type: sli 31 | - type: slo 32 | - type: runbook 33 | templateName: kong-ingress-health-gcp-promql-taskset.yaml 34 | -------------------------------------------------------------------------------- /codebundles/kong-ingress-health-gcp-promql/.runwhen/templates/kong-ingress-health-gcp-promql-slx.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: runwhen.com/v1 2 | kind: ServiceLevelX 3 | metadata: 4 | name: {{slx_name}} 5 | labels: 6 | {% include "common-labels.yaml" %} 7 | annotations: 8 | {% include "common-annotations.yaml" %} 9 | spec: 10 | imageURL: https://storage.googleapis.com/runwhen-nonprod-shared-images/icons/kong-logomark-color.svg 11 | alias: {{match_resource.resource.metadata.name}}-Kong Ingress HTTP Errors 12 | asMeasuredBy: The combined score of HTTP error rate, upstream errors, and request latency as reported by Google Managed Prometheus (GMP). 13 | configProvided: 14 | - name: OBJECT_NAME 15 | value: {{match_resource.resource.metadata.name}} 16 | owners: 17 | - {{workspace.owner_email}} 18 | statement: Kong Ingress objects should be available and performant 99.5% of the time. 
19 | additionalContext: 20 | namespace: "{{match_resource.resource.metadata.namespace}}" 21 | labelMap: "{{match_resource.resource.metadata.labels}}" 22 | cluster: "{{ cluster.name }}" 23 | context: "{{ cluster.context }}" -------------------------------------------------------------------------------- /libraries/pyproject.toml: -------------------------------------------------------------------------------- 1 | [tool.poetry] 2 | name = "runwhen-keywords" 3 | version = "0.0.1" 4 | description = "A set of RunWhen published keywords and python libraries" 5 | license = "Apache-2.0" 6 | authors = ["Kyle Forster ","Jonathan Funk ", "Shea Stewart "] 7 | packages = [ 8 | { include = "RW"} 9 | ] 10 | 11 | [tool.poetry.dependencies] 12 | python = ">=3.8,<4.0" 13 | robotframework = ">=4.1.2" 14 | prometheus-client = ">=0.11.0" 15 | ruamel-base = ">=1.0.0" 16 | ruamel-yaml = ">=0.17.20" 17 | kubernetes = ">=18.20.0" 18 | google-cloud-monitoring = ">=2.0.0" 19 | google-cloud-logging = ">=3.0.0" 20 | protobuf = ">=3.20.0" 21 | rocketchat-api = ">=1.16.0" 22 | boto3 = ">=1.20.0" 23 | dnspython = ">=2.0.0" 24 | pyopenssl = ">=21.0.0" 25 | slack-sdk = ">=3.19.0" 26 | python-benedict = ">=0.25.0" 27 | sdcclient = ">=0.16" 28 | snscrape = ">=0.4.3.20220106" 29 | pandas = ">=1.5.2" 30 | jmespath = ">=1.0.1" 31 | datadog-api-client = "2.8.0" 32 | 33 | [build-system] 34 | requires = ["poetry-core>=1.0.0"] 35 | build-backend = "poetry.core.masonry.api" 36 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 |

3 |
4 | 5 | Join Slack 6 | 7 |

 8 | 9 | # CodeCollection Registry 10 | To explore all CodeCollections and tasks, please visit the [CodeCollection Registry](https://registry.runwhen.com/). 11 | 12 | [![Explore CodeCollection Registry](https://storage.googleapis.com/runwhen-nonprod-shared-images/screenshots/registry.png)](https://registry.runwhen.com) 13 | 14 | ## RunWhen Public Codecollection 15 | This repository is **one of many** CodeCollections that are used with the [RunWhen Platform](https://www.runwhen.com) and [RunWhen Local](https://docs.runwhen.com/public/v/runwhen-local). It contains CodeBundles that are maintained by the RunWhen team and perform health, operational, and troubleshooting tasks. 16 | 17 | Please see the **[contributing](CONTRIBUTING.md)** and **[code of conduct](CODE_OF_CONDUCT.md)** for details on adding your contributions to this project. 18 | 19 | -------------------------------------------------------------------------------- /codebundles/gcp-opssuite-metricquery/README.md: -------------------------------------------------------------------------------- 1 | # GCP Operations Suite Metric Query 2 | 3 | 4 | ## SLI 5 | Performs a metric query using a Google MQL statement on the Ops Suite API. 6 | 7 | ## Use Cases 8 | 9 | ### Use Case: SLI: GCP Exceeded Quotas 10 | If quotas are being exceeded, you might be experiencing issues with provisioning new services. Use this codebundle with the following configuration to identify if any quotas are exceeded in the GCP project. 11 | 12 | - MQL Statement: 13 | ```fetch consumer_quota | metric 'serviceruntime.googleapis.com/quota/exceeded' | group_by 10m, [value_exceeded_count_true: count_true(value.exceeded)] | every 10m | group_by [],[value_exceeded_count_true_aggregate: aggregate(value_exceeded_count_true)]``` 14 | - No Result Overwrite: `True` 15 | - No Result Value: `0` 16 | 17 | With this query, it's a *good* sign when no data is returned, meaning that no quotas have been exceeded. 
With that said, you must set both the `No Result Overwrite` and `No Result Value` options so that the codebundle doesn't error out when no data is returned. 18 | 19 | ## Requirements 20 | 21 | ## TODO 22 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /libraries/RW/Slack.py: -------------------------------------------------------------------------------- 1 | """ 2 | Slack keyword library 3 | 4 | Scope: Global 5 | """ 6 | import logging 7 | import slack_sdk 8 | from slack_sdk.errors import SlackApiError 9 | from typing import Optional 10 | 11 | logging.basicConfig(level=logging.DEBUG) 12 | 13 | 14 | class Slack: 15 | """Slack keyword library can be used to send messages to Slack.""" 16 | 17 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 18 | 19 | def post_message( 20 | self, 21 | token: str, 22 | channel: str, 23 | msg: str, 24 | ) -> object: 25 | """ 26 | Post a message to a Slack channel. 27 | Examples: 28 | | Import User Variable | SLACK_BOT_TOKEN | 29 | | RW.Slack.Post Message | token=${SLACK_BOT_TOKEN} | channel='#alerts' | msg=Message XYZ | 30 | """ 31 | client = slack_sdk.WebClient(token=token) 32 | try: 33 | client.chat_postMessage(channel=channel, text=f"{msg}") 34 | except SlackApiError as e: 35 | # You will get a SlackApiError if "ok" is False 36 | assert e.response["error"] # str like 'invalid_auth', 'channel_not_found' 37 | -------------------------------------------------------------------------------- /codebundles/github-status-incidents/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Checks for unresolved incidents related to GitHub services and provides a count of ongoing incidents as a metric. 
3 | Metadata Display Name GitHub Status Incidents 4 | Metadata Supports GitHub,Status 5 | Metadata Type SLI 6 | Metadata Author Paul Dittaro 7 | Force Tags github availability 8 | Library RW.Core 9 | Library RW.GitHub.Status 10 | 11 | *** Tasks *** 12 | Get Number of Incidents Affecting GitHub 13 | Log Importing config variables... 14 | RW.Core.Import User Variable INCIDENT_IMPACT 15 | ... type=string 16 | ... enum=[None,Minor,Major,Critical] 17 | ... description=Impact level to filter unresolved incidents to. Filtering to a lower level will include all incidents of a higher impact level. 18 | ... example=Minor 19 | ... default=None 20 | ${incidents}= RW.GitHub.Status.Get Unresolved Incidents ${INCIDENT_IMPACT} 21 | ${metric}= Evaluate len($incidents) 22 | Log count: ${metric} 23 | RW.Core.Push Metric ${metric} 24 | -------------------------------------------------------------------------------- /libraries/RW/GCP/Chat.py: -------------------------------------------------------------------------------- 1 | """ 2 | Google Chat keyword library 3 | 4 | Scope: Global 5 | """ 6 | import json 7 | import requests 8 | 9 | from RW import platform 10 | 11 | 12 | class Chat: 13 | """ 14 | Google Chat integration to send messages via webhook to channels. 15 | 16 | To allow a channel to receive a webhook follow: https://developers.google.com/chat/how-tos/webhooks 17 | """ 18 | 19 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 20 | 21 | def send_message(self, webhook_url: platform.Secret, message, timeout=30): 22 | """ 23 | Send a message to a Google Chat channel using the webhook URL. 24 | 25 | Examples: 26 | | RW.GCP.Chat.Send Message | https://chat.googleapis.com/v1/spaces/...example... | Hello World! 
| 27 | | RW.GCP.Chat.Send Message | ${GCP_CHAT_WEBHOOK} | ${CHAT_MESSAGE} | 28 | 29 | Return Value: 30 | | response: requests.response | 31 | """ 32 | message = {"text": f"{message}"} 33 | headers = {"Content-Type": "application/json; charset=UTF-8"} 34 | rsp = requests.post(webhook_url.value, headers=headers, json=message, timeout=timeout) 35 | return rsp 36 | -------------------------------------------------------------------------------- /libraries/RW/AWS/strategies/UserGetClientStrategy.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import boto3 3 | from RW.AWS.strategies.GetClientStrategy import GetClientStrategy 4 | 5 | # silence verbose logging 6 | logging.getLogger('boto3').setLevel(logging.CRITICAL) 7 | logging.getLogger('botocore').setLevel(logging.CRITICAL) 8 | logging.getLogger('s3transfer').setLevel(logging.CRITICAL) 9 | logging.getLogger('urllib3').setLevel(logging.CRITICAL) 10 | 11 | class UserGetClientStrategy(GetClientStrategy): 12 | def get_client(self, service_name: str, **kwargs): 13 | client_config = { 14 | "service_name": service_name, 15 | **kwargs 16 | } 17 | if self.client and self.client_config_cache == client_config: 18 | return self.client 19 | else: 20 | self.client_config_cache = client_config 21 | self.client = None # clear cache 22 | self.client = boto3.client( 23 | service_name=service_name, 24 | region_name=self.region_name, 25 | aws_access_key_id=self.aws_access_key_id, 26 | aws_secret_access_key=self.aws_secret_access_key, 27 | **kwargs, 28 | ) 29 | return self.client -------------------------------------------------------------------------------- /codebundles/artifactory-ok/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Jonathan Funk 3 | Metadata Display Name Artifactory OK 4 | Metadata Supports Artifactory 5 | Documentation Checks an Artifactory instance health endpoint to determine 
its operational status. 6 | ... The response is parsed to determine if the service is healthy, resulting in a metric of 1 if it is, or 0 if not. 7 | Force Tags Arty Artifactory health HTTP 8 | Library RW.Core 9 | Library RW.Artifactory 10 | 11 | *** Tasks *** 12 | Check If Artifactory Endpoint Is Healthy 13 | ${ARTIFACTORY_HEALTH_URL}= RW.Core.Import User Variable ARTIFACTORY_HEALTH_URL 14 | ... type=string 15 | ... description=What URL to retrieve health data from. 16 | ... pattern=\w* 17 | ... default=https://my-artifactory.com/router/api/v1/system/health 18 | ... example=https://my-artifactory.com/router/api/v1/system/health 19 | ${rsp}= RW.Artifactory.Get Health url=${ARTIFACTORY_HEALTH_URL} 20 | ${status}= RW.Artifactory.Validate Health health_data=${rsp} 21 | ${score}= Evaluate 1 if ${status} is True else 0 22 | RW.Core.Push Metric ${score} 23 | -------------------------------------------------------------------------------- /libraries/RW/MSTeams.py: -------------------------------------------------------------------------------- 1 | """ 2 | MS Teams keyword library 3 | 4 | Scope: Global 5 | """ 6 | import pymsteams 7 | 8 | 9 | class MSTeams: 10 | """ 11 | MS Teams keyword library can be used to send alerts/notifications 12 | to a channel in Teams. 13 | 14 | * You need to define a team in Microsoft 365, then this team will show up 15 | in MS Teams. 16 | * In MS Teams, select the team and create a channel for it. 17 | * In the channel, set up a Connector and choose Incoming Webhook. 18 | * After configuring the Incoming Webhook, you'll get a Webhook URL 19 | which can be used by pymsteams to send a message to the channel. 20 | 21 | See https://github.com/rveachkc/pymsteams for more information. 22 | """ 23 | 24 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 25 | 26 | def send_message(self, msg: str, url: str) -> None: 27 | """ 28 | Send a message to an MS Teams channel designated by the MS Teams Webhook URL. 
 29 | Examples: 30 | | Import User Variable | MSTEAMS_ALERTS_CHANNEL_URL | | 31 | | RW.MSTeams.Send Message | Hello, World! | ${MSTEAMS_ALERTS_CHANNEL_URL} | 32 | """ 33 | m = pymsteams.connectorcard(url) 34 | m.text(msg) 35 | m.send() 36 | -------------------------------------------------------------------------------- /codebundles/k8s-kubectl-run/README.md: -------------------------------------------------------------------------------- 1 | # Kubernetes kubectl Run 2 | A highly generic codebundle used for running bare kubectl commands (or equivalent binaries) and presenting the stdout as a report. This allows users to take their commonly used `kubectl` triage commands for their workloads and paste them into the codebundle config, both automating and version controlling their triage process as code, which can then be shared with their team. 3 | 4 | ## TaskSet 5 | ### Use Case: TaskSet: Fetch Pod Error Logs 6 | We can generate a report containing pod logs whose entries have `Exception` or `Error` in the log line. Given the config: 7 | 8 | ``` 9 | configProvided: 10 | - name: DISTRIBUTION 11 | value: Kubernetes 12 | - name: KUBECTL_COMMAND 13 | value: >- 14 | kubectl logs deployment/my-app -n my-namespace --tail=200 | grep -E -i "(Exception|Error)" 15 | ``` 16 | 17 | This fetches the last 200 log lines, filters them for issues, and presents the matching lines in the TaskSet report for us to view on the platform. 18 | 19 | ## Use Cases 20 | 21 | ## Requirements 22 | - A kubeconfig with appropriate RBAC permissions to perform the desired command. 
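
As a rough illustration of the TaskSet described above, the sketch below runs a configured command through a shell and returns its stdout as report text. It is an assumption about the general approach, not the codebundle's code: `run_for_report` is a hypothetical name, and a POSIX shell is assumed. Because the command goes through a shell so that pipes work as typed, it should only ever receive trusted configuration.

```python
import subprocess


def run_for_report(command: str, timeout: int = 60) -> str:
    """Run a shell command and return its stdout as report text."""
    # shell=True so that pipelines such as `kubectl ... | grep ...` work as typed.
    proc = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    # grep exits with 1 when nothing matches, so a non-zero exit code is
    # reported in the output rather than raised as an error.
    if proc.returncode != 0 and not proc.stdout:
        return f"(no output, exit code {proc.returncode}) {proc.stderr.strip()}"
    return proc.stdout
```

With the configuration shown earlier, the value of `KUBECTL_COMMAND` would be passed as `command`, and an empty grep result would show up in the report as a "no output" line instead of a failure.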
23 | 24 | ## TODO 25 | - [ ] link to kubeconfig rbac doc 26 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /libraries/RW/Utils/Check.py: -------------------------------------------------------------------------------- 1 | from enum import Enum 2 | 3 | class Check: 4 | CHECKMARK = '\u2713' 5 | X = '\u2717' 6 | 7 | def __init__(self, title:str, value=None, symbol=None, description:str="", indented:bool=True, required:bool=False): 8 | self.title = title 9 | self.value = value 10 | self.symbol = symbol 11 | self.description = description 12 | self.indented = indented 13 | self.required = required 14 | self.passed = None 15 | self.doc_link = None 16 | self.commands = [] 17 | 18 | def __str__(self): 19 | check_str = [""] 20 | if self.indented: 21 | check_str.append("\t") 22 | if self.title: 23 | check_str.append(self.title) 24 | if self.value: 25 | check_str.append(self.value) 26 | if isinstance(self.symbol, bool): 27 | if self.symbol: 28 | check_str.append(Check.CHECKMARK) 29 | if not self.symbol: 30 | check_str.append(Check.X) 31 | if self.description: 32 | check_str.append("\n") 33 | if self.indented: 34 | check_str.append("\t") 35 | check_str.append(self.description) 36 | return " ".join(check_str) -------------------------------------------------------------------------------- /codebundles/github-status-maintenances/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Retrieve number of upcoming Github platform maintenances over a given window. 3 | Metadata Display Name GitHub Status Maintenance 4 | Metadata Supports GitHub,Status 5 | Metadata Type SLI 6 | Metadata Author Paul Dittaro 7 | Force Tags github availability 8 | Library RW.Core 9 | Library RW.GitHub.Status 10 | 11 | *** Tasks *** 12 | Get Scheduled and Active GitHub Maintenance Windows 13 | Log Importing config variables... 
14 | RW.Core.Import User Variable DURATION 15 | ... type=string 16 | ... pattern=((\d+?)d)?((\d+?)h)?((\d+?)m)?((\d+?)s)? 17 | ... description=How far ahead to retrieve scheduled maintenances, in the format "1d7h10m", with possible unit values being 'd' representing days, 'h' representing hours, 'm' representing minutes, and 's' representing seconds. 18 | ... example=1d7h10m 19 | ${PARSED_DURATION}= Evaluate $DURATION if $DURATION != "" else None 20 | ${maintenances}= RW.GitHub.Status.Get Scheduled Maintenances ${PARSED_DURATION} 21 | ${metric}= Evaluate len($maintenances) 22 | Log count: ${metric} 23 | RW.Core.Push Metric ${metric} 24 | -------------------------------------------------------------------------------- /libraries/RW/K8s/pdb_tasks_mixin.py: -------------------------------------------------------------------------------- 1 | from benedict import benedict 2 | 3 | class PdbTasksMixin: 4 | def check_pdb( 5 | self, 6 | pdbs, 7 | ): 8 | # TODO: finish pdbs 9 | pdbs = benedict(pdbs, keypath_separator=None) 10 | results = benedict({}, keypath_separator=None) 11 | results["check_passed"] = True 12 | results["exists"] = False 13 | results["maps"] = False 14 | results["pdbs"] = [] 15 | return results 16 | 17 | def format_pdb_report( 18 | self, 19 | report_data=benedict({}, keypath_separator=None), 20 | pdb_doc_link="https://kubernetes.io/docs/tasks/run-application/configure-pdb/", 21 | mute_suggestions:bool=False, 22 | ): 23 | report_lines = [] 24 | report_data = benedict(report_data, keypath_separator=None) 25 | # TODO: finish pdb 26 | # exists 27 | # maps to deployment 28 | # not 0 29 | # not 100% 30 | report_lines.append("Pod Disruption Budget Checks") # header first, details below it 31 | if not mute_suggestions and not report_data["check_passed"]: 32 | report_lines.append( 33 | f"\tNot all Pod Disruption Budget checks passed, consider reviewing: {pdb_doc_link}" 34 | ) 35 | return "\n".join(report_lines) 36 | 
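The TODO comments in `pdb_tasks_mixin.py` list four intended checks (exists, maps to deployment, not 0, not 100%) without implementing them. Below is a minimal sketch of how such checks might look. It is not the library's implementation: `sketch_check_pdb`, `_to_fraction`, and the `budget_ok` key are hypothetical names, plain dicts are used instead of `benedict`, and reading "not 0 / not 100%" as "the budget should neither forbid every eviction nor allow all of them" is an assumption.

```python
def _to_fraction(value, replicas: int) -> float:
    """Convert an int or a percentage string such as "50%" to a fraction of replicas."""
    if isinstance(value, str) and value.endswith("%"):
        return float(value[:-1]) / 100.0
    return int(value) / replicas if replicas else 0.0


def sketch_check_pdb(pdb: dict, pod_labels: dict, replicas: int) -> dict:
    """Hypothetical evaluation of one PDB manifest against one workload."""
    spec = pdb.get("spec", {})
    match_labels = spec.get("selector", {}).get("matchLabels", {})
    results = {
        # "exists": a non-empty PDB manifest was supplied
        "exists": bool(pdb),
        # "maps to deployment": every selector label is present on the workload's pods
        "maps": bool(match_labels)
        and all(pod_labels.get(k) == v for k, v in match_labels.items()),
    }
    # Fraction of pods that may be evicted at once (assumed reading of "not 0 / not 100%")
    if "maxUnavailable" in spec:
        allowed = _to_fraction(spec["maxUnavailable"], replicas)
    elif "minAvailable" in spec:
        allowed = 1.0 - _to_fraction(spec["minAvailable"], replicas)
    else:
        allowed = None
    results["budget_ok"] = allowed is not None and 0.0 < allowed < 1.0
    results["check_passed"] = results["exists"] and results["maps"] and results["budget_ok"]
    return results


if __name__ == "__main__":
    example_pdb = {
        "spec": {"selector": {"matchLabels": {"app": "web"}}, "maxUnavailable": "50%"}
    }
    print(sketch_check_pdb(example_pdb, {"app": "web", "tier": "frontend"}, replicas=4))
```

Only `matchLabels` selectors are handled here; `matchExpressions` would need additional logic.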
-------------------------------------------------------------------------------- /codebundles/opsgenie-alert/runbook.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Create an alert in Opsgenie. 3 | Metadata Display Name OpsGenie Create Alert 4 | Metadata Supports opsgenie 5 | Metadata Author Vui Lee 6 | Suite Setup Runbook Setup 7 | Library RW.Core 8 | Library RW.Opsgenie 9 | #TODO: Refactor for new platform use 10 | 11 | *** Keywords *** 12 | Runbook Setup 13 | RW.Core.Import User Variable OPSGENIE_API_KEY 14 | RW.Core.Import User Variable OPSGENIE_TEAM_INTEGRATION_API_KEY 15 | 16 | *** Tasks *** 17 | Get Opsgenie System Info 18 | [Documentation] Get information about the Opsgenie system. 19 | #[Tags] skipped 20 | RW.Opsgenie.Create Session ${OPSGENIE_API_KEY} 21 | ${res} = RW.Opsgenie.Get Info 22 | RW.Core.Info Project name: ${res.data.name} 23 | RW.Core.Info Opsgenie plan: ${res.data.plan.name} 24 | RW.Core.Info User count: ${res.data.user_count} 25 | 26 | Create An Alert 27 | [Documentation] Create a new alert in Opsgenie. 28 | #[Tags] skipped 29 | RW.Opsgenie.Create Session ${OPSGENIE_TEAM_INTEGRATION_API_KEY} 30 | ${res} = RW.Opsgenie.Create Alert 31 | ... summary=backend-service is down 32 | ... description=HTTP status code: 500 33 | ... 
priority=P2 34 | RW.Core.Info Request ID: ${res.request_id} 35 | -------------------------------------------------------------------------------- /libraries/RW/GitHub/robot_tests/status.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Library RW.GitHub.Status 3 | 4 | *** Variables *** 5 | ${GITHUB_COMPONENTS} {"Webhooks", "Actions"} 6 | ${INCIDENT_IMPACT} Minor 7 | ${DURATION} 2d4h 8 | 9 | *** Tasks *** 10 | Get Availability of GitHub: 11 | ${availability} = Get Github Availability 12 | Log To Console ${availability} 13 | 14 | Get Availability of Select GitHub Components: 15 | ${availability} = Get Github Availability ${GITHUB_COMPONENTS} 16 | Log To Console ${availability} 17 | 18 | Get Number of Unresolved GitHub Incidents: 19 | ${incidents}= Get Unresolved Incidents 20 | ${metric}= Evaluate len($incidents) 21 | Log To Console ${metric} 22 | 23 | Get Number of Unresolved GitHub Incidents of at least Minor impact: 24 | ${incidents}= Get Unresolved Incidents ${INCIDENT_IMPACT} 25 | ${metric}= Evaluate len($incidents) 26 | Log To Console ${metric} 27 | 28 | Get Number of Active Scheduled Maintenances: 29 | ${maintenances}= Get Scheduled Maintenances 30 | ${metric}= Evaluate len($maintenances) 31 | Log To Console ${metric} 32 | 33 | Get Number of Active Scheduled Maintenances Over The Next Week: 34 | ${maintenances}= Get Scheduled Maintenances ${DURATION} 35 | ${metric}= Evaluate len($maintenances) 36 | Log To Console ${metric} 37 | -------------------------------------------------------------------------------- /codebundles/github-status-components/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Check status of the GitHub platform (https://www.githubstatus.com/) for a specified set of GitHub service components. 3 | ... 
The metric supplied is an aggregated percentage indicating the availability of the components with 1 = 100% available. 4 | Metadata Display Name GitHub Service Status 5 | Metadata Supports GitHub,Status 6 | Metadata Type SLI 7 | Metadata Author Paul Dittaro 8 | Force Tags github availability statuspage status 9 | Library RW.Core 10 | Library RW.GitHub.Status 11 | 12 | *** Tasks *** 13 | Get Availability of GitHub or Individual GitHub Components 14 | Log Importing config variables... 15 | RW.Core.Import User Variable GITHUB_COMPONENTS 16 | ... type=string 17 | ... description=The CSV list of GitHub Components to use to determine availability. Visit https://www.githubstatus.com/ for the complete list. 18 | ... example=Webhooks,Actions,Git Operations,API Requests,Issues,Pull Requests,Packages,Pages,Codespaces,Copilot 19 | ${PARSED_GITHUB_COMPONENTS}= Evaluate set($GITHUB_COMPONENTS.split(',')) if $GITHUB_COMPONENTS != "" else None 20 | ${metric}= RW.GitHub.Status.Get Github Availability ${PARSED_GITHUB_COMPONENTS} 21 | Log metric: ${metric} 22 | RW.Core.Push Metric ${metric} 23 | -------------------------------------------------------------------------------- /codebundles/elasticsearch-health/sli.robot: -------------------------------------------------------------------------------- 1 | ** Settings ** 2 | Documentation Check Elasticsearch cluster health 3 | Metadata Display Name ElasticSearch Health 4 | Metadata Supports elasticsearch 5 | Metadata Type SLI 6 | Metadata Author Vui Le 7 | Force Tags Elasticsearch cluster health 8 | Library RW.Core 9 | Library RW.Elasticsearch 10 | 11 | ** Tasks ** 12 | Check Elasticsearch Cluster Health 13 | Import User Variable SERVICE_DESCR 14 | Import User Variable ELASTICSEARCH_URL 15 | # ${res} = RW.Elasticsearch.Get Health Status ${ELASTICSEARCH_URL} verbose=True 16 | ${res} = RW.Elasticsearch.Get Shard Health Status ${ELASTICSEARCH_URL} index=.geoip_databases verbose=True 17 | Info Log ${res} 18 | 19 | Console Log HTTP status code: 
${res.status_code} (${res.reason}) 20 | Console Log Elasticsearch cluster health status: ${res.cluster_status} 21 | Console Log ${res.content} 22 | 23 | Push Metric ${res.ok} descr=${SERVICE_DESCR} 24 | ... status_code=${res.status_code} 25 | ... cluster_name=${res.cluster_name} 26 | ... cluster_status=${res.cluster_status} 27 | ... ok_status=${res.ok_status} 28 | 29 | -------------------------------------------------------------------------------- /libraries/RW/Curl/Curl.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import logging 3 | import urllib 4 | import json 5 | import dateutil.parser 6 | from RW import platform, Utils 7 | 8 | logger = logging.getLogger(__name__) 9 | 10 | 11 | 12 | class Curl: 13 | """ 14 | A keyword library for housing general-purpose Curl keywords. 15 | """ 16 | 17 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 18 | def run_curl( 19 | self, cmd: str, 20 | optional_headers: platform.Secret, 21 | target_service: platform.Service 22 | ) -> dict: 23 | """Robot Keyword to manipulate curl before passing to rwplatform.execute_shell_command. 
24 | 25 | """ 26 | optional_headers = Utils.secret_to_curl_headers(optional_headers=optional_headers) 27 | curl_str: str = Utils.create_curl(cmd=cmd, optional_headers=optional_headers) 28 | request_optional_headers = platform.ShellServiceRequestSecret(optional_headers) 29 | rsp = platform.execute_shell_command( 30 | cmd=curl_str, 31 | service=target_service, 32 | request_secrets=[request_optional_headers] 33 | ) 34 | if rsp.status != 200: 35 | raise ValueError(f"Received HTTP status of {rsp.status} from response {rsp}") 36 | if rsp.returncode > 0: 37 | raise ValueError(f"Received return code of {rsp.returncode} from response {rsp}") 38 | rsp = json.loads(rsp.stdout) 39 | return rsp 40 | -------------------------------------------------------------------------------- /codebundles/curl-generic/README.md: -------------------------------------------------------------------------------- 1 | # CURL Generic 2 | A generic curl codebundle that uses the curl service. Supports jq for processing output and expects to output in json format. 3 | 4 | ## SLI 5 | A curl SLI for querying and extracting data from a generic curl call. Uses the hosted curl service, supports jq for parsing, and should produce a single metric. 6 | 7 | ## TaskSet 8 | A curl TaskSet for querying and extracting data from a generic curl call. Uses the hosted curl service, supports jq for parsing, and will output in json. 9 | 10 | ## Use Cases 11 | ### SLI: Count the number of GitHub Repo Stargazers 12 | This example uses the SLI to collect the list of stargazers on a GitHub repo, uses jq to count them up, and pushes the metric. 13 | 14 | ``` 15 | CURL_COMMAND="curl --silent -X GET https://api.github.com/repos/runwhen-contrib/rw-public-codecollection/stargazers | jq length" 16 | ``` 17 | ### TaskSet: Generate a report of GitHub Repo Stargazers by login-name 18 | This example uses the TaskSet to collect the list of stargazers on a GitHub repo and uses jq to extract each stargazer's login name for the report. 
19 | 20 | ``` 21 | CURL_COMMAND="curl -X GET https://api.github.com/repos/runwhen-contrib/rw-public-codecollection/stargazers | jq '.[] | .login'" 22 | ``` 23 | 24 | ## Requirements 25 | 26 | ## TODO 27 | - [ ] Add additional filtering capabilities, SLI math (e.g. avg, sum, count) to mimic k8s-kubectl-get 28 | - [ ] Add additional report formatting so that it's not just json -------------------------------------------------------------------------------- /codebundles/github-get-repos-latency/runbook.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Create a new issue in GitHub Issues. 3 | Metadata Display Name GitHub API Latency 4 | Metadata Supports GitHub 5 | Metadata Type Runbook 6 | Metadata Author Vui Le 7 | Force Tags github latency troubleshooting 8 | Suite Setup Runbook Setup 9 | Library RW.Core 10 | Library RW.GitHub 11 | #TODO: Refactor for new platform use 12 | 13 | *** Tasks *** 14 | Check Latency When Creating a New GitHub Issue 15 | ${body} = Catenate SEPARATOR=\n 16 | ... **Testing** *1 2 3* 17 | ... 1. item 1 18 | ... 1. item 2 19 | ... ``` 20 | ... a : int = 1 21 | ... b : int = 2 22 | ... c : int = a + b 23 | ... print(f"c is {c}") 24 | ... ``` 25 | ${res} = RW.GitHub.Create Issue 26 | ... token=${GITHUB_TOKEN} 27 | ... repo_name=${GITHUB_REPO_NAME} 28 | ... title=[Troubleshooting] Runbook: github-get-repos-latency 29 | ... assignee=${GITHUB_USER} 30 | ... labels=troubleshooting 31 | ... 
body=${body} 32 | Info Log GitHub Create Issue result: ${res} 33 | Info Log GitHub Create Issue latency: ${res.latency} 34 | 35 | *** Keywords *** 36 | Runbook Setup 37 | Import User Variable GITHUB_REPO_NAME 38 | Import User Variable GITHUB_TOKEN 39 | Import User Variable GITHUB_USER 40 | -------------------------------------------------------------------------------- /libraries/RW/Artifactory/Artifactory.py: -------------------------------------------------------------------------------- 1 | import requests 2 | from RW.K8s import K8s 3 | from benedict import benedict 4 | 5 | 6 | class Artifactory: 7 | """A keyword library for checking the health of an Artifactory instance. 8 | 9 | Health states reported by Artifactory are compared against 10 | the HEALTHY constant defined below. 11 | """ 12 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 13 | 14 | HEALTHY = "HEALTHY" 15 | UNHEALTHY = "UNHEALTHY" 16 | 17 | def __init__(self): 18 | self._k8s = K8s() 19 | 20 | def get_health(self, url: str) -> dict: 21 | """Fetch health data from an Artifactory health endpoint. 22 | 23 | Args: 24 | url (str): the URL of the health endpoint to query. 25 | 26 | Returns: 27 | dict: the JSON-decoded response body. 28 | """ 29 | rsp = requests.get(url=url, timeout=30) 30 | return rsp.json() 31 | 32 | def validate_health(self, health_data: dict) -> bool: 33 | """Check that the router and every listed service report HEALTHY. 34 | 35 | Args: 36 | health_data (dict): health data as returned by get_health. 37 | 38 | Returns: 39 | bool: True if all reported states are HEALTHY, otherwise False. 
40 | """ 41 | health_data: benedict = benedict(health_data) 42 | if "router.state" not in health_data: 43 | return False 44 | if health_data["router.state"] != Artifactory.HEALTHY: 45 | return False 46 | if "services" in health_data: 47 | services: list[dict] = health_data["services"] 48 | for service in services: 49 | if service["state"] != Artifactory.HEALTHY: 50 | return False 51 | return True 52 | -------------------------------------------------------------------------------- /codebundles/grpc-grpcurl-unary/README.md: -------------------------------------------------------------------------------- 1 | # gRPC Unary 2 | A generic gRPC codebundle that uses the grpcurl service to send requests to gRPC services. 
The user can paste in their favorite grpcurl shell commands and fetch data with them. 3 | Supports jq for processing output and expects to output in json format. 4 | 5 | ## SLI 6 | A grpcurl SLI for querying and extracting data from a generic grpcurl call. Uses the hosted grpcurl service, supports jq for parsing, and should produce a single metric. 7 | 8 | ## TaskSet 9 | A grpcurl TaskSet for querying and extracting data from a generic grpcurl call. Uses the hosted grpcurl service, supports jq for parsing, and will typically output in json. 10 | 11 | ## Use Cases 12 | ### SLI: Use gRPC result as metric 13 | This example uses the SLI to fetch json data from an arbitrary gRPC service and submit a value from the json payload as a metric. 14 | 15 | ``` 16 | GRPCURL_COMMAND="grpcurl -plaintext -d '{"greeting": "1"}' grpc.postman-echo.com:443 HelloService/SayHello | jq '(.reply | split(" "))[1]'" 17 | ``` 18 | ### TaskSet: Show gRPC service proto information 19 | This example uses the TaskSet to show the proto information of a gRPC service. 
20 | 21 | ``` 22 | GRPCURL_COMMAND="grpcurl -plaintext grpc.postman-echo.com:443 describe" 23 | ``` 24 | 25 | ## Requirements 26 | - The gRPCurl command to run 27 | - A gRPC service with server reflection enabled 28 | 29 | ## TODO 30 | - [ ] Support proto file uploads 31 | - [ ] Add support for other streaming methods 32 | - [ ] Add additional report formatting so that it's not just json -------------------------------------------------------------------------------- /libraries/RW/K8s/statefulset_tasks_mixin.py: -------------------------------------------------------------------------------- 1 | from benedict import benedict 2 | class StatefuletTasksMixin: 3 | def stateful_sets_ready( 4 | self, statefulsets 5 | ): 6 | if "items" in statefulsets: 7 | statefulsets = statefulsets["items"] 8 | # validate list of statefulsets 9 | if isinstance(statefulsets, list): 10 | for statefulset in statefulsets: 11 | statefulset = benedict(statefulset, keypath_separator=None) 12 | desired = int(statefulset["status", "replicas"]) 13 | if ["status", "readyReplicas"] in statefulset: 14 | ready = int(statefulset["status", "readyReplicas"]) 15 | else: 16 | ready = 0 17 | if desired != ready: 18 | return False 19 | # validate singular statefulset 20 | elif isinstance(statefulsets, dict) and "items" not in statefulsets: 21 | statefulset = benedict(statefulsets, keypath_separator=None) # wrap the single object passed in 22 | desired = int(statefulset["status", "replicas"]) 23 | if ["status", "readyReplicas"] in statefulset: 24 | ready = int(statefulset["status", "readyReplicas"]) 25 | else: 26 | ready = 0 27 | if desired != ready: 28 | return False 29 | else: 30 | raise KeyError( 31 | f"Stateful sets object is malformed {statefulsets}, is it well-formed and unpacked from items?" 
32 | ) 33 | return True 34 | -------------------------------------------------------------------------------- /codebundles/jira-search-issues-latency/runbook.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Display Name Jira Search Issue Latency 3 | Metadata Supports Jira 4 | Metadata Display Name gRPC cURL Unary 5 | Metadata Author Vui Lee 6 | Documentation Create an issue in Jira. 7 | Suite Setup Runbook Setup 8 | Library RW.Core 9 | Library RW.Jira 10 | #TODO: Refactor for new platform use 11 | 12 | *** Keywords *** 13 | Runbook Setup 14 | Import User Variable JIRA_URL 15 | Import User Variable JIRA_USER 16 | Import User Variable JIRA_USER_TOKEN 17 | 18 | *** Tasks *** 19 | Create a new Jira Issue 20 | [Documentation] Create a new issue in Jira 21 | Connect to Jira server=${JIRA_URL} user=${JIRA_USER} token=${JIRA_USER_TOKEN} 22 | ${res} = RW.Jira.Create Issue 23 | ... project=TJ summary=This is a test description=Add more details here. 24 | Info Log Created issue: ${res.key} 25 | # Get all fields for issue. 26 | ${res} = RW.Jira.Get Issue issue_id=${res.key} verbose=${true} 27 | # Get specific fields for issue. 28 | ${res} = RW.Jira.Get Issue 29 | ... issue_id=${res.key} fields=assignee,summary,status,priority 30 | ${msg} = Catenate Issue details: 31 | ... assignee=${res.fields.assignee}, 32 | ... summary=${res.fields.summary}, 33 | ... status=${res.fields.status}, 34 | ... priority=${res.fields.priority} 35 | Info Log ${msg} 36 | Assign Issue ${res.key} Vui Le 37 | -------------------------------------------------------------------------------- /codebundles/http-ok/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Jonathan Funk 3 | Metadata Display Name HTTP OK 4 | Metadata Supports HTTP 5 | Documentation Check if an HTTP request against a URL fails or times out of a given latency window. 6 | ... 
A return of 1 is considered a success, while a 0 is failure. 7 | Force Tags Url Errors HTTP Status Latency Metric 8 | Library RW.Core 9 | Library RW.HTTP 10 | 11 | *** Tasks *** 12 | Checking HTTP URL Is Available And Timely 13 | ${URL}= RW.Core.Import User Variable URL 14 | ... type=string 15 | ... description=What URL to perform requests against. 16 | ... pattern=\w* 17 | ... default=https://www.runwhen.com 18 | ... example=https://www.runwhen.com 19 | ${TARGET_LATENCY}= RW.Core.Import User Variable TARGET_LATENCY 20 | ... type=string 21 | ... description=The maximum latency in seconds as a float value allowed for requests to have. 22 | ... pattern=\w* 23 | ... default=1.2 24 | ... example=1.2 25 | ${rsp}= RW.HTTP.Get ${URL} 26 | ${latency}= Set Variable ${rsp.latency} 27 | ${latency_within_target}= Evaluate 1 if ${latency} <= ${TARGET_LATENCY} else 0 28 | ${status_code}= Set Variable ${rsp.status_code} 29 | ${ok}= Set Variable ${rsp.ok} 30 | ${ok_int}= Evaluate 1 if ${ok} else 0 31 | ${score}= Evaluate int(${latency_within_target}*${ok_int}) 32 | RW.Core.Push Metric ${score} 33 | -------------------------------------------------------------------------------- /codebundles/aws-account-limit/runbook.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Retrieve all recently created AWS accounts. 
3 | Metadata Display Name AWS Account Creation Notification 4 | Metadata Type Runbook 5 | Metadata Author Vui Le 6 | Metadata Supports aws,iam 7 | Force Tags aws accounts 8 | Suite Setup Runbook Setup 9 | Suite Teardown Runbook Teardown 10 | Library RW.Core 11 | Library RW.AWS 12 | Library RW.Slack 13 | Library RW.Report 14 | #TODO: Refactor for new platform use 15 | 16 | *** Tasks *** 17 | Get The Recently Created AWS Accounts 18 | ${res} = Get Recently Created Accounts verbose=true 19 | Add To Report *Accounts* 20 | Add To Report ${res.accounts} # fields=Name Id Email Arn Status JoinedDatetime OrganizationUnitFullName 21 | 22 | *** Keywords *** 23 | Runbook Setup 24 | Import User Variable SERVICE_DESCR 25 | Import User Variable AWS_ACCESS_KEY_ID 26 | Import User Variable AWS_SECRET_ACCESS_KEY 27 | Import User Variable REGION_NAME 28 | Import User Variable SLACK_CHANNEL 29 | Import User Variable SLACK_BOT_TOKEN 30 | Set Credentials ${AWS_ACCESS_KEY_ID} ${AWS_SECRET_ACCESS_KEY} ${REGION_NAME} 31 | 32 | Runbook Teardown 33 | ${report} = Get Report 34 | RW.Slack.Post Message 35 | ... token=${SLACK_BOT_TOKEN} 36 | ... channel=${SLACK_CHANNEL} 37 | ... flag=red 38 | ... title=${SERVICE_DESCR} Troubleshooting Report 39 | ... 
msg=${report} 40 | -------------------------------------------------------------------------------- /codebundles/gcp-opssuite-metricquery/.runwhen/templates/gcp-quota-sli.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: runwhen.com/v1 2 | kind: ServiceLevelIndicator 3 | metadata: 4 | name: {{slx_name}} 5 | labels: 6 | {% include "common-labels.yaml" %} 7 | annotations: 8 | {% include "common-annotations.yaml" %} 9 | spec: 10 | displayUnitsLong: Number 11 | displayUnitsShort: '#' 12 | locations: 13 | - {{default_location}} 14 | description: Measures services that have quota alerts firing in GCP 15 | codeBundle: 16 | {% if repo_url %} 17 | repoUrl: {{repo_url}} 18 | {% else %} 19 | repoUrl: https://github.com/runwhen-contrib/rw-public-codecollection.git 20 | {% endif %} 21 | {% if ref %} 22 | ref: {{ref}} 23 | {% else %} 24 | ref: main 25 | {% endif %} 26 | pathToRobot: codebundles/gcp-opssuite-metricquery/sli.robot 27 | intervalStrategy: intermezzo 28 | intervalSeconds: 30 29 | configProvided: 30 | - name: NO_RESULT_OVERWRITE 31 | value: 'Yes' 32 | - name: PROJECT_ID 33 | value: {{custom.gcp_project_id}} 34 | - name: MQL_STATEMENT 35 | value: >- 36 | fetch consumer_quota | metric 37 | 'serviceruntime.googleapis.com/quota/exceeded' | group_by 10m, 38 | [value_exceeded_count_true: count_true(value.exceeded)] | every 10m | 39 | group_by [],[value_exceeded_count_true_aggregate: 40 | aggregate(value_exceeded_count_true)] 41 | - name: NO_RESULT_VALUE 42 | value: '0' 43 | secretsProvided: 44 | - name: ops-suite-sa 45 | workspaceKey: {{custom.gcp_ops_suite_sa}} 46 | servicesProvided: [] 47 | 48 | -------------------------------------------------------------------------------- /codebundles/kong-ingress-health-gcp-promql/.runwhen/templates/kong-ingress-health-gcp-promql-taskset.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: runwhen.com/v1 2 | kind: Runbook 3 | 
metadata: 4 | name: {{slx_name}} 5 | labels: 6 | {% include "common-labels.yaml" %} 7 | annotations: 8 | {% include "common-annotations.yaml" %} 9 | spec: 10 | location: {{default_location}} 11 | codeBundle: 12 | repoUrl: https://github.com/runwhen-contrib/rw-cli-codecollection.git 13 | ref: main 14 | pathToRobot: codebundles/curl-gmp-kong-ingress-inspection/runbook.robot 15 | configProvided: 16 | - name: TIME_SLICE 17 | value: '1m' 18 | - name: GCP_PROJECT_ID 19 | value: {{custom.gcp_project_id}} 20 | - name: HTTP_ERROR_CODES 21 | value: 5.* 22 | - name: HTTP_ERROR_RATE_THRESHOLD 23 | value: '0.5' 24 | - name: INGRESS_UPSTREAM 25 | value: {{match_resource.resource.spec.rules[0].http.paths[0].backend.service.name}}.{{match_resource.resource.metadata.namespace}}.{{match_resource.resource.spec.rules[0].http.paths[0].backend.service.port.number}} 26 | - name: INGRESS_SERVICE 27 | value: {{match_resource.resource.metadata.namespace}}.{{match_resource.resource.metadata.name}}.{{match_resource.resource.spec.rules[0].http.paths[0].backend.service.name}}.{{match_resource.resource.spec.rules[0].http.paths[0].backend.service.port.number}} 28 | - name: REQUEST_LATENCY_THRESHOLD 29 | value: '100' 30 | secretsProvided: 31 | - name: gcp_credentials_json 32 | workspaceKey: {{custom.gcp_ops_suite_sa}} 33 | servicesProvided: 34 | - name: gcloud 35 | locationServiceName: gcloud-service.shared -------------------------------------------------------------------------------- /libraries/RW/GCP/robot_tests/servicehealth.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Library RW.GCP.ServiceHealth 3 | 4 | *** Variables *** 5 | ${SECONDS_IN_PAST} 1m 6 | ${PRODUCT_LIST} Google Cloud Console, Google Cloud SQL, Google Kubernetes Engine 7 | ${REGION} us-central1, us-west2 8 | 9 | *** Tasks *** 10 | Get Number Of GCP Incidents 11 | ${history}= RW.GCP.ServiceHealth.Get Status Json 12 | ${filtered}= RW.GCP.ServiceHealth.Filter Status 
Results ${history} ${SECONDS_IN_PAST} 13 | ${metric}= Evaluate len($filtered) 14 | 15 | Get Number Of GCP Incidents For 2 Products 16 | ${history}= RW.GCP.ServiceHealth.Get Status Json 17 | ${filtered}= RW.GCP.ServiceHealth.Filter Status Results 18 | ... ${history} 19 | ... ${SECONDS_IN_PAST} 20 | ... products=${PRODUCT_LIST} 21 | ${metric}= Evaluate len($filtered) 22 | 23 | Get Number Of GCP Incidents For 2 Products In 2 Regions 24 | ${history}= RW.GCP.ServiceHealth.Get Status Json 25 | ${filtered}= RW.GCP.ServiceHealth.Filter Status Results 26 | ... ${history} 27 | ... ${SECONDS_IN_PAST} 28 | ... products=${PRODUCT_LIST} 29 | ... regions=${REGION} 30 | ${metric}= Evaluate len($filtered) 31 | 32 | Get Large Amount Of History Of Incidents 33 | ${history}= RW.GCP.ServiceHealth.Get Status Json 34 | ${filtered}= RW.GCP.ServiceHealth.Filter Status Results 35 | ... ${history} 36 | ... 31556952 37 | ... products=Google Cloud Console 38 | ... check_ongoing=False 39 | Log ${filtered} 40 | ${metric}= Evaluate len($filtered) 41 | -------------------------------------------------------------------------------- /codebundles/sysdig-monitor-metric/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Jonathan Funk 3 | Metadata Display Name Sysdig Monitor Metric 4 | Metadata Supports sysdig,sysdig-monitor 5 | Documentation Queries the Sysdig data API to fetch metric data. 6 | Force Tags Prometheus Prom PromQL Query Metric Aggregate 7 | Suite Setup Suite Initialization 8 | Library RW.Core 9 | Library RW.Sysdig 10 | 11 | *** Keywords *** 12 | Suite Initialization 13 | ${SYSDIG_TOKEN}= RW.Core.Import Secret SYSDIG_TOKEN 14 | ... type=string 15 | ... description=The sysdig API bearer token used in requests to authenticate. 16 | ... pattern=\w* 17 | ... example=my-token 18 | RW.Core.Import User Variable SYSDIG_URL 19 | ... type=string 20 | ... description=The sysdig URL to perform requests against. 21 | ... 
pattern=\w* 22 | ... example=https://app.sysdigcloud.com 23 | RW.Core.Import User Variable API_QUERY 24 | ... type=string 25 | ... description=The sysdig data api query to use. See https://docs.sysdig.com/en/docs/developer-tools/working-with-the-data-api/ 26 | ... pattern=\w* 27 | ... example=[{"id": "cpu.used.percent", "aggregations": {"time": "timeAvg", "group": "avg"}}] 28 | Set Suite Variable ${SYSDIG_TOKEN} ${SYSDIG_TOKEN} 29 | 30 | *** Tasks *** 31 | Query Sysdig Metric Data And Push Metric 32 | ${rsp}= RW.Sysdig.Get Metric Data token=${SYSDIG_TOKEN} sdc_url=${SYSDIG_URL} query_str=${API_QUERY} 33 | ${metric}= Set Variable ${rsp} 34 | RW.Core.Push Metric ${metric} 35 | -------------------------------------------------------------------------------- /.github/workflows/release.yml: -------------------------------------------------------------------------------- 1 | name: Release (Semver from VERSION) 2 | 3 | on: 4 | workflow_dispatch: 5 | push: 6 | branches: [ main ] 7 | paths: 8 | - VERSION 9 | - .github/workflows/release.yml 10 | 11 | permissions: 12 | contents: write 13 | 14 | jobs: 15 | release: 16 | runs-on: ubuntu-latest 17 | steps: 18 | - name: Checkout 19 | uses: actions/checkout@v4 20 | with: 21 | fetch-depth: 0 22 | persist-credentials: true 23 | 24 | - name: Fetch tags 25 | run: git fetch --force --tags --prune 26 | 27 | - name: Tag and release from VERSION 28 | env: 29 | GH_TOKEN: ${{ github.token }} 30 | run: | 31 | set -euo pipefail 32 | 33 | VERSION="$(tr -d ' \t\r\n' < VERSION)" 34 | if [[ ! "$VERSION" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]; then 35 | echo "VERSION must be MAJOR.MINOR.PATCH, got: '$VERSION'" 36 | exit 1 37 | fi 38 | TAG="v${VERSION}" 39 | 40 | # No-op if the tag already exists (safe reruns) 41 | if git rev-parse -q --verify "refs/tags/${TAG}" >/dev/null || \ 42 | git ls-remote --exit-code --tags origin "${TAG}" >/dev/null 2>&1; then 43 | echo "Tag ${TAG} already exists. Nothing to do." 
44 | exit 0 45 | fi 46 | 47 | git config user.name "${GITHUB_ACTOR}" 48 | git config user.email "${GITHUB_ACTOR}@users.noreply.github.com" 49 | 50 | git tag -a "${TAG}" -m "Release ${TAG}" 51 | git push origin "refs/tags/${TAG}" 52 | 53 | # Create GitHub release with auto-generated notes 54 | gh release create "${TAG}" --generate-notes --latest 55 | -------------------------------------------------------------------------------- /codebundles/gitlab-availability/runbook.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Troubleshoot issues with GitLab server availability. 3 | Metadata Display Name GitLab Availability 4 | Metadata Supports GitLab 5 | Metadata Type Runbook 6 | Metadata Author Vui Le 7 | Force Tags gitlab availability troubleshooting 8 | Suite Setup Runbook Setup 9 | Suite Teardown Runbook Teardown 10 | Library RW.Core 11 | Library RW.HTTP 12 | Library RW.Report 13 | Library RW.Slack 14 | #TODO: Refactor for new platform use 15 | 16 | *** Tasks *** 17 | Check GitLab Server Status 18 | ${session} = Create Authenticated Session url=${GITLAB_URL} headers={"PRIVATE-TOKEN": "${GITLAB_ACCESS_TOKEN}"} verbose=true 19 | Debug Log ${session} 20 | ${res} = GET url=${GITLAB_URL} session=${session} 21 | Debug Log ${res} 22 | Add To Report URL: ${GITLAB_URL} 23 | Add To Report Error code: ${res.status_code} 24 | Add To Report Error message: ${res.reason} 25 | Close Session ${session} 26 | 27 | *** Keywords *** 28 | Runbook Setup 29 | Import User Variable SERVICE_DESCR 30 | Import User Variable GITLAB_URL 31 | Import User Variable GITLAB_ACCESS_TOKEN 32 | Import User Variable SLACK_CHANNEL 33 | Import User Variable SLACK_BOT_TOKEN 34 | 35 | Runbook Teardown 36 | ${report} = Get Report 37 | Debug Log ${report} console=true 38 | RW.Slack.Post Message 39 | ... token=${SLACK_BOT_TOKEN} 40 | ... channel=${SLACK_CHANNEL} 41 | ... flag=red 42 | ... 
title=${SERVICE_DESCR} Troubleshooting Report 43 | ... msg=${report} 44 | -------------------------------------------------------------------------------- /libraries/README.md: -------------------------------------------------------------------------------- 1 | # Coding Conventions 2 | 3 | ## Python vs Robot 4 | Modules found in the RW package use either uppercase filenames for Robot Keyword modules or lowercase 5 | filenames for python interface modules. 6 | 7 | We recognize this looks like considerable duplicate/boilerplate code as there are a number of cases where 8 | a Robot Keyword module will import a python module with a similar name, only to expose many of the same 9 | function calls as class methods. The rationale is below. 10 | 11 | Robot Framework libraries here are built as class libraries, i.e. the Robot Framework runtime environment 12 | controls their lifecycle. While this is well documented (use the 'ROBOT_LIBRARY_SCOPE' attribute), few 13 | of us read the docs that closely and thus we assume that many people will make mistakes and instantiate their 14 | own instances of these library classes. While in many cases this is harmless, think about the situation where 15 | a library class is written as ROBOT_LIBRARY_SCOPE="GLOBAL" but a keyword author creates instances of it 16 | in their class, which is scoped as "TASK". You will now have n instances of this library floating around 17 | where the authors explicitly expected only a singleton, which is an issue if that library is making expensive set-up calls 18 | (think DDoS'ing the back-end asking to authenticate once or more per Task). 19 | 20 | As a result of this potential for harmful lifecycle errors, we decided that the boilerplate code effort 21 | was worth the safety. 
22 | 23 | ## Core vs Utils 24 | Core is a set of keywords (and rw.core a set of python functions) intended to interface to the RunWhen platform 25 | and the various features it provides for Robot Authors / keyword Authors. 
Utils are general 26 | utility functions, available in this repo. -------------------------------------------------------------------------------- /codebundles/gcp-gcloudcli-generic/README.md: -------------------------------------------------------------------------------- 1 | # Run Generic Gcloud Commands 2 | These two codebundles can be used to run arbitrary gcloud commands to perform automated tasks, capture output for a report, or return a metric for surfacing in an SLI. 3 | 4 | > Note: the `gcloud auth activate-service-account` call is done for you implicitly, so there's no need to add it into your command string. 5 | 6 | ## SLI 7 | A gcloud SLI for querying and extracting data from a generic gcloud call. Uses the hosted gcloud service, supports jq for parsing, and should produce a single metric. 8 | 9 | ## TaskSet 10 | Run a gcloud cli command and capture its output for use in a report, such as logs, restarting a VM, etc. 11 | 12 | ## Use Cases 13 | ### SLI: Get Number of Error Logs 14 | This example uses the SLI to fetch up to 20 warning/error log entries from the last 15 minutes as json, then counts the entries and provides that count as a metric for your SLI. 15 | 16 | ``` 17 | GCLOUD_COMMAND='gcloud logging read "severity>=WARNING" --freshness=15m --limit=20 --format=json | jq length' 18 | ``` 19 | 20 | ### TaskSet: Fetch Last 5 Errors and Present in Report 21 | This example uses the TaskSet variant of the codebundle to fetch stdout and place it into a report on the platform for display to users. 
In this case we're adding the last 5 warning/error log entries to a report (the entries will default to yaml). 22 | 23 | ``` 24 | GCLOUD_COMMAND='gcloud logging read "severity>=WARNING" --freshness=15m --limit=5' 25 | ``` 26 | 27 | ## Requirements 28 | - The gcloud command string you'd like to run 29 | - A service account credentials json file to be used for authentication 30 | 31 | ## TODO 32 | - [ ] Expand on examples 33 | - [ ] Determine if/what other gcloud plugins need to be installed for complex use cases -------------------------------------------------------------------------------- /codebundles/k8s-cortexmetrics-ingestor-health/README.md: -------------------------------------------------------------------------------- 1 | # Kubernetes Cortex Metrics Ingester Health 2 | 3 | ## SLI 4 | Periodically checks the state of the cortex metrics ingesters and returns a score of 1 (healthy) or 0 (unhealthy). This SLI performs the query by executing a `kubectl exec` into a Kubernetes resource, leveraging existing Kubernetes API authentication. For the ingesters to be considered healthy they must: 5 | 6 | - Be considered "ACTIVE" in the ingester ring as published by the http api endpoint `/ring` 7 | - Have as many "ACTIVE" ingester ring members as specified in the SLI configuration variable EXPECTED_RING_MEMBERS 8 | 9 | The defaults will target a distributor pod which can locally reach http://127.0.0.1:8080/ring to obtain the status, but this can be overridden if another pod is used to query this endpoint within the cluster. 10 | 11 | ## TaskSet 12 | Queries the state of the ingesters and returns the state of each along with the latest timestamp. This TaskSet performs the query by executing a `kubectl exec` into a Kubernetes resource, leveraging existing Kubernetes API authentication. 
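
The two health criteria listed under the SLI can be sketched as a small scoring function. This is only an illustration under stated assumptions, not the codebundle's actual implementation: the function name `ring_is_healthy` and the idea of passing the ring states in as a plain list of strings are hypothetical, and parsing the `/ring` response into that list is left out.

```python
def ring_is_healthy(member_states: list[str], expected_ring_members: int) -> int:
    """Return 1 (healthy) or 0 (unhealthy) for a list of ingester ring states.

    Hypothetical helper: `member_states` is assumed to hold one state string
    per ring member, as already extracted from the `/ring` endpoint.
    """
    # Count the members that report the ACTIVE state.
    active = sum(1 for state in member_states if state == "ACTIVE")
    # Criterion 1: every member in the ring must be ACTIVE.
    all_active = active == len(member_states)
    # Criterion 2: the ACTIVE count must match EXPECTED_RING_MEMBERS.
    count_matches = active == expected_ring_members
    return 1 if all_active and count_matches else 0


if __name__ == "__main__":
    print(ring_is_healthy(["ACTIVE", "ACTIVE", "ACTIVE"], 3))
    print(ring_is_healthy(["ACTIVE", "LEAVING", "ACTIVE"], 3))
```

Returning an integer rather than a boolean mirrors the 1/0 score the SLI is described as pushing.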
13 | 14 | ## Use Cases 15 | ### Use Case: SLI: Monitoring Grafana Mimir Ingester Health 16 | As Grafana Mimir is based on Cortex metrics, this codebundle could be used in the same way to inspect the health of Grafana Mimir ingesters. 17 | 18 | ## Requirements 19 | - A kubeconfig with `get, list` access on cortex objects in the chosen namespace, along with the verb `create` on resource `pods/exec` 20 | - A chosen `namespace` and `context` to use from the kubeconfig 21 | - A cortex pod resource that has access to the `ring` api endpoint to exec into within the chosen `namespace` (often the distributor pods) 22 | 23 | ## TODO 24 | - [ ] Add additional documentation 25 | - [ ] Add additional taskset checks -------------------------------------------------------------------------------- /libraries/RW/GitLab.py: -------------------------------------------------------------------------------- 1 | """ 2 | GitLab keyword library 3 | 4 | Scope: Global 5 | """ 6 | import gitlab 7 | from dataclasses import dataclass 8 | from RW.Utils import utils 9 | 10 | 11 | class GitLab: 12 | #TODO: refactor for new platform use 13 | """ 14 | GitLab is a keyword library for integrating with the GitLab system. 15 | You need to provide a GitLab URL and a GitLab API Token to use 16 | this library. 17 | The first step is to authenticate using `Create Session`. 18 | """ 19 | 20 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 21 | 22 | def __init__(self) -> None: 23 | self.url = None 24 | self.token = None 25 | self.gl = None 26 | 27 | def create_session(self, url: str, token: str) -> object: 28 | """ 29 | Create a GitLab session. 30 | Examples: 31 | | Import User Variable | GITLAB_URL | | 32 | | Import User Variable | GITLAB_TOKEN | | 33 | | RW.GitLab.Create Session | ${GITLAB_URL} | ${GITLAB_TOKEN} | 34 | Return Value: 35 | | GitLab handle | 36 | """ 37 | self.gl = gitlab.Gitlab(url=url, private_token=token) 38 | return self.gl 39 | 40 | def get_projects(self): 41 | """ 42 | Get all projects found in GitLab. 
43 | Examples: 44 | | ${projects} = | RW.GitLab.Get Projects | 45 | """ 46 | latency, res = utils.latency( 47 | self.gl.projects.list, 48 | latency_params=[3, "s"], 49 | ) 50 | 51 | @dataclass 52 | class Result: 53 | original_content: object 54 | names: list[str] 55 | latency: float 56 | 57 | return Result( 58 | res, 59 | [x.name for x in res], 60 | latency, 61 | ) 62 | -------------------------------------------------------------------------------- /codebundles/twitter-query-tweets/runbook.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Shea Stewart 3 | Metadata Display Name Twitter Query Handle 4 | Metadata Supports twitter 5 | Documentation Queries Twitter to fetch tweets within a specified time range for a specific user handle and adds them to a report. 6 | Force Tags Twitter Social tweet 7 | Suite Setup Suite Initialization 8 | Library RW.Core 9 | Library RW.SocialScrape 10 | 11 | *** Keywords *** 12 | Suite Initialization 13 | RW.Core.Import User Variable HANDLE 14 | ... type=string 15 | ... description=The twitter handle to query. 16 | ... pattern=\w* 17 | ... example=gitbookstatus 18 | RW.Core.Import User Variable MAX_TWEETS 19 | ... type=int 20 | ... description=The number of the latest tweets to scrape. 21 | ... example=5 22 | ... default=5 23 | RW.Core.Import User Variable MAX_TWEET_AGE 24 | ... type=int 25 | ... description=The maximum age of the tweet in days. 26 | ... example=1 27 | ... default=1 28 | RW.Core.Import User Variable MIN_TWEET_AGE 29 | ... type=int 30 | ... description=The minimum age of the tweet in days. 31 | ... example=0 32 | ... 
default=0 33 | Set Suite Variable ${HANDLE} ${HANDLE} 34 | Set Suite Variable ${MAX_TWEETS} ${MAX_TWEETS} 35 | Set Suite Variable ${MAX_TWEET_AGE} ${MAX_TWEET_AGE} 36 | Set Suite Variable ${MIN_TWEET_AGE} ${MIN_TWEET_AGE} 37 | 38 | *** Tasks *** 39 | Query Twitter 40 | ${rsp}= RW.SocialScrape.Twitter Scrape Handle handle=${HANDLE} maxTweets=${MAX_TWEETS} max_tweet_age=${MAX_TWEET_AGE} min_tweet_age=${MIN_TWEET_AGE} 41 | Log ${rsp} 42 | RW.Core.Add Pre To Report ${rsp} 43 | -------------------------------------------------------------------------------- /codebundles/cert-manager-healthcheck/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Jonathan Funk 3 | Metadata Display Name Cert-Manager Health Check 4 | Metadata Supports K8s,cert-manager 5 | Documentation Check the health of pods deployed by cert-manager. 6 | Force Tags K8s Kubernetes Kube K8 Kubectl cert-manager 7 | Suite Setup Suite Initialization 8 | Library BuiltIn 9 | Library RW.Core 10 | Library RW.K8s 11 | Library RW.Utils 12 | Library RW.CertManager 13 | Library RW.platform 14 | Library OperatingSystem 15 | 16 | *** Keywords *** 17 | Suite Initialization 18 | ${kubeconfig}= RW.Core.Import Secret kubeconfig 19 | ${kubectl}= RW.Core.Import Service kubectl 20 | ${NAMESPACE}= RW.Core.Import User Variable NAMESPACE 21 | ... type=string 22 | ... description=The Kubernetes namespace your cert-manager resides in. 23 | ... pattern=\w* 24 | ... example=cert-manager 25 | ... default=cert-manager 26 | ${CONTEXT}= RW.Core.Import User Variable CONTEXT 27 | ... type=string 28 | ... description=Which Kubernetes context to operate within. 29 | ... pattern=\w* 30 | ... example=my-main-cluster 31 | 32 | *** Tasks *** 33 | Health Check cert-manager Pods 34 | ${rsp}= RW.K8s.Shell 35 | ... 
cmd=kubectl get pods --field-selector=status.phase=Running --selector=app.kubernetes.io/instance=cert-manager --context=${CONTEXT} --namespace=${NAMESPACE} -o yaml 36 | ... target_service=${kubectl} 37 | ... kubeconfig=${KUBECONFIG} 38 | ${pods}= RW.Utils.Yaml To Dict ${rsp} 39 | ${rsp}= RW.CertManager.Health Check 40 | ... cm_pods=${pods} 41 | ${metric}= Evaluate 1 if ${rsp} is True else 0 42 | RW.Core.Push Metric ${metric} 43 | -------------------------------------------------------------------------------- /libraries/RW/DNS.py: -------------------------------------------------------------------------------- 1 | """ 2 | DNS keyword library 3 | 4 | Scope: Global 5 | """ 6 | import socket 7 | import dns.resolver 8 | from RW import platform 9 | from RW.Utils import utils 10 | from typing import Optional 11 | 12 | class DNS: 13 | """ 14 | DNS keyword library 15 | """ 16 | 17 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 18 | 19 | def lookup( 20 | self, 21 | host: str, 22 | nameservers: Optional[str] = None, 23 | rtype: str = "A", 24 | verbose: bool = False, 25 | ) -> list: 26 | """ 27 | DNS name lookup. 
28 | 29 | Examples: 30 | | RW.DNS.Lookup | host=${HOSTNAME_TO_RESOLVE} | nameservers=8.8.8.8 | 31 | 32 | Return Value: 33 | | IP address | 34 | """ 35 | if rtype not in ["A"]: 36 | raise NotImplementedError("Only A record is currently supported.") 37 | resolver = dns.resolver.Resolver() 38 | if nameservers is not None: 39 | nameservers = nameservers.split() 40 | resolver.nameservers = [ 41 | socket.gethostbyname(n) for n in nameservers 42 | ] 43 | answer = resolver.resolve(host, rtype) 44 | addresses = [n.address for n in answer] 45 | if verbose: 46 | platform.debug_log( 47 | f"DNS lookup result: {addresses}", console=False 48 | ) 49 | return addresses 50 | 51 | def lookup_latency_in_seconds(self, *args, **kwargs) -> float: 52 | """TBD""" 53 | latency, _ = utils.latency( 54 | self.lookup, *args, **kwargs, latency_params=[3, "s"] 55 | ) 56 | return latency 57 | 58 | def lookup_latency_in_milliseconds(self, *args, **kwargs) -> float: 59 | """TBD""" 60 | latency, _ = utils.latency( 61 | self.lookup, *args, **kwargs, latency_params=[None, "ms"] 62 | ) 63 | return latency 64 | -------------------------------------------------------------------------------- /libraries/RW/gRPC/grpcurl.py: -------------------------------------------------------------------------------- 1 | import logging 2 | from RW import platform, Utils 3 | 4 | logger = logging.getLogger(__name__) 5 | 6 | 7 | class gRPCurl: 8 | """ 9 | A keyword set for running dynamic gRPC calls against gRPC services using the gRPCurl 10 | """ 11 | 12 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 13 | 14 | @staticmethod 15 | def grpcurl_unary( 16 | cmd: str, 17 | target_service: platform.Service, 18 | optional_headers: platform.Secret = None, 19 | # TODO: support proto file sets 20 | # optional_proto_file=None, 21 | ): 22 | return gRPCurl.run_grpcurl(cmd, target_service, optional_headers) 23 | 24 | @staticmethod 25 | def run_grpcurl( 26 | cmd: str, 27 | target_service: platform.Service, 28 | optional_headers: platform.Secret = None, 29 | 
): 30 | """Robot Keyword to manipulate gRPC curl before passing to rwplatform.execute_shell_command.""" 31 | # TODO: test changes on curl-generic 32 | cmd = Utils.quote_curl(cmd) # handle \" before inserted into eval 33 | optional_headers = Utils.secret_to_curl_headers(optional_headers=optional_headers, default_headers="{}") 34 | grpcurl_str: str = Utils.create_curl(cmd=cmd, optional_headers=optional_headers) 35 | request_optional_headers = platform.ShellServiceRequestSecret(optional_headers) 36 | request_secrets = [optional_headers] if optional_headers.value else None 37 | rsp = platform.execute_shell_command(cmd=grpcurl_str, service=target_service, request_secrets=request_secrets) 38 | if rsp.status != 200: 39 | raise ValueError(f"Received HTTP status of {rsp.status} from response {rsp}") 40 | if rsp.returncode > 0: 41 | raise ValueError(f"Received return code of {rsp.returncode} from response {rsp}") 42 | rsp = rsp.stdout 43 | return rsp 44 | -------------------------------------------------------------------------------- /codebundles/twitter-query-tweets/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Shea Stewart 3 | Metadata Display Name Twitter Query Handle 4 | Metadata Supports twitter 5 | Documentation Queries Twitter to count the number of tweets within a specified time range for a specific user handle. 6 | Force Tags Twitter Social tweet 7 | Suite Setup Suite Initialization 8 | Library RW.Core 9 | Library RW.SocialScrape 10 | 11 | *** Keywords *** 12 | Suite Initialization 13 | RW.Core.Import User Variable HANDLE 14 | ... type=string 15 | ... description=The twitter handle to query. 16 | ... pattern=\w* 17 | ... example=gitbookstatus 18 | RW.Core.Import User Variable MAX_TWEETS 19 | ... type=int 20 | ... description=The number of the latest tweets to scrape. 21 | ... example=5 22 | ... default=5 23 | RW.Core.Import User Variable MAX_TWEET_AGE 24 | ... type=int 25 | ... 
description=The maximum age of the tweet in days. 26 | ... example=1 27 | ... default=1 28 | RW.Core.Import User Variable MIN_TWEET_AGE 29 | ... type=int 30 | ... description=The minimum age of the tweet in days. 31 | ... example=0 32 | ... default=0 33 | Set Suite Variable ${HANDLE} ${HANDLE} 34 | Set Suite Variable ${MAX_TWEETS} ${MAX_TWEETS} 35 | Set Suite Variable ${MAX_TWEET_AGE} ${MAX_TWEET_AGE} 36 | Set Suite Variable ${MIN_TWEET_AGE} ${MIN_TWEET_AGE} 37 | 38 | *** Tasks *** 39 | Query Twitter 40 | ${rsp}= RW.SocialScrape.Twitter Scrape Handle handle=${HANDLE} maxTweets=${MAX_TWEETS} max_tweet_age=${MAX_TWEET_AGE} min_tweet_age=${MIN_TWEET_AGE} 41 | ${metric}= Get Length ${rsp} 42 | Log response: ${rsp} 43 | Log metric: ${metric} 44 | RW.Core.Push Metric ${metric} 45 | -------------------------------------------------------------------------------- /codebundles/gcp-opssuite-logquery-dashboard/runbook.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Jonathan Funk 3 | Metadata Display Name GCP Operations Suite Log Query Dashboard URL 4 | Metadata Supports GCP,Cloud-Logging,Operations-Suite,stackdriver 5 | Documentation Generate a link to the GCP Log Explorer. 6 | Force Tags GCP Logs Query Links 7 | Library DateTime 8 | Library RW.GCP.OpsSuite 9 | Library RW.Core 10 | Suite Setup Suite Initialization 11 | 12 | *** Tasks *** 13 | Get GCP Log Dashboard URL For Given Log Query 14 | ${query}= RW.GCP.OpsSuite.Add Time Range 15 | ... base_query=${LOG_QUERY} 16 | ... within_time=${WITHIN_TIME} 17 | ${dashboard_url}= RW.GCP.OpsSuite.Get Logs Dashboard Url 18 | ... ${PROJECT_ID} 19 | ... ${query} 20 | RW.Core.Add To Report GCP Log Explorer Dashboard Link For Query: ${query} 21 | RW.Core.Add To Report ${dashboard_url} 22 | 23 | *** Keywords *** 24 | Suite Initialization 25 | RW.Core.Import User Variable PROJECT_ID 26 | ... type=string 27 | ... 
description=The GCP Project ID to scope the API to. 28 | ... pattern=\w* 29 | ... example=myproject-ID 30 | RW.Core.Import User Variable LOG_QUERY 31 | ... type=string 32 | ... description=The log query used to create the dashboard URL with. 33 | ... pattern=\w* 34 | ... example=resource.labels.namespace_name:"my-namespace" 35 | RW.Core.Import User Variable WITHIN_TIME 36 | ... type=string 37 | ... pattern=((\d+?)d)?((\d+?)h)?((\d+?)m)?((\d+?)s)? 38 | ... description=How far back to retrieve log entries, in the format "1d1h15m", with possible unit values being 'd' representing days, 'h' representing hours, 'm' representing minutes, and 's' representing seconds. 39 | ... example=30m 40 | ... default=15m 41 | -------------------------------------------------------------------------------- /libraries/RW/AWS/strategies/RoleGetClientStrategy.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import boto3 3 | from RW.AWS.strategies.GetClientStrategy import GetClientStrategy 4 | 5 | # silence verbose logging 6 | logging.getLogger('boto3').setLevel(logging.CRITICAL) 7 | logging.getLogger('botocore').setLevel(logging.CRITICAL) 8 | logging.getLogger('s3transfer').setLevel(logging.CRITICAL) 9 | logging.getLogger('urllib3').setLevel(logging.CRITICAL) 10 | 11 | class RoleGetClientStrategy(GetClientStrategy): 12 | def get_client(self, service_name: str, **kwargs): 13 | client_config = { 14 | "service_name": service_name, 15 | **kwargs 16 | } 17 | if self.client and self.client_config_cache == client_config: 18 | return self.client 19 | else: 20 | self.client_config_cache = client_config 21 | self.client = None # clear cache 22 | session_token_service = boto3.client( 23 | "sts", 24 | aws_access_key_id=self.aws_access_key_id, 25 | aws_secret_access_key=self.aws_secret_access_key, 26 | **kwargs, 27 | ) 28 | session_credentials = session_token_service.assume_role( 29 | RoleArn=self.role_arn, RoleSessionName=self.session_name 
30 | ) 31 | session_id = session_credentials["Credentials"]["AccessKeyId"] 32 | session_key = session_credentials["Credentials"]["SecretAccessKey"] 33 | session_token = session_credentials["Credentials"]["SessionToken"] 34 | self.client = boto3.client( 35 | service_name=service_name, 36 | region_name=self.region_name, 37 | aws_access_key_id=session_id, 38 | aws_secret_access_key=session_key, 39 | aws_session_token=session_token, 40 | **kwargs, 41 | ) 42 | return self.client 43 | -------------------------------------------------------------------------------- /libraries/RW/Pingdom.py: -------------------------------------------------------------------------------- 1 | """ 2 | Pingdom keyword library 3 | 4 | Scope: Global 5 | """ 6 | from typing import Union 7 | from dataclasses import dataclass 8 | from robot.libraries.BuiltIn import BuiltIn 9 | from RW import platform 10 | from .Utils import utils 11 | from RW.Utils.utils import Status 12 | 13 | class Pingdom: 14 | #TODO: refactor for new platform use 15 | """ 16 | Pingdom keyword library 17 | """ 18 | 19 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 20 | 21 | def __init__(self) -> None: 22 | self.session = None 23 | 24 | BuiltIn().import_library("RW.HTTP") 25 | self.rw_http = BuiltIn().get_library_instance("RW.HTTP") 26 | 27 | self.pingdom_url = utils.import_user_variable("PINGDOM_URL") 28 | self.pingdom_api_key = utils.import_user_variable("PINGDOM_API_KEY") 29 | 30 | self.session = self.rw_http.create_authenticated_session( 31 | token=self.pingdom_api_key 32 | ) 33 | 34 | def __exit__(self, exc_type, exc_value, traceback): 35 | if self.session is not None: 36 | self.rw_http.close_session(self.session) 37 | 38 | def get_health_status( 39 | self, 40 | verbose: Union[str, bool] = False, 41 | ) -> object: 42 | """ 43 | TBD 44 | """ 45 | verbose = utils.to_bool(verbose) 46 | r = self.rw_http.get(f"{self.pingdom_url}/api/health") 47 | if verbose is True: 48 | platform.debug_log(r) 49 | 50 | status: Status = Status.NOT_OK 51 | if r.status_code in [200] and 
r.json()["database"] == "ok": 52 | status = Status.OK 53 | 54 | @dataclass 55 | class Result: 56 | original_content: object 57 | content: dict 58 | status_code: int = r.status_code 59 | reason: str = r.reason 60 | ok_status: Status = status 61 | ok: int = status.value 62 | 63 | return Result(r, r.json()) 64 | -------------------------------------------------------------------------------- /libraries/RW/K8s/robot_tests/exec.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Library RW.K8s 3 | Library RW.Postgres 4 | Library RW.Utils 5 | Library RW.platform 6 | Library RW.Core 7 | Library OperatingSystem 8 | Suite Setup Suite Initialization 9 | 10 | *** Tasks *** 11 | Get Postgres Query Result 12 | ${templated_query}= RW.Postgres.Template Command 13 | ... query=${K8S_DB_QUERY} 14 | ... hostname=localhost 15 | ... database=${TEST_DB} 16 | ... username=${TEST_USER} 17 | ... password=${TEST_DB_PASSWORD} 18 | ${shell_secrets}= RW.Utils.Secrets List ${TEST_DB} ${TEST_USER} ${TEST_DB_PASSWORD} 19 | ${rsp}= RW.K8s.Shell 20 | ... cmd=kubectl exec ${TEST_DB_WORKLOAD} -- bash -c "${templated_query}" 21 | ... target_service=${kubectl} 22 | ... kubeconfig=${KUBECONFIG} 23 | ... 
shell_secrets=${shell_secrets} 24 | 25 | *** Keywords *** 26 | Suite Initialization 27 | RW.Core.Import Service kubectl 28 | Set Suite Variable ${kubectl} ${kubectl} 29 | Set Suite Variable ${KUBECONFIG_PATH} %{KUBECONFIG_PATH} 30 | Set Suite Variable ${TEST_DB_WORKLOAD} %{TEST_DB_WORKLOAD} 31 | Set Suite Variable ${K8S_DB_QUERY} %{K8S_DB_QUERY} 32 | ${KUBECONFIG}= Get File ${KUBECONFIG_PATH} 33 | ${KUBECONFIG}= Evaluate RW.platform.Secret("kubeconfig", """${KUBECONFIG}""") 34 | ${TEST_DB}= Evaluate RW.platform.Secret("test_db", """%{TEST_DB}""") 35 | ${TEST_USER}= Evaluate RW.platform.Secret("test_user", """%{TEST_DB_USER}""") 36 | ${TEST_DB_PASSWORD}= Evaluate RW.platform.Secret("test_pass", """%{TEST_DB_PASSWORD}""") 37 | Set Suite Variable ${KUBECONFIG} ${KUBECONFIG} 38 | Set Suite Variable ${TEST_DB} ${TEST_DB} 39 | Set Suite Variable ${TEST_USER} ${TEST_USER} 40 | Set Suite Variable ${TEST_DB_PASSWORD} ${TEST_DB_PASSWORD} 41 | -------------------------------------------------------------------------------- /libraries/RW/CertManager/robot_tests/certs.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Library RW.CertManager 3 | Library RW.K8s 4 | Library RW.platform 5 | Library RW.Core 6 | Library RW.Utils 7 | Library OperatingSystem 8 | Suite Setup Suite Initialization 9 | 10 | *** Keywords *** 11 | Suite Initialization 12 | Set Suite Variable ${KUBECONFIG_PATH} %{KUBECONFIG_PATH} 13 | Set Suite Variable ${K8S_TESTING_NS} %{K8S_TESTING_NS} 14 | Set Suite Variable ${K8S_TESTING_CONTEXT} %{K8S_TESTING_CONTEXT} 15 | Set Suite Variable ${K8S_TESTING_NAME} %{K8S_TESTING_NAME} 16 | Set Suite Variable ${K8S_TESTING_LABELS} %{K8S_TESTING_LABELS} 17 | ${KUBECONFIG}= Get File ${KUBECONFIG_PATH} 18 | ${KUBECONFIG}= Evaluate RW.platform.Secret("kubeconfig", """${KUBECONFIG}""") 19 | Set Suite Variable ${KUBECONFIG} ${KUBECONFIG} 20 | ${kubectl}= RW.Core.Import Service kubectl 21 | Set Suite Variable 
${kubectl} ${kubectl} 22 | 23 | *** Tasks *** 24 | Check Certification Expiry 25 | ${rsp}= RW.K8s.Shell 26 | ... cmd=kubectl get Certificate --context=${K8S_TESTING_CONTEXT} --namespace=cert-manager -o yaml 27 | ... target_service=${kubectl} 28 | ... kubeconfig=${KUBECONFIG} 29 | ${certs}= RW.Utils.Yaml To Dict ${rsp} 30 | ${rsp}= RW.CertManager.Get Expiring Certs 31 | ... certs=${certs} 32 | ... days_left_allowed=60 33 | Log ${rsp} 34 | 35 | Health Check 36 | ${rsp}= RW.K8s.Shell 37 | ... cmd=kubectl get pods --field-selector=status.phase=Running --selector=app.kubernetes.io/instance=cert-manager --context=${K8S_TESTING_CONTEXT} --namespace=cert-manager -o yaml 38 | ... target_service=${kubectl} 39 | ... kubeconfig=${KUBECONFIG} 40 | ${pods}= RW.Utils.Yaml To Dict ${rsp} 41 | ${rsp}= RW.CertManager.Health Check 42 | ... cm_pods=${pods} 43 | Log ${rsp} 44 | -------------------------------------------------------------------------------- /codebundles/kong-ingress-health-gcp-promql/.runwhen/templates/kong-ingress-health-gcp-promql-sli.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: runwhen.com/v1 2 | kind: ServiceLevelIndicator 3 | metadata: 4 | name: {{slx_name}} 5 | labels: 6 | {% include "common-labels.yaml" %} 7 | annotations: 8 | {% include "common-annotations.yaml" %} 9 | spec: 10 | displayUnitsLong: OK 11 | displayUnitsShort: ok 12 | locations: 13 | - {{default_location}} 14 | description: Measures the overall health of a Kong-managed ingress object. 
15 | codeBundle: 16 | {% if repo_url %} 17 | repoUrl: {{repo_url}} 18 | {% else %} 19 | repoUrl: https://github.com/runwhen-contrib/rw-public-codecollection.git 20 | {% endif %} 21 | {% if ref %} 22 | ref: {{ref}} 23 | {% else %} 24 | ref: main 25 | {% endif %} 26 | pathToRobot: codebundles/kong-ingress-health-gcp-promql/sli.robot 27 | intervalStrategy: intermezzo 28 | intervalSeconds: 30 29 | configProvided: 30 | - name: HTTP_ERROR_CODES 31 | value: 5.* 32 | - name: HTTP_ERROR_RATE_WINDOW 33 | value: 1m 34 | - name: HTTP_ERROR_RATE_THRESHOLD 35 | value: '2' 36 | - name: PROJECT_ID 37 | value: {{custom.gcp_project_id}} 38 | - name: INGRESS_UPSTREAM 39 | value: {{match_resource.resource.spec.rules[0].http.paths[0].backend.service.name}}.{{match_resource.resource.metadata.namespace}}.{{match_resource.resource.spec.rules[0].http.paths[0].backend.service.port.number}}.svc 40 | - name: INGRESS_SERVICE 41 | value: {{match_resource.resource.metadata.namespace}}.{{match_resource.resource.metadata.name}}.{{match_resource.resource.spec.rules[0].http.paths[0].backend.service.name}}.{{match_resource.resource.spec.rules[0].http.paths[0].backend.service.port.number}} 42 | - name: REQUEST_LATENCY_THRESHOLD 43 | value: '100' 44 | secretsProvided: 45 | - name: ops-suite-sa 46 | workspaceKey: {{custom.gcp_ops_suite_sa}} 47 | servicesProvided: 48 | - name: curl 49 | locationServiceName: curl-service.shared -------------------------------------------------------------------------------- /libraries/RW/K8s/k8s.py: -------------------------------------------------------------------------------- 1 | """ 2 | K8s keyword library, version 2, based on shellservice base. 
3 | 4 | Scope: Global 5 | """ 6 | import re, kubernetes, yaml, logging, json, jmespath 7 | from struct import unpack 8 | import dateutil.parser 9 | from benedict import benedict 10 | from typing import Optional, Union 11 | from RW import platform 12 | from RW.Utils import utils 13 | from enum import Enum 14 | from .namespace_tasks_mixin import NamespaceTasksMixin 15 | 16 | logger = logging.getLogger(__name__) 17 | 18 | class K8s( 19 | NamespaceTasksMixin, 20 | ): 21 | """ 22 | K8s keyword library can be used to interact with Kubernetes clusters. 23 | """ 24 | 25 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 26 | 27 | def compose_kubectl_cmd( 28 | self, 29 | kind: str, 30 | name: str = None, 31 | verb: str = "", 32 | verb_flags: str = "", 33 | label_selector: str = None, 34 | field_selector: str = None, 35 | context: str = None, 36 | namespace: str = None, 37 | output_format="yaml", 38 | binary_name: str="kubectl", 39 | **kwargs, 40 | ) -> str: 41 | command = [] 42 | command.append(f"{binary_name}") 43 | if context: 44 | command.append(f"--context {context}") 45 | if namespace: 46 | command.append(f"--namespace {namespace}") 47 | 48 | if verb and verb_flags: 49 | command.append(f"{verb} {verb_flags}") 50 | elif verb: 51 | command.append(f"{verb}") 52 | 53 | if label_selector: 54 | command.append(f"--selector {label_selector}") 55 | 56 | if kind and name and not label_selector: 57 | command.append(f"{kind}/{name}") 58 | elif kind: 59 | command.append(f"{kind}") 60 | 61 | if field_selector: 62 | command.append(f"--field-selector {field_selector}") 63 | 64 | if output_format: 65 | command.append(f"-o {output_format}") 66 | return " ".join(command) 67 | -------------------------------------------------------------------------------- /libraries/RW/Chat/robot_tests/notify_multi.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Library RW.Chat 3 | Suite Setup Suite Initialization 4 | 5 | *** Variables *** 6 | 
${CHAT_MESSAGE} Chat says hello! 7 | 8 | *** Tasks *** 9 | Send Hello World With Bus To Google Chat 10 | ${rsp}= RW.Chat.Send Message 11 | ... include_reports=No 12 | ... chat_provider=GoogleChat 13 | ... webhook_url=${GCP_CHAT_WEBHOOK} 14 | ... message=${CHAT_MESSAGE} 15 | 16 | Send Hello World With Bus To Slack 17 | ${rsp}= RW.Chat.Send Message 18 | ... include_reports=No 19 | ... chat_provider=Slack 20 | ... channel=${SLACK_CHANNEL} 21 | ... token=${SLACK_TOKEN} 22 | ... message=${CHAT_MESSAGE} 23 | 24 | Send Hello World With Bus To RocketChat 25 | ${rsp}= RW.Chat.Send Message 26 | ... include_reports=No 27 | ... chat_provider=RocketChat 28 | ... webhook_url=${ROCKETCHAT_WEBHOOK} 29 | ... message=${ROCKETCHAT_TEXT} 30 | 31 | Send Report To RocketChat 32 | ${rsp}= RW.Chat.Send Message 33 | ... include_reports=Yes 34 | ... include_runsession_link=Yes 35 | ... chat_provider=RocketChat 36 | ... webhook_url=${ROCKETCHAT_WEBHOOK} 37 | ... message=${ROCKETCHAT_TEXT} 38 | 39 | Send Report To Slack 40 | ${rsp}= RW.Chat.Send Message 41 | ... include_runsession_link=Yes 42 | ... include_reports=Yes 43 | ... chat_provider=Slack 44 | ... channel=${SLACK_CHANNEL} 45 | ... token=${SLACK_TOKEN} 46 | ... 
message=${CHAT_MESSAGE} 47 | 48 | *** Keywords *** 49 | Suite Initialization 50 | Set Suite Variable ${GCP_CHAT_WEBHOOK} %{GCP_CHAT_WEBHOOK} 51 | Set Suite Variable ${SLACK_TOKEN} %{SLACK_TOKEN} 52 | Set Suite Variable ${SLACK_CHANNEL} %{SLACK_CHANNEL} 53 | Set Suite Variable ${ROCKETCHAT_WEBHOOK} %{ROCKETCHAT_WEBHOOK} 54 | Set Suite Variable ${ROCKETCHAT_ALIAS} %{ROCKETCHAT_ALIAS} 55 | Set Suite Variable ${ROCKETCHAT_TEXT} %{ROCKETCHAT_TEXT} 56 | -------------------------------------------------------------------------------- /codebundles/curl-generic/runbook.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Shea Stewart 3 | Metadata Display Name cURL Generic Report 4 | Metadata Supports cURL,HTTP,Generic,API 5 | Documentation A curl TaskSet for querying and extracting data from a generic curl call. Supports jq. Adds results to the report. 6 | Force Tags HTTP CURL NOAUTH DATA GET REQUEST 7 | Suite Setup Suite Initialization 8 | Library RW.Utils 9 | Library RW.Core 10 | Library RW.Curl 11 | 12 | *** Keywords *** 13 | Suite Initialization 14 | ${OPTIONAL_HEADERS}= RW.Core.Import Secret OPTIONAL_HEADERS 15 | ... type=string 16 | ... description=Optional. A json string of headers to include in the request against the REST endpoint. This can include your token. 17 | ... pattern=\w* 18 | ... default="{}" 19 | ... example='{"Content-Type":"application/json"}' 20 | ${CURL_COMMAND}= RW.Core.Import User Variable CURL_COMMAND 21 | ... type=string 22 | ... description=Curl command to run; should return a single metric. Can use jq for json parsing. 23 | ... pattern=\w* 24 | ... default=curl --silent -X GET https://postman-echo.com/get | jq length 25 | ... example=curl --silent -X GET https://postman-echo.com/get | jq length 26 | ${CURL_SERVICE}= RW.Core.Import Service curl 27 | ... type=string 28 | ... description=The selected RunWhen Service to use for accessing services within a network. 29 | ... 
pattern=\w* 30 | ... example=curl-service.shared 31 | ... default=curl-service.shared 32 | Set Suite Variable ${CURL_SERVICE} ${CURL_SERVICE} 33 | Set Suite Variable ${OPTIONAL_HEADERS} ${OPTIONAL_HEADERS} 34 | 35 | *** Tasks *** 36 | Run Curl Command and Add to Report 37 | ${rsp}= RW.Curl.Run Curl 38 | ... cmd=${CURL_COMMAND} 39 | ... target_service=${CURL_SERVICE} 40 | ... optional_headers=${OPTIONAL_HEADERS} 41 | RW.Core.Add Pre To Report ${rsp} -------------------------------------------------------------------------------- /libraries/RW/GCP/GCloudCLI.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import logging 3 | import urllib 4 | import json 5 | import dateutil.parser 6 | from RW import platform, Utils 7 | 8 | logger = logging.getLogger(__name__) 9 | 10 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 11 | 12 | 13 | def shell( 14 | cmd: str, 15 | target_service: platform.Service, 16 | gcp_credentials_json: platform.Secret, 17 | project_id: str = None, 18 | ) -> any: 19 | if not target_service: 20 | raise ValueError("A runwhen service was not provided for the gcloud cli command") 21 | if not gcp_credentials_json: 22 | raise ValueError("A service account credentials json was not provided") 23 | gcp_credentials_json_str = gcp_credentials_json.value 24 | if Utils.is_json(gcp_credentials_json_str): 25 | gcp_credentials_json_dict: dict = Utils.from_json(gcp_credentials_json_str) 26 | if not project_id: 27 | project_id = gcp_credentials_json_dict.get("project_id", None) 28 | if not project_id: 29 | raise ValueError("A project_id could not be found or was not provided") 30 | # activate the service account in ssapi 31 | cmd = f"gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS && {cmd}" 32 | logger.info(f"requesting command: {cmd}") 33 | logger.info(f"selected project_id: {project_id}") 34 | request_secrets: [platform.ShellServiceRequestSecret] = [] 35 | 
request_secrets.append(platform.ShellServiceRequestSecret(gcp_credentials_json, as_file=True)) 36 | env = { 37 | "GOOGLE_APPLICATION_CREDENTIALS": f"./{gcp_credentials_json.key}", 38 | "CLOUDSDK_CORE_PROJECT": f"{project_id}", 39 | } 40 | rsp = platform.execute_shell_command(cmd=cmd, service=target_service, request_secrets=request_secrets, env=env) 41 | if (rsp.status != 200 or rsp.returncode > 0) and rsp.stderr != "": 42 | raise ValueError( 43 | f"The shell service responded with HTTP: {rsp.status} RC: {rsp.returncode} and response: {rsp}" 44 | ) 45 | logger.info(f"shell stdout: {rsp.stdout}") 46 | return rsp.stdout 47 | -------------------------------------------------------------------------------- /codebundles/curl-generic/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Shea Stewart 3 | Metadata Display Name cURL Generic Metric 4 | Metadata Supports cURL,HTTP,Generic,API 5 | Documentation A curl SLI for querying and extracting data from a generic curl call. Supports jq. Should produce a single metric. 6 | Force Tags HTTP CURL NOAUTH DATA GET REQUEST 7 | Suite Setup Suite Initialization 8 | Library RW.Core 9 | Library RW.Utils 10 | Library RW.Curl 11 | 12 | *** Keywords *** 13 | Suite Initialization 14 | ${OPTIONAL_HEADERS}= RW.Core.Import Secret OPTIONAL_HEADERS 15 | ... type=string 16 | ... description=Optional. A json string of headers to include in the request against the REST endpoint. This can include your token. 17 | ... pattern=\w* 18 | ... default="{}" 19 | ... example='{"Content-Type":"application/json"}' 20 | ${CURL_COMMAND}= RW.Core.Import User Variable CURL_COMMAND 21 | ... type=string 22 | ... description=Curl command to run; should return a single metric. Can use jq for json parsing. 23 | ... pattern=\w* 24 | ... default=curl --silent -X GET https://postman-echo.com/get | jq length 25 | ... 
example=curl --silent -X GET https://postman-echo.com/get | jq length 26 | ${CURL_SERVICE}= RW.Core.Import Service curl 27 | ... type=string 28 | ... description=The selected RunWhen Service to use for accessing services within a network. 29 | ... pattern=\w* 30 | ... example=curl-service.shared 31 | ... default=curl-service.shared 32 | Set Suite Variable ${CURL_SERVICE} ${CURL_SERVICE} 33 | Set Suite Variable ${OPTIONAL_HEADERS} ${OPTIONAL_HEADERS} 34 | 35 | *** Tasks *** 36 | Run Curl Command and Push Metric 37 | ${rsp}= RW.Curl.Run Curl 38 | ... cmd=${CURL_COMMAND} 39 | ... target_service=${CURL_SERVICE} 40 | ... optional_headers=${OPTIONAL_HEADERS} 41 | ${metric}= Convert To Number ${rsp} 42 | RW.Core.Push Metric ${metric} -------------------------------------------------------------------------------- /.github/workflows/slack-notify-readme-updates.yml: -------------------------------------------------------------------------------- 1 | # Useful in a temporary manner until we fully release the public repo 2 | name: Send slack message when readme contents are updated 3 | on: 4 | workflow_dispatch: 5 | push: 6 | branches: 7 | - main 8 | paths: 9 | - "codebundles/**/README.md" 10 | - ".github/workflows/slack-notify-readme-updates.yml" 11 | 12 | env: 13 | CODEBUNDLE_DOCS_URL_PREFIX: "https://docs.runwhen.com/public/v/" 14 | 15 | jobs: 16 | notify-slack: 17 | runs-on: ubuntu-latest 18 | steps: 19 | - uses: actions/checkout@v3 20 | id: checkout 21 | with: 22 | fetch-depth: 0 23 | - name: Check for list of commits 24 | id: check-commits 25 | run: | 26 | # Get list of changed files 27 | declare CHANGED_FILES=($(git diff --name-only ${{ github.event.before }} ${{ github.event.after }} | grep README.md)) 28 | 29 | # Exit gracefully if no new readme updates are detected 30 | if [ ${#CHANGED_FILES[@]} -eq 0 ]; then 31 | echo "send_slack_message=false" >> $GITHUB_ENV 32 | exit 0 33 | fi 34 | 35 | for readme in "${CHANGED_FILES[@]}" 36 | do 37 | 
codebundle_url="${codebundle_url}""${CODEBUNDLE_DOCS_URL_PREFIX}${readme%/README.md}"$'\n' 38 | done 39 | 40 | echo "codebundle_url_list<<EOF" >> $GITHUB_ENV 41 | echo "$codebundle_url" >> $GITHUB_ENV 42 | echo "EOF" >> $GITHUB_ENV 43 | echo "send_slack_message=true" >> $GITHUB_ENV 44 | 45 | - name: Send message to public slack channel 46 | id: send-slack-message 47 | if: ${{ env.send_slack_message == 'true' }} 48 | uses: slackapi/slack-github-action@v1.15.0 49 | with: 50 | channel-id: "#codebundle-updates" # Slack channel id or name to post message. https://api.slack.com/methods/chat.postMessage#channels 51 | slack-message: "RunWhen Codebundle Documentation Updates:\n ${{ env.codebundle_url_list }}" 52 | env: 53 | SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }} -------------------------------------------------------------------------------- /codebundles/uptimecom-component-ok/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Jonathan Funk 3 | Metadata Display Name Uptime.com Component Health 4 | Metadata Supports uptime 5 | Documentation Check the status of an Uptime.com component for a given site. 6 | ... It compares the operational state of the component with the list of allowed states, resulting in a 1 when acceptable, and 0 when not. 7 | Force Tags Uptime.Com Uptime Component Statuspage Operational Up 8 | Library RW.Core 9 | Library RW.Uptime.StatusPage 10 | Suite Setup Suite Initialization 11 | 12 | *** Keywords *** 13 | Suite Initialization 14 | ${UPTIME_TOKEN}= RW.Core.Import Secret UPTIME_TOKEN 15 | ${UPTIME_COMPONENT_URL}= RW.Core.Import User Variable UPTIME_COMPONENT_URL 16 | ... type=string 17 | ... description=What URL to retrieve health data from. 18 | ... pattern=\w* 19 | ... default=https://uptime.com/api/v1/statuspages/{page_id}/components/{component_id}/ 20 | ... 
example=https://uptime.com/api/v1/statuspages/{page_id}/components/{component_id}/ 21 | RW.Core.Import User Variable ACCEPTABLE_STATES 22 | ... type=string 23 | ... description=What operational state the component can be in. eg: operational, undergoing planned maintenance, etc. Accepts a CSV. 24 | ... pattern=\w* 25 | ... default=operational,under-maintenance 26 | ... example=operational,under-maintenance 27 | Set Suite Variable ${UPTIME_TOKEN} ${UPTIME_TOKEN} 28 | Set Suite Variable ${ACCEPTABLE_STATES} ${ACCEPTABLE_STATES} 29 | Set Suite Variable ${UPTIME_COMPONENT_URL} ${UPTIME_COMPONENT_URL} 30 | 31 | *** Tasks *** 32 | Check If Vault Endpoint Is Healthy 33 | ${rsp}= RW.Uptime.StatusPage.Get Component Status auth_token=${UPTIME_TOKEN} url=${UPTIME_COMPONENT_URL} 34 | ${status}= RW.Uptime.StatusPage.Validate Component Status status_data=${rsp} allowed_status=${ACCEPTABLE_STATES} 35 | ${score}= Evaluate 1 if ${status} is True else 0 36 | RW.Core.Push Metric ${score} 37 | -------------------------------------------------------------------------------- /libraries/RW/K8s/job_tasks_mixin.py: -------------------------------------------------------------------------------- 1 | from time import sleep 2 | from benedict import benedict 3 | from RW.Utils.utils import yaml_to_dict 4 | 5 | from RW import platform 6 | 7 | class JobTasksMixin: 8 | def job_successful( 9 | self, 10 | job_name, 11 | namespace:str, 12 | context:str, 13 | kubeconfig: platform.Secret, 14 | target_service: platform.Service, 15 | binary_name: str = "kubectl", 16 | ) -> bool: 17 | is_successful: bool = True 18 | job_yaml: str = self.shell( 19 | cmd=f"{binary_name} get job/{job_name} -n {namespace} --context {context} -oyaml", 20 | target_service=target_service, 21 | kubeconfig=kubeconfig, 22 | ) 23 | job = yaml_to_dict(job_yaml) 24 | job: benedict = benedict(job, keypath_separator=None) 25 | if "failed" in job["status"] and int(job["status", "failed"]) > 0: 26 | return False 27 | if "conditions" not in 
job["status"]: 28 | return False 29 | conditions: list = job["status","conditions"] 30 | found_complete = False 31 | for condition in conditions: 32 | if condition["status"] == "True" and condition["type"] == "Complete": 33 | found_complete = True 34 | return is_successful and found_complete 35 | 36 | def wait_until_job_successful( 37 | self, 38 | job_name, 39 | namespace:str, 40 | context:str, 41 | kubeconfig: platform.Secret, 42 | target_service: platform.Service, 43 | retries: int=5, 44 | interval: int=5, 45 | binary_name: str = "kubectl", 46 | ) -> bool: 47 | for _ in range(retries): 48 | is_succeeded: bool = self.job_successful( 49 | job_name=job_name, 50 | namespace=namespace, 51 | context=context, 52 | kubeconfig=kubeconfig, 53 | target_service=target_service, 54 | binary_name=binary_name, 55 | ) 56 | if is_succeeded: 57 | return True 58 | sleep(interval) 59 | return False 60 | -------------------------------------------------------------------------------- /codebundles/aws-cloudwatch-metricquery-dashboard/runbook.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Jonathan Funk 3 | Metadata Supports aws,cloudwatch 4 | Metadata Display Name AWS CloudWatch Metric Query Dashboard 5 | Documentation Creates a URL to a AWS CloudWatch metrics dashboard with a running query. 6 | Force Tags AWS CloudWatch Metrics Metric Query Boto3 Errors Failures Link Dashboard 7 | Library RW.Core 8 | Library RW.AWS.CloudWatch 9 | Suite Setup Suite Initialization 10 | 11 | *** Tasks *** 12 | Get CloudWatch MetricQuery Insights URL 13 | ${rsp}= RW.AWS.CloudWatch.Get CloudWatch Metric Insights Url 14 | ... ${REGION} 15 | ... ${CLOUDWATCH_METRIC_QUERY} 16 | RW.Core.Add To Report CloudWatch Metric Query URL: 17 | RW.Core.Add To Report ${rsp} 18 | 19 | *** Keywords *** 20 | Suite Initialization 21 | RW.Core.Import User Variable AUTH_MODE 22 | ... type=string 23 | ... enum=[User,Role] 24 | ... 
description=Determines the authentication flow when connecting to AWS services. 25 | ... example=User 26 | RW.Core.Import User Variable REGION 27 | ... type=string 28 | ... description=The AWS region to target resources in. 29 | ... pattern=\w* 30 | ... example=us-west-1 31 | RW.Core.Import User Variable 32 | ... CLOUDWATCH_METRIC_QUERY 33 | ... type=string 34 | ... description=The CloudWatch query to run. You can paste query strings from the CloudWatch Metric Insights editor here. 35 | ... pattern=\w* 36 | ... example=SELECT MAX(CPUUtilization) FROM "AWS/EC2" 37 | Set Suite Variable ${AWS_ACCESS_KEY_ID} ${AWS_ACCESS_KEY_ID} 38 | Set Suite Variable ${AWS_SECRET_ACCESS_KEY} ${AWS_SECRET_ACCESS_KEY} 39 | Set Suite Variable ${AWS_ROLE_ASSUME_ARN} ${AWS_ROLE_ASSUME_ARN} 40 | Set Suite Variable ${AUTH_MODE} ${AUTH_MODE} 41 | Set Suite Variable ${REGION} ${REGION} 42 | Set Suite Variable ${CLOUDWATCH_METRIC_QUERY} ${CLOUDWATCH_METRIC_QUERY} 43 | Set Suite Variable ${SECONDS_IN_PAST} ${SECONDS_IN_PAST} 44 | -------------------------------------------------------------------------------- /libraries/RW/ArgoCD/argocd.py: -------------------------------------------------------------------------------- 1 | """ 2 | Argocd keyword library 3 | 4 | Scope: Global 5 | """ 6 | import time 7 | from dataclasses import dataclass 8 | from typing import Union, Optional 9 | from RW import platform 10 | from RW.K8s.k8s import K8s 11 | 12 | 13 | class ArgoCD: 14 | """ 15 | ArgoCD keyword library 16 | """ 17 | 18 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 19 | 20 | ARGOCD_DEPLOYMENTS = [ 21 | "argocd-applicationset-controller", 22 | "argocd-dex-server", 23 | "argocd-notifications-controller", 24 | "argocd-redis", 25 | "argocd-repo-server", 26 | "argocd-server", 27 | ] 28 | ARGOCD_STATEFULSETS = [ 29 | "argocd-application-controller", 30 | ] 31 | 32 | def health_check( 33 | self, 34 | target_service: platform.Service, 35 | kubeconfig: platform.Secret, 36 | context: str, 37 | namespace: str = "argocd", 
38 | ): 39 | health = True 40 | k8s: K8s = K8s() 41 | for deployment in ArgoCD.ARGOCD_DEPLOYMENTS: 42 | resource_status: bool = False 43 | stdout = k8s.shell( 44 | cmd=f"kubectl get deployment.apps/{deployment} --context={context} --namespace={namespace} -o jsonpath='{{.status.conditions[?(@.type==\"Available\")].status}}'", 45 | target_service=target_service, 46 | kubeconfig=kubeconfig, 47 | ) 48 | if stdout == "True": 49 | resource_status = True 50 | health = health and resource_status 51 | for statefulset in ArgoCD.ARGOCD_STATEFULSETS: 52 | resource_status: bool = False 53 | stdout = k8s.shell( 54 | cmd=f"kubectl get statefulset.apps/{statefulset} --context={context} --namespace={namespace} -o jsonpath='{{.status.availableReplicas}}'", 55 | target_service=target_service, 56 | kubeconfig=kubeconfig, 57 | ) 58 | # TODO: revisit replica availability edge cases 59 | if stdout and int(stdout) > 0: 60 | resource_status = True 61 | health = health and resource_status 62 | return health 63 | -------------------------------------------------------------------------------- /README_HOWTO.md: -------------------------------------------------------------------------------- 1 | # General Guidelines for Readmes 2 | The main README.md in this repo is automatically updated by: 3 | - adding the readme_header.md 4 | - generating an index of the codebundles, their documentation, and use cases 5 | 6 | ## How Indexing Works 7 | - Codebundles are indexed based on their folder path (under the `codebundles` folder) and the presence of an sli|slo|runbook.robot file. 8 | - The first line of the .robot `Documentation` line will be added to the table. 
9 | - If a README.md exists in the folder, any `Use Cases` that have a heading that matches `Use Case: SLI` or `Use Case: TaskSet` are also added to the `Documentation` column 10 | 11 | 12 | ## Format of a Codebundle Readme 13 | 14 | The ideal format of a codebundle README.md is as follows: 15 | ``` 16 | # [Target Platform, Product & Use - e.g. Kubernetes Cortex Metrics Ingester Health ] 17 | 18 | ## SLI 19 | General description of how the SLI works, e.g. what does it do, how does it calculate the metric, how can it be configured. 20 | 21 | ## TaskSet 22 | General description of how the TaskSet works, e.g. what does it do, what is the output, how can it be configured. 23 | 24 | 25 | ## Use Cases 26 | General use case details can be written here. Often these are not targeting specific use cases or configurations, but provide ideas to readers on how the codebundle might be used. 27 | 28 | ### Use Case: SLI: [Use Case Title - Target System or Configuration] 29 | General description of how the codebundle can be used to achieve a specific result. Sometimes this is applicable when using a generic codebundle that is applied to a specific product or system. 30 | 31 | ### Use Case: TaskSet: [Use Case Title - Target System or Configuration] 32 | General description of how the codebundle can be used to achieve a specific result. Sometimes this is applicable when using a generic codebundle that is applied to a specific product or system. 33 | 34 | ## Requirements 35 | Bullet list of requirements that might include rbac, service account, or configuration details. 36 | 37 | ## TODO 38 | General list of todos that you are thinking might enhance the codebundle or its overall usage. 
39 | - [ ] Add additional documentation 40 | - [ ] Add additional taskset checks 41 | 42 | ``` 43 | -------------------------------------------------------------------------------- /libraries/RW/GitHub/robot_tests/actions.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Library RW.GitHub.Actions 3 | Library RW.Utils 4 | Suite Setup Suite Initialization 5 | 6 | *** Keywords *** 7 | Suite Initialization 8 | ${GITHUB_SLI_TOKEN}= Evaluate RW.platform.Secret("github-read-token", """%{GITHUB_SLI_TOKEN}""") 9 | Set Suite Variable ${GITHUB_SLI_TOKEN} ${GITHUB_SLI_TOKEN} 10 | 11 | *** Variables *** 12 | ${OWNER} runwhen-contrib 13 | ${REPO} rw-public-codecollection 14 | ${WORKFLOW_FILENAME} generate-index.yml 15 | 16 | *** Tasks *** 17 | Get A Workflow's Runs 18 | ${rsp}= RW.GitHub.Actions.Get Workflow Runs 19 | ... owner=${OWNER} 20 | ... repo=${REPO} 21 | ... workflow_filename=${WORKFLOW_FILENAME} 22 | ... token=${GITHUB_SLI_TOKEN} 23 | 24 | Get A Workflow's Usage Stats 25 | ${rsp}= RW.GitHub.Actions.Get Workflow Usage 26 | ... owner=${OWNER} 27 | ... repo=${REPO} 28 | ... workflow_filename=${WORKFLOW_FILENAME} 29 | ... token=${GITHUB_SLI_TOKEN} 30 | 31 | Get Usage Of Last Run 32 | ${rsp}= RW.GitHub.Actions.Get Workflow Runs 33 | ... owner=${OWNER} 34 | ... repo=${REPO} 35 | ... workflow_filename=${WORKFLOW_FILENAME} 36 | ... token=${GITHUB_SLI_TOKEN} 37 | ${last_run_id}= Set Variable ${rsp["workflow_runs"][0]["id"]} 38 | ${usage}= RW.GitHub.Actions.Get Workflow Run Usage 39 | ... owner=${OWNER} 40 | ... repo=${REPO} 41 | ... run_id=${last_run_id} 42 | ... token=${GITHUB_SLI_TOKEN} 43 | 44 | Get Workflow Times For Last 30 Days 45 | ${times}= RW.GitHub.Actions.Get Workflow Times 46 | ... owner=${OWNER} 47 | ... repo=${REPO} 48 | ... workflow_filename=${WORKFLOW_FILENAME} 49 | ... 
token=${GITHUB_SLI_TOKEN} 50 | ${avg}= RW.Utils.Aggregate method=Average column=${times} 51 | 52 | Get Workflow Times For Last 15 Days 53 | ${times}= RW.GitHub.Actions.Get Workflow Times 54 | ... owner=${OWNER} 55 | ... repo=${REPO} 56 | ... workflow_filename=${WORKFLOW_FILENAME} 57 | ... within_time=15d 58 | ... token=${GITHUB_SLI_TOKEN} 59 | ${avg}= RW.Utils.Aggregate method=Average column=${times} 60 | -------------------------------------------------------------------------------- /libraries/RW/Uptime/StatusPage.py: -------------------------------------------------------------------------------- 1 | import requests 2 | from RW import platform 3 | 4 | 5 | class StatusPage: 6 | """Used to fetch and validate data/metrics from a Uptime.com status page and its components. 7 | 8 | Returns: 9 | _type_: None 10 | """ 11 | 12 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 13 | 14 | def get_component_status( 15 | self, auth_token: platform.Secret, url: str, timeout: int = 30 16 | ) -> dict: 17 | """Returns the current operational state of a component on a status page. Refer to https://uptime.com/api/v1/docs/#/statuspages/get_component_detail for docs. 18 | 19 | Args: 20 | auth_token (platform.Secret): A Platform Secret object containing the auth token for the Uptime status page. 21 | url (str): A URL pointing to the status page's component, eg: https://uptime.com/api/v1/statuspages/{status_page_id}/components/{component_id}/ 22 | timeout (int, optional): request timeout duration. Defaults to 30. 23 | 24 | Returns: 25 | dict: a dictionary containing the current operational state converted from json contents. 
26 | """ 27 | headers: dict = {"Authorization": f"token {auth_token.value}"} 28 | rsp: requests.Response = requests.get( 29 | url=url, headers=headers, timeout=timeout 30 | ) 31 | return rsp.json() 32 | 33 | def validate_component_status( 34 | self, status_data: dict, allowed_status="operational,under-maintenance" 35 | ) -> bool: 36 | """Given a component status payload, check if it's within the allowed statuses (operational, planned maintenance, etc) 37 | returning True if it is, or false if not. 38 | 39 | Args: 40 | status_data (dict): A dictionary converted from the json contents of a response. Typically from get_component_status. 41 | allowed_status (str, optional): a CSV of allowed states. Defaults to "operational,under-maintenance". 42 | 43 | Returns: 44 | bool: whether the component is in an acceptable operational state or not. 45 | """ 46 | allowed_status: list = allowed_status.split(",") 47 | if status_data["status"] in allowed_status: 48 | return True 49 | return False 50 | -------------------------------------------------------------------------------- /libraries/RW/Artifactory/robot_tests/health.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Library RW.Artifactory 3 | Library RW.K8s 4 | Library RW.platform 5 | Library RW.Core 6 | Library OperatingSystem 7 | Suite Setup Suite Initialization 8 | 9 | *** Keywords *** 10 | Suite Initialization 11 | Set Suite Variable ${ARTIFACTORY_URL} %{ARTIFACTORY_URL} 12 | Set Suite Variable ${ARTIFACTORY_HEALTH_URL} %{ARTIFACTORY_HEALTH_URL} 13 | Set Suite Variable ${ARTIFACTORY_REGISTRY_URL} %{ARTIFACTORY_REGISTRY_URL} 14 | Set Suite Variable ${ARTIFACTORY_KUBECONFIG_PATH} %{ARTIFACTORY_KUBECONFIG_PATH} 15 | Set Suite Variable ${ARTIFACTORY_NS} %{ARTIFACTORY_NS} 16 | Set Suite Variable ${ARTIFACTORY_CONTEXT} %{ARTIFACTORY_CONTEXT} 17 | ${KUBECONFIG}= Get File ${ARTIFACTORY_KUBECONFIG_PATH} 18 | ${KUBECONFIG}= Evaluate RW.platform.Secret("kubeconfig", 
"""${KUBECONFIG}""") 19 | Set Suite Variable ${KUBECONFIG} ${KUBECONFIG} 20 | 21 | *** Tasks *** 22 | Health Check Artifactory 23 | ${rsp}= RW.Artifactory.Get Health url=${ARTIFACTORY_HEALTH_URL} 24 | ${artifactory_health}= Set Variable ${rsp} 25 | ${status}= RW.Artifactory.Validate Health health_data=${artifactory_health} 26 | Log ${status} 27 | 28 | *** Tasks *** 29 | Get Artifactory Pods 30 | ${rsp}= RW.K8s.Get 31 | ... kind=Pod 32 | ... namespace=${ARTIFACTORY_NS} 33 | ... context=${ARTIFACTORY_CONTEXT} 34 | ... kubeconfig=${KUBECONFIG} 35 | ... label_selector=app=artifactory 36 | ... output_format=yaml 37 | ... unpack_from_items=True 38 | Log ${rsp} 39 | 40 | Get Artifactory Stateful Sets And Check Ready 41 | ${rsp}= RW.K8s.Get 42 | ... kind=StatefulSet 43 | ... namespace=${ARTIFACTORY_NS} 44 | ... context=${ARTIFACTORY_CONTEXT} 45 | ... kubeconfig=${KUBECONFIG} 46 | ... unpack_from_items=True 47 | ${all_ready}= RW.K8s.Stateful Sets Ready 48 | ... statefulsets=${rsp} 49 | ... unpack_from_items=False 50 | Log ${all_ready} 51 | Log ${rsp} 52 | 53 | Health Check Artifactory Registry 54 | Log hello 55 | 56 | Pull Artifactory Image 57 | Log hello 58 | -------------------------------------------------------------------------------- /codebundles/grpc-grpcurl-unary/runbook.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Jonathan Funk 3 | Metadata Display Name gRPC cURL Unary 4 | Metadata Supports gRPC,cURL 5 | Documentation A gRPC curl taskset for querying data from a generic grpcurl call and presenting the output. 6 | Force Tags GRPC CURL 7 | Suite Setup Suite Initialization 8 | Library String 9 | Library RW.Core 10 | Library RW.Utils 11 | Library RW.gRPC.gRPCurl 12 | 13 | *** Keywords *** 14 | Suite Initialization 15 | ${OPTIONAL_HEADERS}= RW.Core.Import Secret OPTIONAL_HEADERS 16 | ... type=string 17 | ... description=Optional. 
A json string of headers to include in the request against the REST endpoint. This can include your token. 18 | ... pattern=\w* 19 | ... default={} 20 | ... example={"Content-Type":"application/json"} 21 | ${GRPCURL_COMMAND}= RW.Core.Import User Variable GRPCURL_COMMAND 22 | ... type=string 23 | ... description=gRPCurl command to run; should return a single metric. You can also use jq for json parsing. 24 | ... pattern=\w* 25 | ... default=grpcurl -plaintext -d '{"greeting": "1"}' grpc.postman-echo.com:443 HelloService/SayHello 26 | ... example=grpcurl -plaintext -d '{"greeting": "1"}' grpc.postman-echo.com:443 HelloService/SayHello 27 | ${GRPCURL_SERVICE}= RW.Core.Import Service grpcurl 28 | ... type=string 29 | ... description=The selected RunWhen Service to use for accessing services within a network. 30 | ... pattern=\w* 31 | ... example=grpcurl-service.shared 32 | ... default=grpcurl-service.shared 33 | Set Suite Variable ${GRPCURL_COMMAND} ${GRPCURL_COMMAND} 34 | Set Suite Variable ${GRPCURL_SERVICE} ${GRPCURL_SERVICE} 35 | Set Suite Variable ${OPTIONAL_HEADERS} ${OPTIONAL_HEADERS} 36 | # TODO: design flow for proto files 37 | # TODO support more than unary method 38 | 39 | *** Tasks *** 40 | Run gRPCurl Command and Show Output 41 | ${rsp}= RW.gRPC.gRPCurl.Grpcurl Unary 42 | ... cmd=${GRPCURL_COMMAND} 43 | ... target_service=${GRPCURL_SERVICE} 44 | ... 
optional_headers=${OPTIONAL_HEADERS} 45 | RW.Core.Add Pre To Report ${rsp} -------------------------------------------------------------------------------- /libraries/RW/K8s/daemonset_tasks_mixin.py: -------------------------------------------------------------------------------- 1 | import logging 2 | from benedict import benedict 3 | from RW.Utils.utils import parse_numerical 4 | 5 | 6 | logger = logging.getLogger(__name__) 7 | 8 | 9 | class DaemonsetTasksMixin: 10 | def healthcheck_daemonset(self, daemonset): 11 | daemonset = benedict(daemonset, keypath_separator=None) 12 | current_number_scheduled = None 13 | desired_number_scheduled = None 14 | number_available = None 15 | # number of nodes running the daemon pod that are not supposed to run it 16 | number_misscheduled = None 17 | number_ready = None 18 | 19 | current_number_scheduled = daemonset["status", "currentNumberScheduled"] 20 | desired_number_scheduled = daemonset["status", "desiredNumberScheduled"] 21 | number_available = daemonset["status", "numberAvailable"] 22 | number_misscheduled = daemonset["status", "numberMisscheduled"] 23 | number_ready = daemonset["status", "numberReady"] 24 | 25 | max_unavailable = None 26 | try: 27 | mu = daemonset["spec", "updateStrategy", "rollingUpdate", "maxUnavailable"] 28 | max_unavailable = mu 29 | except: 30 | logger.info(f"Could not retrieve updateStrategy.rollingUpdate.maxUnavailable from {daemonset}") 31 | number_unavailable = None 32 | try: 33 | nu = daemonset["status", "numberUnavailable"] 34 | number_unavailable = nu 35 | except: 36 | logger.info(f"Could not retrieve status.numberUnavailable from {daemonset}") 37 | 38 | # we should not have above our max_unavailable 39 | if max_unavailable and number_unavailable and number_unavailable > max_unavailable: 40 | return False 41 | # we should have 0 misscheduled daemonset pods 42 | if number_misscheduled > 0: 43 | return False 44 | # current should be >= desired-max_unavailable 45 | if max_unavailable and 
current_number_scheduled < 
(desired_number_scheduled - max_unavailable): 46 | return False 47 | # ready, available, and current should be equal, indicating a successful pod rollout 48 | if current_number_scheduled != number_ready or number_ready != number_available: 49 | return False 50 | 51 | return True -------------------------------------------------------------------------------- /codebundles/cert-manager-expirations/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Jonathan Funk 3 | Metadata Display Name Cert-manager Expirations 4 | Metadata Supports K8s,cert-manager 5 | Documentation Retrieve number of expired TLS certificates managed by cert-manager within a given window. 6 | ... The metric pushed is the number of certs within the configured expiration window. 7 | Force Tags K8s Kubernetes Kube K8 Kubectl cert-manager 8 | Suite Setup Suite Initialization 9 | Library BuiltIn 10 | Library RW.Core 11 | Library RW.K8s 12 | Library RW.Utils 13 | Library RW.CertManager 14 | Library RW.platform 15 | Library OperatingSystem 16 | 17 | *** Keywords *** 18 | Suite Initialization 19 | ${kubeconfig}= RW.Core.Import Secret kubeconfig 20 | ${kubectl}= RW.Core.Import Service kubectl 21 | ${NAMESPACE}= RW.Core.Import User Variable NAMESPACE 22 | ... type=string 23 | ... description=The Kubernetes namespace your cert-manager resides in. 24 | ... pattern=\w* 25 | ... example=cert-manager 26 | ... default=cert-manager 27 | ${EXPIRATION_WINDOW}= RW.Core.Import User Variable EXPIRATION_WINDOW 28 | ... type=string 29 | ... description=The number of days at which a certificate is considered 'about to expire' for the metric pushed. 30 | ... pattern="^[0-9]*$" 31 | ... default=30 32 | ... example=30 33 | ${CONTEXT}= RW.Core.Import User Variable CONTEXT 34 | ... type=string 35 | ... description=Which Kubernetes context to operate within. 36 | ... pattern=\w* 37 | ... 
example=my-main-cluster 38 | 39 | *** Tasks *** 40 | Inspect Certification Expiration Dates 41 | ${rsp}= RW.K8s.Shell 42 | ... cmd=kubectl get Certificate --context=${CONTEXT} --namespace=${NAMESPACE} -o yaml 43 | ... target_service=${kubectl} 44 | ... kubeconfig=${KUBECONFIG} 45 | ${certs}= RW.Utils.Yaml To Dict ${rsp} 46 | ${rsp}= RW.CertManager.Get Expiring Certs 47 | ... certs=${certs} 48 | ... days_left_allowed=${EXPIRATION_WINDOW} 49 | ${metric}= Evaluate len($rsp) 50 | RW.Core.Push Metric ${metric} 51 | -------------------------------------------------------------------------------- /.github/scripts/index.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # This script generates the readme for the repo by combining the readme_header markdown content 4 | # with the appended index of sli content 5 | 6 | README_HEADER_PATH="readme_header.md" 7 | OUTPUT_FILE="README.md" 8 | CODEBUNDLE_PATH="./codebundles" 9 | 10 | # Add readme content from readme_header 11 | README_HEADER_CONTENT=$(cat $README_HEADER_PATH) 12 | echo "$README_HEADER_CONTENT" > $OUTPUT_FILE 13 | echo -e "\n" >> $OUTPUT_FILE 14 | 15 | 16 | # set markdown headers 17 | echo "## Codebundle Index" >> $OUTPUT_FILE 18 | echo "| Folder Name | Type | Path | Documentation | " >> $OUTPUT_FILE 19 | echo "|---|---|---|---|" >> $OUTPUT_FILE 20 | 21 | 22 | # Build array of all codebundle .robot files 23 | mapfile -d $'\0' codebundle_index < <(find $CODEBUNDLE_PATH -name "*.robot" -print0 | sort -z) 24 | 25 | # create simple markdown table 26 | for file in ${codebundle_index[@]} 27 | do 28 | IFS='/' read -ra path_split <<< ${file} 29 | docstring=$(cat ${file} | grep Documentation | head -1 | sed s/"Documentation"//) 30 | readme_ref="${path_split[0]}/${path_split[1]}/${path_split[2]}/README.md" 31 | path_ref="${path_split[0]}/${path_split[1]}/${path_split[2]}/" 32 | sli_use_cases=$(cat ${readme_ref} | grep "Use Case: SLI" | sed 's/#* //' | sed 
's/$/<br>/' | sed 's/Use Case: SLI:/**Use Case**:/') 
33 | sli_use_cases=$(echo $sli_use_cases) 34 | taskset_use_cases=$(cat ${readme_ref} | grep "Use Case: TaskSet" | sed 's/#* //' | sed 's/$/<br>/' | sed 's/Use Case: TaskSet:/**Use Case**:/') 
35 | taskset_use_cases=$(echo $taskset_use_cases) 36 | 37 | if [[ ${path_split[3]} = "sli" || ${path_split[3]} = "sli.robot" ]]; then 38 | echo "| [${path_split[2]}](${path_ref}) | SLI | [sli.robot](${file}) | $docstring <br> $sli_use_cases |" >> $OUTPUT_FILE 
39 | 40 | elif [[ ${path_split[3]} = "slo" || ${path_split[3]} = "slo.robot" ]]; then 41 | echo "| [${path_split[2]}](${path_ref}) | SLO | [slo.robot](${file}) | $docstring |" >> $OUTPUT_FILE 42 | 43 | elif [[ ${path_split[3]} = "runbook" || ${path_split[3]} = "runbook.robot" ]]; then 44 | echo "| [${path_split[2]}](${path_ref}) | TaskSet | [runbook.robot](${file}) | $docstring <br> $taskset_use_cases | ">> $OUTPUT_FILE 45 | fi 46 | 47 | done -------------------------------------------------------------------------------- /codebundles/github-status-components/README.md: -------------------------------------------------------------------------------- 1 | # GitHub Status - Platform Components 2 | 3 | ## SLI - Component Availability 4 | Check status of the GitHub platform (https://www.githubstatus.com/) for a specified set of GitHub service components. 5 | The metric supplied is an aggregated percentage indicating the availability of the components with 1 = 100% available. 6 | 7 | ### SLI Metric Calculation Details 8 | > **NOTE:** See the [RW GitHub Status Library](../../libraries/RW/GitHub/Status.py) code for additional details. 9 | 10 | This SLI calculates an availability metric for the GitHub platform, between 0 and 1. 11 | Optionally takes a subset of components from which to calculate this total. 12 | 13 | When no components are provided, the score is mapped from the indicator on the 14 | GitHub status page using the following values: 15 | - ``none`` : 1 16 | - ``minor`` : 0.66 17 | - ``major`` : 0.33 18 | - ``critical`` : 0 19 | 20 | If the components are provided, this function provides the average component 21 | availability score of the number of components provided in the set. 
These 22 | values are mapped from the component status attribute as follows: 23 | - ``operational`` : 1 24 | - ``degraded_performance`` : 0.66 25 | - ``partial_outage`` : 0.33 26 | - ``major_outage`` : 0 27 | 28 | Parameters: 29 | components (Set[str]): Set of components to optionally calculate 30 | availability score from. 
Current possible values at time of this release 31 | are: 32 | - "Git Operations" 33 | - "API Requests" 34 | - "Webhooks" 35 | - "Issues" 36 | - "Pull Requests" 37 | - "Actions" 38 | - "Packages" 39 | - "Pages" 40 | - "Codespaces" 41 | - "Copilot" 42 | 43 | Raises: 44 | ValueError: If the components provided do not match the list fetched from 45 | GitHub 46 | 47 | Returns: 48 | Value between 0 and 1 corresponding to the availability of the GitHub 49 | platform 50 | 51 | ## Use Cases 52 | 53 | ## Requirements 54 | 55 | ## TODO 56 | - [ ] Add additional documentation -------------------------------------------------------------------------------- /libraries/RW/CertManager/cert_manager.py: -------------------------------------------------------------------------------- 1 | """ 2 | cert-manager keyword library, based on shellservice base. 3 | 4 | Scope: Global 5 | """ 6 | import re, kubernetes, yaml 7 | import dateutil.parser 8 | import datetime 9 | from benedict import benedict 10 | from typing import Optional, Union 11 | from RW import platform 12 | from RW.K8s import K8s 13 | from enum import Enum 14 | 15 | class CertManager: 16 | """ 17 | cert-manager keyword library can be used to monitor and health check cert-manager resources.
18 | """ 19 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 20 | 21 | def get_expiring_certs( 22 | self, 23 | certs, 24 | days_left_allowed:int, 25 | ): 26 | days_left_allowed = int(days_left_allowed) 27 | certs_expiring = [] 28 | if certs and "items" in certs: 29 | certs = certs["items"] if certs and "items" in certs else [certs] 30 | for cert in certs: 31 | cert = benedict(cert, keypath_separator=None) 32 | if ["status", "notAfter"] in cert: 33 | expiry_date = dateutil.parser.parse(cert["status", "notAfter"]) 34 | today = dateutil.parser.parse(self.get_now()) 35 | diff_days = (expiry_date - today).days 36 | if diff_days <= days_left_allowed: 37 | certs_expiring.append(cert) 38 | return certs_expiring 39 | 40 | def get_now(self): 41 | return f"{datetime.datetime.utcnow().isoformat()}Z" 42 | 43 | def health_check( 44 | self, 45 | cm_pods, 46 | ): 47 | healthy = True 48 | if cm_pods and "items" in cm_pods: 49 | cm_pods = cm_pods["items"] if cm_pods and "items" in cm_pods else [cm_pods] 50 | for pod in cm_pods: 51 | pod = benedict(pod, keypath_separator=None) 52 | if ["status", "containerStatuses"] in pod: 53 | for c_status in pod["status", "containerStatuses"]: 54 | c_status = benedict(c_status, keypath_separator=None) 55 | if ( 56 | c_status["ready"] is not True 57 | or c_status["started"] is not True 58 | ): 59 | healthy = False 60 | return healthy -------------------------------------------------------------------------------- /libraries/RW/Sysdig/robot_tests/get.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Library RW.Sysdig 3 | Suite Setup Suite Initialization 4 | 5 | *** Keywords *** 6 | Suite Initialization 7 | ${SYSDIG_TOKEN}= Evaluate RW.platform.Secret("token", """%{SYSDIG_TOKEN}""") 8 | ${SYSDIG_HEADERS}= Evaluate RW.platform.Secret("token", """%{SYSDIG_HEADERS}""") 9 | Set Suite Variable ${SYSDIG_HEADERS} ${SYSDIG_HEADERS} 10 | Set Suite Variable ${SYSDIG_TOKEN} ${SYSDIG_TOKEN} 11 | Set Suite Variable 
${SYSDIG_URL} %{SYSDIG_URL} 12 | Set Suite Variable ${SYSDIG_PROMQL_URL} %{SYSDIG_PROMQL_URL} 13 | Set Suite Variable ${SYSDIG_QUERY} %{SYSDIG_QUERY} 14 | 15 | *** Tasks *** 16 | Fetch Metric List 17 | ${rsp}= RW.Sysdig.Get Metrics List token=${SYSDIG_TOKEN} sdc_url=${SYSDIG_URL} 18 | Log ${rsp} 19 | 20 | Fetch Filtered Metric List 21 | ${rsp}= RW.Sysdig.Get Metrics List token=${SYSDIG_TOKEN} sdc_url=${SYSDIG_URL} metric_filter=cpu 22 | Log ${rsp} 23 | ${rsp}= RW.Sysdig.Get Metrics List token=${SYSDIG_TOKEN} sdc_url=${SYSDIG_URL} metric_filter=fs 24 | Log ${rsp} 25 | ${rsp}= RW.Sysdig.Get Metrics List token=${SYSDIG_TOKEN} sdc_url=${SYSDIG_URL} metric_filter=kube 26 | Log ${rsp} 27 | 28 | Fetch Specific Metric Details 29 | ${rsp}= RW.Sysdig.Get Metrics Dict token=${SYSDIG_TOKEN} sdc_url=${SYSDIG_URL} metric_filter=fs.used.percent 30 | Log ${rsp} 31 | 32 | Fetch Metric 33 | ${rsp}= RW.Sysdig.Get Metric Data token=${SYSDIG_TOKEN} sdc_url=${SYSDIG_URL} 34 | ... query_str=[{"id": "cpu.used.percent", "aggregations": {"time": "timeAvg", "group": "avg"}}] 35 | Log ${rsp} 36 | 37 | Fetch Metric With Filter 38 | ${rsp}= RW.Sysdig.Get Metric Data token=${SYSDIG_TOKEN} sdc_url=${SYSDIG_URL} 39 | ... query_str=[{"id": "kubernetes.resourcequota.persistentvolumeclaims.used", "aggregations": {"time": "timeAvg", "group": "avg"}}] 40 | Log ${rsp} 41 | 42 | Fetch Promql Data 43 | ${rsp}= RW.Sysdig.Promql Query api_url=${SYSDIG_PROMQL_URL} query=${SYSDIG_QUERY} optional_headers=${SYSDIG_HEADERS} 44 | ... step=30s 45 | ... 
seconds_in_past=600 46 | ${data}= Set Variable ${rsp["data"]} 47 | ${transform}= RW.Sysdig.Transform Data ${data} Last 48 | -------------------------------------------------------------------------------- /codebundles/sli-alert-threshold/README.md: -------------------------------------------------------------------------------- 1 | # SLI Alert Threshold 2 | This codebundle allows you to monitor another SLI and trigger a TaskSet when the expected rate of an SLI value falls below a specified threshold. 3 | 4 | ## SLI 5 | Depending on your observability needs, the Multi-Window Multi-Burn algorithm + SLO error budgets approach may not apply to your use case. In those cases you can use this codebundle to create an alert threshold based on another SLI. A query will be performed on the monitored SLI's metrics for a given time window and resolution, and then the presence of a success (threshold) value will be checked. For example: fetch 1 hour of metric data at 5 minute intervals, for the monitored SLI; a `0` means failure and `1` means healthy. If we set the success value to `1` and a rate of `1.0` (100%) then when any failure occurs in the monitored SLI, this codebundle will immediately alert and trigger the given TaskSet. 6 | 7 | ### Use Case: SLI: Trigger a Slack message when my API health check fails 8 | For our public API, its uptime is critical, so we can monitor its health check and send a Slack message to a team channel whenever the health check fails.
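The threshold check described in the SLI section above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the codebundle's actual implementation: the function names `threshold_rate` and `should_alert` are hypothetical, and treating an empty window as "no successes" is an assumption.

```python
from typing import Sequence


def threshold_rate(values: Sequence[float], threshold_value: float) -> float:
    """Return the fraction of samples in the window equal to the success value."""
    if not values:
        # Assumption: an empty window is treated as containing no successes.
        return 0.0
    hits = sum(1 for v in values if v == threshold_value)
    return hits / len(values)


def should_alert(
    values: Sequence[float], threshold_value: float, expected_rate: float
) -> bool:
    """Alert when the observed success rate falls below the expected rate."""
    return threshold_rate(values, threshold_value) < expected_rate


# A 1h window at 15m resolution gives 4 samples; one of them is a failure (0).
window = [1, 1, 0, 1]
print(threshold_rate(window, 1))     # 0.75
print(should_alert(window, 1, 1.0))  # True
```

With an expected rate of `1.0`, a single failing sample keeps `should_alert` returning `True` until that sample leaves the window. The configuration for this use case follows.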
9 | 10 | ``` 11 | configProvided: 12 | - name: WORKSPACE_NAME 13 | value: 'tutorial-ws' 14 | - name: SLX_NAME 15 | value: public-api-health 16 | - name: HISTORY_WINDOW 17 | value: '1h' 18 | - name: RESOLUTION 19 | value: '15m' 20 | - name: THRESHOLD_VALUE 21 | value: 1 22 | - name: EXPECTED_THRESHOLD_RATE 23 | value: 1.0 24 | - name: INCIDENT_TASKSET 25 | value: tool-slackmsg 26 | ``` 27 | > Because the window in this example is `1h` and our threshold rate is `100%`, a single error detected in the metric data keeps the threshold alerting for up to `1h`, for as long as that error remains in the window. Consider this when determining your window, resolution and expected threshold rate in relation to how you want the TaskSet to behave. 28 | 29 | ## Requirements 30 | - The name of the SLI you want to monitor 31 | - Verify that the SLI submits a consistent value that denotes a success (e.g. 0 is always good, 1 is always good, etc.) as you'll need to set this as your `threshold value` 32 | - The name of the workspace the SLX, SLI and TaskSet reside in 33 | 34 | ## TODO 35 | - [ ] Add additional notes for tweaking threshold models to get the desired behaviour 36 | - [ ] Add docs for connecting to another workspace -------------------------------------------------------------------------------- /libraries/RW/AWS/robot_tests/cloudwatchlinks.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Library RW.AWS.CloudWatch 3 | Suite Setup Suite Initialization 4 | Variables test_queries.py 5 | 6 | *** Variables *** 7 | ${LOG_QUERY} fields @timestamp, @message | sort @timestamp desc | limit 500 8 | ${SECONDS_IN_PAST} 3600 9 | 10 | *** Tasks *** 11 | Get CloudWatch LogQuery Insights URL 12 | ${rsp}= RW.AWS.CloudWatch.Get CloudWatch Logs Insights Url 13 | ... ${AWS_REGION} 14 | ... ${LOG_QUERY} 15 | ... ${AWS_LOG_GROUP} 16 | ...
${SECONDS_IN_PAST} 17 | log ${rsp} 18 | 19 | Get CloudWatch MetricQuery Insights URL 20 | ${rsp}= RW.AWS.CloudWatch.Get CloudWatch Metric Insights Url 21 | ... ${AWS_REGION} 22 | ... ${AWS_METRIC_QUERY} 23 | log ${rsp} 24 | 25 | Test AWS URL Encode 26 | ${rsp}= RW.AWS.CloudWatch.AWS Encode Var ${SAMPLE_METRIC_QUERY["metrics"][0]["expression"]} 27 | 28 | *** Keywords *** 29 | Suite Initialization 30 | # used for testing user auth method 31 | ${AWS_USER_ACCESS_KEY_ID}= Evaluate RW.platform.Secret("aws_access_key_id", """%{AWS_USER_ACCESS_KEY_ID}""") 32 | Set Suite Variable ${AWS_USER_ACCESS_KEY_ID} ${AWS_USER_ACCESS_KEY_ID} 33 | ${AWS_USER_SECRET_ACCESS_KEY}= Evaluate RW.platform.Secret("aws_secret_access_key", """%{AWS_USER_SECRET_ACCESS_KEY}""") 34 | Set Suite Variable ${AWS_USER_SECRET_ACCESS_KEY} ${AWS_USER_SECRET_ACCESS_KEY} 35 | Set Suite Variable ${AWS_USER_REGION} %{AWS_USER_REGION} 36 | # standard role based auth 37 | ${AWS_ACCESS_KEY_ID}= Evaluate RW.platform.Secret("aws_access_key_id", """%{AWS_ACCESS_KEY_ID}""") 38 | Set Suite Variable ${AWS_ACCESS_KEY_ID} ${AWS_ACCESS_KEY_ID} 39 | ${AWS_SECRET_ACCESS_KEY}= Evaluate RW.platform.Secret("aws_secret_access_key", """%{AWS_SECRET_ACCESS_KEY}""") 40 | Set Suite Variable ${AWS_SECRET_ACCESS_KEY} ${AWS_SECRET_ACCESS_KEY} 41 | Set Suite Variable ${AWS_REGION} %{AWS_REGION} 42 | ${AWS_ROLE_ASSUME_ARN}= Evaluate RW.platform.Secret("aws_role_assume_arn", """%{AWS_ROLE_ASSUME_ARN}""") 43 | Set Suite Variable ${AWS_ROLE_ASSUME_ARN} ${AWS_ROLE_ASSUME_ARN} 44 | # Test config 45 | Set Suite Variable ${AWS_METRIC_QUERY} %{AWS_METRIC_QUERY} 46 | Set Suite Variable ${AWS_LOG_GROUP} %{AWS_LOG_GROUP} 47 | -------------------------------------------------------------------------------- /codebundles/gcp-gcloudcli-generic/runbook.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Jonathan Funk 3 | Metadata Display Name GCP GCloud Generic Report 4 | 
Metadata Supports GCP,gcloud 5 | Documentation Run arbitrary gcloud commands and capture the stdout in a report. 6 | Force Tags GCLOUD CLI JSON DATA 7 | Suite Setup Suite Initialization 8 | Library RW.Core 9 | Library RW.Utils 10 | Library RW.GCP.GCloudCLI 11 | 12 | *** Keywords *** 13 | Suite Initialization 14 | ${GCLOUD_COMMAND}= RW.Core.Import User Variable GCLOUD_COMMAND 15 | ... type=string 16 | ... description=gcloud command to run and return the stdout of. 17 | ... pattern=\w* 18 | ... default=gcloud logging read "severity>=WARNING" --freshness=15m --limit=5 19 | ... example=gcloud logging read "severity>=WARNING" --freshness=15m --limit=5 20 | ${GCLOUD_SERVICE}= RW.Core.Import Service gcloud 21 | ... type=string 22 | ... description=The selected RunWhen Service to use for accessing services within a network. 23 | ... pattern=\w* 24 | ... example=gcloud-service.shared 25 | ... default=gcloud-service.shared 26 | ${gcp_credentials_json}= RW.Core.Import Secret gcp_credentials_json 27 | ... type=string 28 | ... description=GCP service account json used to authenticate with GCP APIs. 29 | ... pattern=\w* 30 | ... example={"type": "service_account","project_id":"myproject-ID", ... super secret stuff ...} 31 | ${PROJECT_ID}= RW.Core.Import User Variable PROJECT_ID 32 | ... type=string 33 | ... description=The GCP Project ID to scope the API to. 34 | ... pattern=\w* 35 | ... example=myproject-ID 36 | Set Suite Variable ${GCLOUD_COMMAND} ${GCLOUD_COMMAND} 37 | Set Suite Variable ${GCLOUD_SERVICE} ${GCLOUD_SERVICE} 38 | Set Suite Variable ${gcp_credentials_json} ${gcp_credentials_json} 39 | Set Suite Variable ${PROJECT_ID} ${PROJECT_ID} 40 | 41 | *** Tasks *** 42 | Run Gcloud CLI Command and Push metric 43 | ${rsp}= RW.GCP.GCloudCLI.Shell 44 | ... cmd=${GCLOUD_COMMAND} 45 | ... target_service=${GCLOUD_SERVICE} 46 | ... gcp_credentials_json=${gcp_credentials_json} 47 | ... 
project_id=${PROJECT_ID} 48 | RW.Core.Add Pre To Report ${rsp} -------------------------------------------------------------------------------- /codebundles/gcp-opssuite-logquery/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Jonathan Funk 3 | Metadata Display Name GCP Operations Suite Log Query 4 | Metadata Supports GCP,Cloud-Logging,Operations-Suite,stackdriver 5 | Documentation Retrieve the number of results of a GCP Log Explorer query. 6 | Force Tags GCP OpsSuite Query Logs 7 | Library OperatingSystem 8 | Library Collections 9 | Library DateTime 10 | Library RW.Core 11 | Library RW.Utils.RWUtils 12 | Library RW.GCP.OpsSuite 13 | Suite Setup Suite Initialization 14 | 15 | *** Tasks *** 16 | Running GCE Logging Query And Pushing Result Count Metric 17 | ${query}= RW.GCP.OpsSuite.Add Time Range 18 | ... base_query=${LOG_QUERY} 19 | ... within_time=${WITHIN_TIME} 20 | ${rsp}= RW.GCP.OpsSuite.Get Gce Logs 21 | ... project_name=${PROJECT_ID} 22 | ... log_filter=${query} 23 | ... gcp_credentials=${ops-suite-sa} 24 | ${result_dict}= RW.Utils.RWUtils.From Json ${rsp} 25 | ${result_count}= Evaluate len($result_dict) 26 | RW.Core.Push Metric ${result_count} 27 | 28 | *** Keywords *** 29 | Suite Initialization 30 | RW.Core.Import Secret ops-suite-sa 31 | ... type=string 32 | ... description=GCP service account json used to authenticate with GCP APIs. 33 | ... pattern=\w* 34 | ... example={"type": "service_account","project_id":"myproject-ID", ... super secret stuff ...} 35 | RW.Core.Import User Variable PROJECT_ID 36 | ... type=string 37 | ... description=The GCP Project ID to scope the API to. 38 | ... pattern=\w* 39 | ... example=myproject-ID 40 | RW.Core.Import User Variable LOG_QUERY 41 | ... type=string 42 | ... description=The log query used to filter results to determine the metric count. 43 | ... pattern=\w* 44 | ... 
example=resource.labels.namespace_name:"my-namespace" 45 | RW.Core.Import User Variable WITHIN_TIME 46 | ... type=string 47 | ... pattern=((\d+?)d)?((\d+?)h)?((\d+?)m)?((\d+?)s)? 48 | ... description=How far back to retrieve log entries, in the format "1d1h15m", with possible unit values being 'd' representing days, 'h' representing hours, 'm' representing minutes, and 's' representing seconds. 49 | ... example=30m 50 | ... default=15m 51 | -------------------------------------------------------------------------------- /codebundles/grpc-grpcurl-unary/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Jonathan Funk 3 | Metadata Display Name gRPC cURL Unary 4 | Metadata Supports gRPC,cURL 5 | Documentation A gRPC curl SLI for querying and extracting data from a generic grpcurl call. 6 | Force Tags GRPC CURL 7 | Suite Setup Suite Initialization 8 | Library String 9 | Library RW.Core 10 | Library RW.Utils 11 | Library RW.gRPC.gRPCurl 12 | 13 | *** Keywords *** 14 | Suite Initialization 15 | ${OPTIONAL_HEADERS}= RW.Core.Import Secret OPTIONAL_HEADERS 16 | ... type=string 17 | ... description=Optional. A json string of headers to include in the request against the REST endpoint. This can include your token. 18 | ... pattern=\w* 19 | ... default={} 20 | ... example={"Content-Type":"application/json"} 21 | ${GRPCURL_COMMAND}= RW.Core.Import User Variable GRPCURL_COMMAND 22 | ... type=string 23 | ... description=gRPCurl command to run; should return a single metric. You can also use jq for json parsing. 24 | ... pattern=\w* 25 | ... default=grpcurl -plaintext -d '{"greeting": "1"}' grpc.postman-echo.com:443 HelloService/SayHello | jq '(.reply | split(" "))[1]' 26 | ... example=grpcurl -plaintext -d '{"greeting": "1"}' grpc.postman-echo.com:443 HelloService/SayHello | jq '(.reply | split(" "))[1]' 27 | ${GRPCURL_SERVICE}= RW.Core.Import Service grpcurl 28 | ... type=string 29 | ... 
description=The selected RunWhen Service to use for accessing services within a network. 30 | ... pattern=\w* 31 | ... example=grpcurl-service.shared 32 | ... default=grpcurl-service.shared 33 | Set Suite Variable ${GRPCURL_COMMAND} ${GRPCURL_COMMAND} 34 | Set Suite Variable ${GRPCURL_SERVICE} ${GRPCURL_SERVICE} 35 | Set Suite Variable ${OPTIONAL_HEADERS} ${OPTIONAL_HEADERS} 36 | # TODO: design flow for proto files 37 | # TODO support more than unary method 38 | 39 | *** Tasks *** 40 | Run gRPCurl Command and Push Metric 41 | ${rsp}= RW.gRPC.gRPCurl.Grpcurl Unary 42 | ... cmd=${GRPCURL_COMMAND} 43 | ... target_service=${GRPCURL_SERVICE} 44 | ... optional_headers=${OPTIONAL_HEADERS} 45 | ${rsp}= Remove String ${rsp} " \n 46 | ${metric}= Convert To Number ${rsp} 47 | RW.Core.Push Metric ${metric} -------------------------------------------------------------------------------- /libraries/RW/SocialScrape/SocialScrape.py: -------------------------------------------------------------------------------- 1 | # Inspired from https://github.com/MartinBeckUT/TwitterScraper/blob/master/snscrape/python-wrapper/snscrape-python-wrapper.py 2 | # # Medium Article Follow-Along: https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af 3 | """ 4 | SocialScrape keyword library 5 | Based on snscrape https://github.com/JustAnotherArchivist/snscrape 6 | 7 | Scope: Global 8 | """ 9 | 10 | import requests 11 | import urllib 12 | import json 13 | import dateutil.parser 14 | 15 | from datetime import timedelta, date 16 | import datetime 17 | from RW import platform 18 | import snscrape.modules.twitter as sntwitter 19 | import pandas as pd 20 | 21 | class SocialScrape: 22 | """ 23 | Twitter Scraper keyword library 24 | Uses https://github.com/JustAnotherArchivist/snscrape 25 | """ 26 | 27 | ROBOT_LIBRARY_SCOPE = "GLOBAL" 28 | 29 | @staticmethod 30 | def twitter_scrape_handle(handle: str = None, maxTweets: int = 5, max_tweet_age: int = 365, min_tweet_age: int = 0): 
31 | """ 32 | Scrapes a specific twitter handle and returns the tweets as a pandas DataFrame. 33 | E.g. rows of the form `[[datetime.datetime(2022, 11, 9, 15, 22, 29, tzinfo=datetime.timezone.utc), 1590364208201633793, 'The incident has now been resolved. https://t.co/H0SiNoKzw8', 'GitBookStatus']]` 34 | 35 | The search range (`max_tweet_age`, `min_tweet_age`) is given in days. `maxTweets` sets the maximum amount of tweets to fetch. 36 | """ 37 | 38 | latest_tweets = [] 39 | 40 | ## Set today's date and calculate the date range for the search query 41 | today = date.today() 42 | current_date = today.strftime("%Y-%m-%d") 43 | start_range = datetime.datetime.strptime(current_date,'%Y-%m-%d').date()-timedelta(days=max_tweet_age) 44 | end_range = datetime.datetime.strptime(current_date,'%Y-%m-%d').date()-timedelta(days=min_tweet_age) 45 | 46 | # Using TwitterSearchScraper to scrape data; stop once maxTweets tweets are collected 47 | for x,tweet in enumerate(sntwitter.TwitterSearchScraper(f'from:{handle} since:{start_range} until:{end_range}').get_items()): 48 | if x>=maxTweets: 49 | break 50 | latest_tweets.append([tweet.date, tweet.id, tweet.content, tweet.user.username]) 51 | 52 | # Format list 53 | formatted_tweets = pd.DataFrame(latest_tweets, columns=['Datetime', 'Tweet Id', 'Text', 'Username']) 54 | 55 | return formatted_tweets 56 | -------------------------------------------------------------------------------- /codebundles/k8s-kubectl-run/runbook.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Jonathan Funk 3 | Documentation This codebundle runs an arbitrary kubectl command and writes the stdout to a report. 4 | ... Typically used in conjunction with other codebundles.
5 | Force Tags K8s Kubernetes Kube K8 Kubectl Stdout Command Run 6 | Metadata Display Name Kubernetes Run Shell Command 7 | Metadata Supports Kubernetes,AKS,EKS,GKE,OpenShift 8 | Suite Setup Suite Initialization 9 | Library RW.Core 10 | Library RW.K8s 11 | Library RW.Utils 12 | Library RW.platform 13 | 14 | *** Keywords *** 15 | Suite Initialization 16 | ${kubeconfig}= RW.Core.Import Secret kubeconfig 17 | ... type=string 18 | ... description=The kubernetes kubeconfig yaml containing connection configuration used to connect to cluster(s). 19 | ... pattern=\w* 20 | ... example=For examples, start here https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/ 21 | ${kubectl}= RW.Core.Import Service kubectl 22 | ... description=The location service used to interpret shell commands. 23 | ... default=kubectl-service.shared 24 | ... example=kubectl-service.shared 25 | ${KUBECTL_COMMAND}= RW.Core.Import User Variable KUBECTL_COMMAND 26 | ... type=string 27 | ... description=The kubectl command to run and retrieve stdout from. 28 | ... pattern=\w* 29 | ... example=kubectl get pods --context my-context -n my-namespace 30 | ${DISTRIBUTION}= RW.Core.Import User Variable DISTRIBUTION 31 | ... type=string 32 | ... description=Which distribution of Kubernetes to use for operations, such as: Kubernetes, OpenShift, etc. 33 | ... pattern=\w* 34 | ... enum=[Kubernetes,GKE,OpenShift] 35 | ... example=Kubernetes 36 | ... default=Kubernetes 37 | Set Suite Variable ${kubeconfig} ${kubeconfig} 38 | 39 | *** Tasks *** 40 | Running Kubectl And Adding Stdout To Report 41 | ${stdout}= RW.K8s.Shell 42 | ... cmd=${KUBECTL_COMMAND} 43 | ... target_service=${kubectl} 44 | ...
kubeconfig=${KUBECONFIG} 45 | ${history}= RW.K8s.Pop Shell History 46 | ${history}= RW.Utils.List To String data_list=${history} 47 | RW.Core.Add Pre To Report ${stdout} 48 | RW.Core.Add Pre To Report Commands Used: ${history} 49 | -------------------------------------------------------------------------------- /codebundles/github-actions-workflowtiming/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Documentation Monitors the average timing of a github actions workflow file within a repo 3 | ... and returns the average runtime in minutes. 4 | Metadata Display Name GitHub Actions Workflow Timing 5 | Metadata Supports GitHub,Actions,Workflows 6 | Metadata Type SLI 7 | Metadata Author Jonathan Funk 8 | Force Tags github actions timing monitor 9 | Library RW.Core 10 | Library RW.GitHub.Actions 11 | Library RW.Utils 12 | Suite Setup Suite Initialization 13 | 14 | *** Keywords *** 15 | Suite Initialization 16 | ${OWNER}= RW.Core.Import User Variable OWNER 17 | ... type=string 18 | ... description=The owner or organization name for the repo. 19 | ... pattern=\w* 20 | ... example=my-org 21 | ${REPO}= RW.Core.Import User Variable REPO 22 | ... type=string 23 | ... description=The name of the github repository. 24 | ... pattern=\w* 25 | ... example=myproject 26 | ${WORKFLOW_FILE}= RW.Core.Import User Variable WORKFLOW_FILE 27 | ... type=string 28 | ... description=The filename of the github workflow. 29 | ... pattern=\w* 30 | ... example=my-cicd.yaml 31 | ${DURATION}= RW.Core.Import User Variable DURATION 32 | ... type=string 33 | ... pattern=((\d+?)d)?((\d+?)h)?((\d+?)m)?((\d+?)s)? 34 | ... description=How much history to include in calculations. This range is in the format "1d7h10m", with possible unit values being 'd' representing days, 'h' representing hours, 'm' representing minutes, and 's' representing seconds. 35 | ... 
example=30d 36 | ${github-read-token}= RW.Core.Import Secret github-read-token 37 | ... type=string 38 | ... description=The github token to use. 39 | ... pattern=\w* 40 | ... example=my-super-secret-token 41 | 42 | *** Tasks *** 43 | Get Average Run Time For Workflow 44 | ${times}= RW.GitHub.Actions.Get Workflow Times 45 | ... owner=${OWNER} 46 | ... repo=${REPO} 47 | ... workflow_filename=${WORKFLOW_FILE} 48 | ... within_time=${DURATION} 49 | ... token=${github-read-token} 50 | ${avg_seconds}= RW.Utils.Aggregate method=Average column=${times} 51 | ${avg_minutes}= Evaluate ${avg_seconds}/60 52 | RW.Core.Push Metric ${avg_minutes} 53 | -------------------------------------------------------------------------------- /codebundles/gcp-serviceshealth/sli.robot: -------------------------------------------------------------------------------- 1 | 2 | *** Settings *** 3 | Metadata Author Jonathan Funk 4 | Metadata Display Name GCP Service Status 5 | Metadata Supports GCP,Status 6 | Documentation This codebundle sets up a monitor for a specific region and GCP Product, which is then periodically checked for 7 | ... ongoing incidents based on the history available at https://status.cloud.google.com/incidents.json filtered based on severity level. 8 | Force Tags GCP Status Health services Up Available Platform Google Cloud Incidents 9 | Library RW.Core 10 | Library RW.GCP.ServiceHealth 11 | 12 | *** Tasks *** 13 | Get Number of GCP Incidents Affecting My Workspace 14 | RW.Core.Import User Variable WITHIN_TIME 15 | ... type=string 16 | ... pattern=((\d+?)d)?((\d+?)h)?((\d+?)m)?((\d+?)s)? 17 | ... description=How far back in incident history to check, in the format "1d1h15m", with possible unit values being 'd' representing days, 'h' representing hours, 'm' representing minutes, and 's' representing seconds. 18 | ... example=30m 19 | ... default=15m 20 | RW.Core.Import User Variable PRODUCTS 21 | ... type=string 22 | ... description=Which product(s) to monitor for incidents.
Accepts CSV. For further examples refer to the product names at https://status.cloud.google.com/index.html 23 | ... pattern=\w* 24 | ... default=Google Kubernetes Engine 25 | ... example=Google Kubernetes Engine,Google Cloud Console 26 | RW.Core.Import User Variable REGIONS 27 | ... type=string 28 | ... description=Which region to monitor for incidents. Accepts CSV. For further region value examples refer to any of the region tabs, eg: https://status.cloud.google.com/regional/americas 29 | ... pattern=\w* 30 | ... default=us-central1 31 | ... example=us-central1,us-west2 32 | RW.Core.Import User Variable SEVERITY 33 | ... type=string 34 | ... enum=[low,medium,high] 35 | ... description=What level of severity to consider for counting as incidents. 36 | ... example=low 37 | ... default=low 38 | ${history}= RW.GCP.ServiceHealth.Get Status Json 39 | ${filtered}= RW.GCP.ServiceHealth.Filter Status Results ${history} ${WITHIN_TIME} 40 | ${metric}= Evaluate len($filtered) 41 | Log count: ${metric} 42 | RW.Core.Push Metric ${metric} 43 | -------------------------------------------------------------------------------- /codebundles/gcp-gcloudcli-generic/sli.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Jonathan Funk 3 | Metadata Display Name GCP GCloud Generic Metric 4 | Metadata Supports GCP,gcloud 5 | Documentation Run arbitrary gcloud commands and parse their output for arbitrary values such as json to be submitted as a metric. 6 | Force Tags GCLOUD CLI JSON DATA 7 | Suite Setup Suite Initialization 8 | Library RW.Core 9 | Library RW.Utils 10 | Library RW.GCP.GCloudCLI 11 | 12 | *** Keywords *** 13 | Suite Initialization 14 | ${GCLOUD_COMMAND}= RW.Core.Import User Variable GCLOUD_COMMAND 15 | ... type=string 16 | ... description=gcloud command to run; should return a single metric. Can use jq for json parsing. 17 | ... pattern=\w* 18 | ... 
default=gcloud logging read "severity>=WARNING" --freshness=15m --limit=20 --format=json | jq length 19 | ... example=gcloud logging read "severity>=WARNING" --freshness=15m --limit=20 --format=json | jq length 20 | ${GCLOUD_SERVICE}= RW.Core.Import Service gcloud 21 | ... type=string 22 | ... description=The selected RunWhen Service to use for accessing services within a network. 23 | ... pattern=\w* 24 | ... example=gcloud-service.shared 25 | ... default=gcloud-service.shared 26 | ${gcp_credentials_json}= RW.Core.Import Secret gcp_credentials_json 27 | ... type=string 28 | ... description=GCP service account json used to authenticate with GCP APIs. 29 | ... pattern=\w* 30 | ... example={"type": "service_account","project_id":"myproject-ID", ... super secret stuff ...} 31 | ${PROJECT_ID}= RW.Core.Import User Variable PROJECT_ID 32 | ... type=string 33 | ... description=The GCP Project ID to scope the API to. 34 | ... pattern=\w* 35 | ... example=myproject-ID 36 | Set Suite Variable ${GCLOUD_COMMAND} ${GCLOUD_COMMAND} 37 | Set Suite Variable ${GCLOUD_SERVICE} ${GCLOUD_SERVICE} 38 | Set Suite Variable ${gcp_credentials_json} ${gcp_credentials_json} 39 | Set Suite Variable ${PROJECT_ID} ${PROJECT_ID} 40 | 41 | *** Tasks *** 42 | Run Gcloud CLI Command and Push metric 43 | ${rsp}= RW.GCP.GCloudCLI.Shell 44 | ... cmd=${GCLOUD_COMMAND} 45 | ... target_service=${GCLOUD_SERVICE} 46 | ... gcp_credentials_json=${gcp_credentials_json} 47 | ... 
project_id=${PROJECT_ID} 48 | ${metric}= Convert To Number ${rsp} 49 | RW.Core.Push Metric ${metric} -------------------------------------------------------------------------------- /codebundles/aws-s3-stalecheck/runbook.robot: -------------------------------------------------------------------------------- 1 | *** Settings *** 2 | Metadata Author Jonathan Funk 3 | Metadata Type TaskSet 4 | Metadata Supports aws,s3,bucket 5 | Metadata Display Name AWS S3 Stale Check 6 | Documentation Identify stale AWS S3 buckets, based on last modified object timestamp. 7 | Force Tags AWS Storage S3 Bucket Metrics Metric Query Boto3 Objects Stale 8 | Library RW.Core 9 | Library RW.AWS.S3 10 | Suite Setup Suite Initialization 11 | 12 | *** Tasks *** 13 | Create Report For Stale Buckets 14 | ${rsp}= RW.AWS.S3.Authenticate 15 | ... ${AWS_ACCESS_KEY_ID} 16 | ... ${AWS_SECRET_ACCESS_KEY} 17 | ... ${REGION} 18 | ... auth_mode=${AUTH_MODE} 19 | ... role_arn=${AWS_ROLE_ASSUME_ARN} 20 | ${report}= RW.AWS.S3.Run S3 Checks region_name=${REGION} days_stale_threshold=${DAYS_STALE_THRESHOLD} 21 | RW.Core.Add Pre To Report ${report} 22 | 23 | *** Keywords *** 24 | Suite Initialization 25 | ${AWS_ACCESS_KEY_ID}= Import Secret aws_access_key_id 26 | ... description=What AWS access key ID to use for authentication. 27 | ${AWS_SECRET_ACCESS_KEY}= Import Secret aws_secret_access_key 28 | ... description=What AWS secret access key to use for authentication. 29 | ${AWS_ROLE_ASSUME_ARN}= Import Secret aws_assume_role_arn 30 | ... description=Which role arn to assume if the role authentication flow is used. 31 | RW.Core.Import User Variable AUTH_MODE 32 | ... type=string 33 | ... enum=[User,Role] 34 | ... description=Determines the authentication flow when connecting to AWS services. 35 | ... example=User 36 | RW.Core.Import User Variable REGION 37 | ... type=string 38 | ... description=The AWS region to target resources in. 39 | ... pattern=\w* 40 | ... 
example=us-west-1 41 | RW.Core.Import User Variable 42 | ... DAYS_STALE_THRESHOLD 43 | ... type=string 44 | ... description=The number of days of no activity allowed before a bucket is considered stale. 45 | ... pattern="^[0-9]*$" 46 | ... example=90 47 | ... default=90 48 | Set Suite Variable ${AWS_ACCESS_KEY_ID} ${AWS_ACCESS_KEY_ID} 49 | Set Suite Variable ${AWS_SECRET_ACCESS_KEY} ${AWS_SECRET_ACCESS_KEY} 50 | Set Suite Variable ${AWS_ROLE_ASSUME_ARN} ${AWS_ROLE_ASSUME_ARN} 51 | Set Suite Variable ${AUTH_MODE} ${AUTH_MODE} 52 | Set Suite Variable ${REGION} ${REGION} 53 | Set Suite Variable ${DAYS_STALE_THRESHOLD} ${DAYS_STALE_THRESHOLD} 54 | --------------------------------------------------------------------------------