├── rfcs ├── images │ ├── oip3-1.png │ ├── oip3-2.png │ ├── oip3-3.png │ ├── oip3-4.png │ ├── oip3-5.png │ ├── oip2-siren-1.png │ ├── oip2-siren-2.png │ └── oip2-siren-3.png ├── OIP-000-template.md ├── OIP-001-unified-urn.md ├── OIP-002-alert-subscription-and-notification.md └── OIP-003-siren-as-notification-service.md ├── .gitignore ├── README.md ├── .github └── ISSUE_TEMPLATE │ └── feature_request.md ├── roadmap.md └── LICENSE /rfcs/images/oip3-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/raystack/platform/HEAD/rfcs/images/oip3-1.png -------------------------------------------------------------------------------- /rfcs/images/oip3-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/raystack/platform/HEAD/rfcs/images/oip3-2.png -------------------------------------------------------------------------------- /rfcs/images/oip3-3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/raystack/platform/HEAD/rfcs/images/oip3-3.png -------------------------------------------------------------------------------- /rfcs/images/oip3-4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/raystack/platform/HEAD/rfcs/images/oip3-4.png -------------------------------------------------------------------------------- /rfcs/images/oip3-5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/raystack/platform/HEAD/rfcs/images/oip3-5.png -------------------------------------------------------------------------------- /rfcs/images/oip2-siren-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/raystack/platform/HEAD/rfcs/images/oip2-siren-1.png 
-------------------------------------------------------------------------------- /rfcs/images/oip2-siren-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/raystack/platform/HEAD/rfcs/images/oip2-siren-2.png -------------------------------------------------------------------------------- /rfcs/images/oip2-siren-3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/raystack/platform/HEAD/rfcs/images/oip2-siren-3.png -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Binaries for programs and plugins 2 | *.exe 3 | *.exe~ 4 | *.dll 5 | *.so 6 | *.dylib 7 | .DS_Store 8 | 9 | # Test binary, built with `go test -c` 10 | *.test 11 | 12 | # Output of the go coverage tool, specifically when used with LiteIDE 13 | *.out 14 | 15 | # Dependency directories (remove the comment below to include it) 16 | # vendor/ 17 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Platform 2 | 3 | ODPF is the next-gen collaborative Open Data Platform to power data workflows. 
4 | 5 | This repository contains: 6 | 7 | - ODPF's [roadmap](https://github.com/orgs/odpf/projects/10) 8 | - A place to raise [issues](https://github.com/odpf/platform/issues) 9 | - A place to have [discussions](https://github.com/orgs/odpf/discussions), ask questions and more 10 | - A link to join us on [Slack](https://bit.ly/2RzPbtn) 11 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature_request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Feature request 3 | about: Suggest an idea for this project 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Summary** 11 | A clear and concise description of what the feature is. Ex. I'm always frustrated when [...] 12 | 13 | **Intended outcome** 14 | A clear and concise description of what you want to happen. 15 | 16 | **How will it work?** 17 | A clear and concise description of how this feature will work. 18 | 19 | **Additional context** 20 | Add any other context or screenshots about the feature request here. 21 | -------------------------------------------------------------------------------- /roadmap.md: -------------------------------------------------------------------------------- 1 | # Roadmap 2 | 3 | ❇️ View the official ODPF public roadmap 4 | 5 | Our product [roadmap](https://github.com/orgs/odpf/projects/10) is where you can learn about what features we're working on, what stage they're in, and when we expect to bring them to you. The platform repository is for communicating ODPF’s roadmap. Have any questions or comments about items on the roadmap? Share your feedback via [ODPF public feedback discussions](https://github.com/orgs/odpf/discussions). 6 | 7 | ## Guide to the roadmap 8 | 9 | Every item on the roadmap is an issue, with a label that indicates each of the following: 10 | 11 | - A **status** that indicates the planned timeline for the release.
12 | - A **progress** that indicates the stage of development of the feature. 13 | - A **feature** that indicates the feature or product to which the item belongs. 14 | - One or more **product** labels that indicate which ODPF product we expect the feature to be available in. 15 | - Once a feature is delivered, the **released** label will be applied to the roadmap issue and the issue will be closed with a comment linking to the relevant changelog post. 16 | 17 | ## Release phases 18 | 19 | Release phases indicate the stages that the product or feature goes through, from early testing to general availability. 20 | 21 | - **alpha:** _Primarily for testing and feedback_ 22 | 23 | Features still under heavy development and subject to change. Not for production use; no documentation or support is provided. 24 | 25 | - **beta:** _Released for public feedback_ 26 | 27 | Features mostly complete and documented. Timeline and requirements for stable release usually published. Limited support provided. 28 | 29 | - **stable:** _Released for production use_ 30 | 31 | Ready for production use for all users. Approximately 1-2 months from beta. 32 | 33 | ## Roadmap stages 34 | 35 | The roadmap is arranged on a project board to give a sense of how far out each item is on the horizon. Every product or feature is added to a particular project board column according to the quarter in which it is expected to ship next. Be sure to read the disclaimer below since the roadmap is subject to change, especially further out on the timeline. 36 | 37 | ## Disclaimer 38 | 39 | Any statement in this repository that is not purely historical is considered a forward-looking statement. Forward-looking statements included in this repository are based on information available to ODPF as of the date they are made, and ODPF assumes no obligation to update any forward-looking statements.
The forward-looking product roadmap does not represent a commitment, guarantee, obligation or promise to deliver any product or feature, or to deliver any product and feature by any particular date, and is intended to outline the general development plans. Customers should not rely on this roadmap to make any decision. 40 | -------------------------------------------------------------------------------- /rfcs/OIP-000-template.md: -------------------------------------------------------------------------------- 1 | # OIP-000 - RFC Template 2 | 3 | The RFC begins with a brief overview. This section should be one or two paragraphs that just explain what the goal of this RFC is going to be, but without diving too deeply into the "why", "why now", "how", etc. Ensure anyone opening the document will form a clear understanding of the RFC's intent from reading this paragraph(s). 4 | 5 | # Background 6 | 7 | The next section is the "Background" section. This section should be at least two paragraphs and can take up to a whole page in some cases. The guiding goal of the background section is: as a newcomer to this project (new employee, team transfer), can I read the background section and follow any links to get the full context of why this change is necessary? 8 | 9 | If you can't show a random engineer the background section and have them acquire nearly full context on the necessity for the RFC, then the background section is not full enough. To help achieve this, link to prior RFCs, discussions, and more here as necessary to provide context so you don't have to simply repeat yourself. 10 | 11 | # Proposal 12 | 13 | The next required section is "Proposal" or "Goal". Given the background above, this section proposes a solution. This should be an overview of the "how" for the solution, but for details further sections will be used. 14 | 15 | ## Abandoned Ideas (Optional) 16 | 17 | As RFCs evolve, it is common that there are ideas that are abandoned.
Rather than simply deleting them from the document, you should try to organize them into sections that make it clear they're abandoned while explaining why they were abandoned. 18 | 19 | When sharing your RFC with others or having someone look back on your RFC in the future, it is common to walk the same path and fall into the same pitfalls that we've since matured from. Abandoned ideas are a way to recognize that path and explain the pitfalls and why they were abandoned. 20 | 21 | ## Sections (Heading 2) 22 | 23 | From this point onwards, the sections and headers are generally freeform depending on the RFC. Sections are styled as "Heading 2". Try to organize your information into self-contained sections that answer some critical question, and organize your sections into an order that builds up knowledge necessary (rather than forcing a reader to jump around to gain context). 24 | 25 | Sections often are split further into sub-sections styled "Heading 3". These sub-sections just further help to organize data to ease reading and discussion. 26 | 27 | ### [Example] Implementation 28 | 29 | Many RFCs have an "implementation" section which details how the implementation will work. This section should explain the rough API changes (internal and external), package changes, etc. The goal is to give reviewers an idea of the subsystems that require change and the surface area of those changes. 30 | 31 | This knowledge can result in recommendations for alternate approaches that perhaps are idiomatic to the project or result in fewer packages touched. Or, it may result in the realization that the proposed solution in this RFC is too complex given the problem. 32 | 33 | For the RFC author, typing out the implementation at a high level often serves as "rubber duck debugging" and you can catch a lot of issues or unknown unknowns prior to writing any real code.
34 | 35 | ### [Example] UX 36 | 37 | If there are user-impacting changes by this RFC, it is important to have a "UI/UX" section. User-impacting changes include external API changes, configuration format changes, CLI output changes, etc. 38 | 39 | This section is effectively the "implementation" section for the user experience. The goal is to explain the changes necessary, any impacts to backwards compatibility, any impacts to normal workflow, etc. 40 | 41 | As a reviewer, this section should be checked to see if the proposed changes feel like the project in question. For example, if the UX changes are proposing a flag "-foo_bar" but all our flags use hyphens like "-foo-bar", then that is a noteworthy review comment. Further, if the breaking changes are intolerable or there is a way to make a change while preserving compatibility, that should be explored. 42 | 43 | ### [Example] UI 44 | 45 | Will this RFC have implications for the web UI? If so, be sure to collaborate with a frontend engineer and/or product designer. They can add UI design assets (user flows, wireframes, mockups or prototypes) to this document, and if changes are substantial, they may wish to create a separate RFC to dive further into details on the UI changes. 46 | 47 | ## Style Notes 48 | 49 | All RFCs should follow similar styling and structure to ease reading. "Beautiful is better" is a core principle of ODPF and we care about the details. 50 | 51 | ### Heading Styles 52 | 53 | "Heading 2" should be used for section titles. We do not use "Heading 1" because aesthetically the text is too large. Google Docs will use Heading 2 as the outermost headers in the generated outline. 54 | 55 | "Heading 3" should be used for sub-sections. 56 | 57 | Further heading styles can be used for nested sections, however it is rare that a RFC goes beyond "Heading 4," and rare itself that "Heading 4" is reached. 
58 | 59 | ### Lists 60 | 61 | When making lists, it is common to bold the first phrase/sentence/word to bring some category or point to attention. For example, a list of API considerations: 62 | 63 | - **Format** should be widgets 64 | - **Protocol** should be widgets-rpc 65 | - **Backwards** compatibility should be considered. 66 | 67 | ### Typeface 68 | 69 | Type size should use this template's default configuration (11pt for body text, larger for headings), and the type family should be Arial. No other typeface customization (e.g. color, highlight) should be made other than italics, bold, and underline. 70 | 71 | ### Code Samples 72 | 73 | Code samples should be indented (tab or spaces are fine as long as it is consistent) text using the Courier New font. Syntax highlighting can be included if possible but isn't necessary. Please ensure the highlighted syntax is the proper font size and using the font Courier New so non-highlighted samples don't appear out of place. 74 | 75 | CLI output samples are similar to code samples but should be highlighted with the color they'll output if it is known so that the RFC could also cover formatting as part of the user experience. 76 | 77 | func example() { 78 | <-make(chan struct{}) 79 | } 80 | 81 | Note: This RFC is heavily inspired by the HashiCorp RFC template. 82 | -------------------------------------------------------------------------------- /rfcs/OIP-001-unified-urn.md: -------------------------------------------------------------------------------- 1 | # OIP-001 - Unifying URN format across tools 2 | 3 | URN, or Uniform Resource Name, is what we are using across our tools and libraries in ODPF. A URN should be unambiguous and represent only a single resource. That is why having a good URN format is crucial, as it will prevent conflict or duplication of identifiers. 4 | 5 | The goal of this RFC is to decide on a good, persistent URN format that can be used across our tools.
6 | 7 | ## Background 8 | 9 | ### What is a resource? 10 | 11 | Below is a list of things that we consider a resource: 12 | 13 | - Table (BigQuery, Postgres, MySQL, Elasticsearch, etc) 14 | - Topic (Kafka, RabbitMQ, etc) 15 | - Job (Firehose, Optimus, Dagger, etc) 16 | - Dashboard (Metabase, Tableau, etc) 17 | 18 | ### Current formats 19 | 20 | To better understand the need for this initiative, let's take a look at how each of our tools generates its URNs. 21 | 22 | | Resource | Format | Example | 23 | | :------------------ | :------------------------------------------- | :------------------------------------------------ | 24 | | Meteor's RDBMS | `{service}::{host}/{database}/{table}` | `postgres::10.283.86.19:5432/user_db/user_role` | 25 | | Meteor's BigQuery | `{service}::{project}/{dataset}/{table}` | `bigquery::odpf-prod/datamart/daily_booking` | 26 | | Meteor's Metabase | `{service}::{host}/dashboards/{dashboardID}` | `metabase::my-metabase-server.com/dashboards/872` | 27 | | Shield's Resource | `{resource_type}/{namespace}/{resource_id}` | `r/namespace-id/resource-name` | 28 | | Guardian's BigQuery | `{resource_id}` | `metabase:293` | 29 | 30 | There are a few things that we can improve here: 31 | 32 | - Using `{host}` as part of the URN hurts persistence (mostly in Meteor's formats). A resource's location should be allowed to change without invalidating its generated URN. 33 | - Using `/` as a separator has a few issues: 34 | 1. When a resource URN is passed as a route parameter over `http`, the URN needs to be encoded. 35 | 2. Even if it is encoded, some **services** or **proxies** may not be able to `route-match` properly. (e.g. [gorillamux](https://github.com/gorilla/mux/issues/639) default behaviour) 36 | 37 | ### Limited resource referencing between tools 38 | 39 | Instead of each tool defining its own URN format, it would be better, if possible, for all tools and services to use the same URN format when referring to a resource or asset.
40 | 41 | Since different tools use different formats, resources cannot be referenced (or potentially shared) between tools without help from an extra mapping layer (either a service or a library). 42 | 43 | ## Requirements 44 | 45 | Our final unified URN should handle these cases: 46 | 47 | 1. Persist through change of resource location. (e.g. DB is moved to another server) 48 | 2. Can easily be used in a URL without relying on services to handle the encoding/decoding. 49 | 3. Should be globally unique, or at least within an organization (AKAB / DKAB). 50 | 4. SAMPLE CASE: If we somehow have two different Metabases, we should be able to tell which Metabase a resource belongs to from the URN without relying on `host`. 51 | 52 | ## Proposals 53 | 54 | **1.** `urn:{NID}:{NSS}:{project}:{kind}:{name}` - by [spy16](https://github.com/spy16) 55 | 56 | I highly recommend we stick to the IETF standard definition of URN from [RFC8141](https://datatracker.ietf.org/doc/html/rfc8141) (even if we take only a subset of it). 57 | 58 | [RFC 8141: Section 2](https://datatracker.ietf.org/doc/html/rfc8141#section-2) defines the syntax for URNs. 59 | 60 | 1. For all ODPF products, we can use `odpf` (or `ODPF`) as the [Namespace Identifier (NID)](https://datatracker.ietf.org/doc/html/rfc8141#section-2.1). 61 | 2. Every ODPF product should use the product name as the [Namespace Specific String](https://datatracker.ietf.org/doc/html/rfc8141#section-2.2). For example, all resources managed by Entropy would have `entropy` as the NSS. 62 | 63 | NID and NSS combined form the `assigned-name`: `urn:{NID}:{NSS}` --> `urn:odpf:entropy`. This assigned-name uniquely identifies every product within odpf. 64 | 65 | Optional components (which are defined by the entity that owns the NSS) can be appended to `assigned-name` to form resource-level identifiers. For example: `urn:odpf:siren:alert1` 66 | 67 | Optional components can have some generic restrictions that we follow.
For example: 68 | 69 | - all optional components following the namespace should match the pattern `^[A-Za-z0-9-]+$`. 70 | - no components in the URN are allowed to have `/` character. 71 | - components must be ordered to match reducing scope (i.e., `urn` matches everything globally, `urn:odpf` matches everything within ODPF, `urn:odpf:entropy` matches everything within entropy product of odpf, `urn:odpf:entropy:project-foo` matches everything within `project-foo` and so on). 72 | 73 | With all these combined: The URN for a "resource" of kind "firehose" in project "foo" with name "f1" managed by "entropy" will be `urn:odpf:entropy:foo:firehose:f1` 74 | 75 | **2.** `{namespace}:{label}:{source}:{identifier}` - by [StewartJingga](https://github.com/StewartJingga) 76 | 77 | - **namespace** represents which org (or even environment) the resource belongs to. This is especially useful if you are maintaining resources from different organizations or entities. Example: `odpf`, `odpf-prod`. 78 | - **label** can be used in a case where for example you have two different postgres servers in a namespace. Label is used to differentiate those two, without labels, we can only use the server address. Example: `transaction_storage`, `optimus`, `main-database`, `production`. 79 | - **source** is the service/tool/storage that generate the resource. Example: `postgres`, `bigquery`, `metabase`, `kafka`. 80 | - **identifier** should be unique inside the `source`. The simplest approach is to just use the identifier generated by the `source` itself. In case of `metabase's collection`, we can use `collection:321` or `card:88` for representing a card. 
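The four components described for proposal 2 can be assembled mechanically. The following is a minimal sketch only: `ResourceURN`, `String` and `Validate` are hypothetical names introduced for illustration and do not come from any ODPF library. The check that no component contains `/` is an assumption borrowed from the Requirements section (a URN should be usable in a URL without extra encoding), not something proposal 2 states itself.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ResourceURN is a hypothetical type mirroring the four components of
// proposal 2: {namespace}:{label}:{source}:{identifier}.
type ResourceURN struct {
	Namespace  string // org or environment, e.g. "odpf"
	Label      string // differentiates instances of the same source
	Source     string // service/tool/storage, e.g. "metabase"
	Identifier string // unique inside the source; may itself contain ":"
}

// String joins the components with ":" as the only separator.
func (u ResourceURN) String() string {
	return strings.Join([]string{u.Namespace, u.Label, u.Source, u.Identifier}, ":")
}

// Validate rejects empty components and any component containing "/".
// The "/" rule is an assumption taken from the Requirements section.
func (u ResourceURN) Validate() error {
	for _, c := range []string{u.Namespace, u.Label, u.Source, u.Identifier} {
		if c == "" {
			return errors.New("urn: empty component")
		}
		if strings.Contains(c, "/") {
			return fmt.Errorf("urn: component %q contains '/'", c)
		}
	}
	return nil
}

func main() {
	u := ResourceURN{
		Namespace:  "odpf",
		Label:      "main-dashboard",
		Source:     "metabase",
		Identifier: "collection:321",
	}
	if err := u.Validate(); err != nil {
		fmt.Println("invalid URN:", err)
		return
	}
	fmt.Println(u) // prints "odpf:main-dashboard:metabase:collection:321"
}
```

Keeping `Identifier` as a free-form string leaves the source free to embed its own sub-structure (such as `collection:321`), at the cost of the URN no longer having a fixed number of `:`-separated parts.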
81 | 82 | Examples 83 | 84 | - **metabase** - `odpf:main-dashboard:metabase:collection:321` 85 | - **bigquery** - `odpf:default:bigquery:myproject:mydataset:mytable` - `default` is used for a URN that does not require a label for uniqueness 86 | - **postgres** - `odpf:stencil-integration:postgres:descriptors` - this represents a postgres table that is used by the stencil integration 87 | - **elasticsearch** - `odpf-prod:compass:elasticsearch:index:table` - this represents an elasticsearch index that is used by compass in production 88 | - **hadoop** - `odpf:datalake:hadoop:index:table` - this represents a hadoop table that is being used as a datalake 89 | 90 | ## Accepted Proposal 91 | 92 | ### URN for internal services 93 | 94 | Format: `orn:{NSS}:{scope}:{kind}:{name}` 95 | Example: `orn:entropy:foo:firehose:f1` 96 | 97 | _Note: `orn` stands for `ODPF Resource Name`._ 98 | 99 | ### URN for external services 100 | 101 | Format: `urn:{source}:{scope}:{kind}:{identifier}` 102 | 103 | Examples: 104 | 105 | - **metabase** - `urn:metabase:main-metabase:collection:321` 106 | - **bigquery** - `urn:bigquery:p-godata-id:table:p-godata-id:mydataset.mytable` 107 | - **postgres** - `urn:postgres:stencil-integration:table:schemas` 108 | - **elasticsearch** - `urn:elasticsearch:compass-prod:index:random-index-name` 109 | - **hadoop** - `urn:hadoop:datalake:table:raw-table` 110 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document.
11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /rfcs/OIP-002-alert-subscription-and-notification.md: -------------------------------------------------------------------------------- 1 | # OIP-002 - Alert Subscription and Notification 2 | 3 | Siren is a tool used and developed by the data platform team to manage observability, alerting rules, and notification channels. Alert subscription and notification in Siren are currently handled by the provider (Cortex). This RFC explains how we move the responsibility for alert subscription and notification from the provider to Siren. In the current Siren version, the only provider that Siren supports is CortexMetrics. In this RFC, we use CortexMetrics (or Cortex) as the example provider.
4 | 5 | # Background 6 | 7 | Based on the [PRD](https://github.com/odpf/platform/discussions/15), these are the use cases that users are expected to be able to perform: 8 | 9 | - Create alert policy/rules 10 | - Create SLO/SLI 11 | - Create incident 12 | - Send non-alert notification 13 | - Silence alert 14 | - Subscribe to an alert 15 | - Receive notification 16 | - Get alert/policy & subscription changes 17 | - Get an incident 18 | 19 | From the use cases mentioned above, we can summarize the overall flow of the requirements as shown below. To identify which problems to tackle, we compare the flow of our existing system with the ideal flow. 20 | 21 | ```mermaid 22 | flowchart TD 23 | X([Start])-->A 24 | X-->H 25 | H[Notification Creation Request]-->|Request|C 26 | A[Source Signal] -->|Signal| B[Alert Generation & Silencing] 27 | B --> |Alert|C[Send Notification] 28 | B --> |Alert|E[Incident Generation] 29 | G[Incident Creation Request]-->|Request|E 30 | X-->G 31 | C -->|Notification|Z([End]) 32 | E --> |Incident|Z 33 | ``` 34 | 35 | **Source Signal** 36 | 37 | - This is the process where the telemetry signal is sent to the system. 38 | - _Current state_ 39 | - One of Siren's providers is CortexMetrics. In some setups, Siren uses a CortexMetrics backend, and upstream services/jobs send metrics with Prometheus to CortexMetrics. 40 | - _Problems_ 41 | - There is not much of a problem in this case. One small problem is the scattered responsibility between the provider and Siren: Siren is not responsible for consuming observability data. The rationale for this earlier decision was to avoid burdening Siren with high incoming traffic. 42 | - For now, we could keep this behavior and evaluate it. 43 | 44 | **Alert Generation & Silencing** 45 | 46 | - The incoming signal is processed and, based on a specific alert/policy and threshold, an alert with a certain severity is triggered.
If there are silences for the alerts, the alerts won't be triggered. 47 | - _Current state_ 48 | - We rely heavily on Cortex Alertmanager in this process. Alert generation (trigger) and silencing are done in Cortex Alertmanager. Siren's role is just proxying rule creation to the provider (Cortex Alertmanager). Siren provides templating and rule definitions, and Siren rules are converted to Cortex Alertmanager rules. 49 | - Siren does not currently have the capability to silence alerts. To silence alerts, one could call the Cortex Alertmanager API or open the Cortex Alertmanager UI and manually add a silence there. 50 | - _Problems_ 51 | - Alert generation & silencing are done on the provider side. This is not a big deal for now, but if we intend to support more providers, some providers might not have the same or similar capabilities. 52 | 53 | **Notification Creation Request** 54 | 55 | - This is the process of requesting the generation of a notification. 56 | - Current State 57 | - Siren currently has a `/notify` endpoint, but it is only designed to send notifications via Slack. 58 | - Problems 59 | - The notify endpoint in Siren is tightly coupled to Slack-specific configuration. Siren might need an abstraction on top of it. 60 | 61 | **Send Notification** 62 | 63 | - This is the process where a notification is generated. The notification is then sent to the subscribed receivers. 64 | - Current State 65 | - There are currently two places that generate notifications in our system. Notification generation for manual triggers is handled by Siren, but notification generation based on alerts is handled by Cortex Alertmanager. 66 | - Siren supports adding a notification receiver (only for PagerDuty and Slack). The receiver could be used for alert subscription. The user needs to create a subscription (which contains the receiver ID, receiver configuration, and match labels).
Which receiver is notified depends on whether the alert labels match the subscription's match labels. However, the actual sending of notifications to receivers is done in Cortex Alertmanager. 67 | - Siren itself has the capability to manually send Slack messages to individuals or groups. 68 | - Problems 69 | - Responsibility for generating notifications is not centralized. 70 | - Cortex Alertmanager supports a limited number of receiver types. Relying on Cortex Alertmanager to generate notifications will block us from supporting more receiver types. 71 | - Alert subscription compares the match labels of the subscription with the labels of the alert. If the labels match, notifications will be triggered. 72 | 73 | **Incident Creation Request** 74 | 75 | - This is the process of requesting the generation of an incident. 76 | - This won't be covered in this RFC. 77 | - Current State 78 | - We don't have this feature as of now. 79 | 80 | **Incident Generation** 81 | 82 | - This is the process of generating incidents. An incident could be generated by creating it manually or by converting it from an alert. 83 | - This won't be covered in this RFC. 84 | - Current State 85 | - We don't have this feature as of now. 86 | 87 | ## Problems 88 | 89 | Below is the existing architecture of Siren. Siren's responsibility is only proxying Cortex rules and alert config and sending notifications directly to Slack. Alert generation, silencing, and notifications are done by CortexMetrics.
90 | 91 | ![Existing siren](images/oip2-siren-1.png) 92 | 93 | To subscribe to an alert, a user could register a new subscription by calling the create subscription API with this data: 94 | 95 | ```go 96 | type Subscription struct { 97 | ID uint64 `json:"id"` 98 | URN string `json:"urn"` 99 | Namespace uint64 `json:"namespace"` 100 | Receivers []Receiver `json:"receivers"` 101 | Match map[string]string `json:"match"` 102 | CreatedAt time.Time `json:"created_at"` 103 | UpdatedAt time.Time `json:"updated_at"` 104 | } 105 | ``` 106 | 107 | If the labels in the triggered alert match the match labels in a subscription, its receivers will get notifications. 108 | 109 | The problems with the existing Siren architecture are: 110 | - Users could only subscribe to alerts generated by CortexMetrics. 111 | - The notification channels (vendors) are limited to what CortexMetrics (provider) supports. 112 | 113 | From the requirements, what we want is: 114 | - Users could subscribe to any alerts or notifications. 115 | - We have more flexibility to support more notification channels. 116 | 117 | Therefore, there is a need to rethink the Siren architecture to accommodate the requirements. 118 | 119 | # Proposal 120 | 121 | As we want the flexibility to support more notification channels and the capability to support subscriptions for generic use cases, there are several possible approaches to solve this. 122 | 123 | ## Abandoned Ideas 124 | 125 | **Proxy-to-provider** 126 | 127 | ![Proxy-to-provider approach](images/oip2-siren-2.png) 128 | 129 | This approach relies more on the provider. Until now, Cortex Alertmanager has consumed alerts only from the Cortex ruler (the component that generates the alerts). The Cortex ruler calls a Cortex Alertmanager API to trigger notifications. We could use that API to let Cortex send notifications even when the source is not the Cortex ruler. 130 | 131 | Siren's responsibility here is only proxying tasks to the provider. The notification flow is fully owned by the provider.
When a manual notification is triggered, Siren will transform the notification into a Cortex alert and call the Alertmanager API. 132 | 133 | With this approach, there won't be many changes to the subscription flow. Users could still use the existing flow to subscribe to an alert. 134 | 135 | - **Pros** 136 | - Relatively simpler to implement. 137 | - **Cons** 138 | - The supported notification channels will be limited to what the provider supports. 139 | - Some providers might not be able to send notifications the way CortexMetrics/Prometheus can. 140 | 141 | ## Preferred Approach 142 | 143 | **Notification-as-a-Service** 144 | 145 | ![Notification-as-a-service approach](images/oip2-siren-3.png) 146 | 147 | Cortex Alertmanager won't be responsible for sending notifications to external vendors (Slack, PagerDuty). Alerts are communicated from Cortex to Siren only through a webhook call to `/v1beta1/alerts/cortex/{provider_id}` with this information in the body. 148 | 149 | ```json 150 | { 151 | "version": "4", 152 | "groupKey": <string>, // key identifying the group of alerts (e.g. to deduplicate) 153 | "truncatedAlerts": <int>, // how many alerts have been truncated due to "max_alerts" 154 | "status": "<resolved|firing>", 155 | "receiver": <string>, 156 | "groupLabels": <object>, 157 | "commonLabels": <object>, 158 | "commonAnnotations": <object>, 159 | "externalURL": <string>, // backlink to the Alertmanager. 160 | "alerts": [ 161 | { 162 | "status": "<resolved|firing>", 163 | "labels": <object>, 164 | "annotations": <object>, 165 | "startsAt": "<rfc3339>", 166 | "endsAt": "<rfc3339>", 167 | "generatorURL": <string>, // identifies the entity that caused the alert 168 | "fingerprint": <string> // fingerprint to identify the alert 169 | }, 170 | ... 171 | ] 172 | } 173 | ``` 174 | 175 | Once Siren receives an alert through the webhook, it will forward the alert to a Notification Service that is responsible for triggering the notification and routing it to the receivers. Each provider will have a different webhook API.
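
To illustrate the webhook hand-off described above, here is a minimal Go sketch that decodes the payload and maps each alert to a provider-agnostic notification. The type and function names (`WebhookPayload`, `WebhookAlert`, `Notification`, `ParseWebhook`, `ToNotifications`) are hypothetical and not part of Siren's existing API; only the JSON field names come from the payload shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// WebhookAlert mirrors one entry of the "alerts" array in the webhook body.
type WebhookAlert struct {
	Status       string            `json:"status"`
	Labels       map[string]string `json:"labels"`
	Annotations  map[string]string `json:"annotations"`
	StartsAt     time.Time         `json:"startsAt"`
	EndsAt       time.Time         `json:"endsAt"`
	GeneratorURL string            `json:"generatorURL"`
	Fingerprint  string            `json:"fingerprint"`
}

// WebhookPayload mirrors the top-level webhook body.
type WebhookPayload struct {
	Version      string            `json:"version"`
	GroupKey     string            `json:"groupKey"`
	Status       string            `json:"status"`
	Receiver     string            `json:"receiver"`
	CommonLabels map[string]string `json:"commonLabels"`
	ExternalURL  string            `json:"externalURL"`
	Alerts       []WebhookAlert    `json:"alerts"`
}

// Notification is a hypothetical provider-agnostic model (an assumption
// made for this sketch, not an existing Siren type).
type Notification struct {
	Labels    map[string]string
	Variables map[string]string
}

// ParseWebhook decodes the raw request body into a WebhookPayload.
func ParseWebhook(body []byte) (WebhookPayload, error) {
	var p WebhookPayload
	if err := json.Unmarshal(body, &p); err != nil {
		return WebhookPayload{}, fmt.Errorf("decode webhook body: %w", err)
	}
	return p, nil
}

// ToNotifications maps every alert in the payload to one Notification.
// Labels are kept for subscription matching; annotations, status, and
// fingerprint become template variables.
func ToNotifications(p WebhookPayload) []Notification {
	out := make([]Notification, 0, len(p.Alerts))
	for _, a := range p.Alerts {
		vars := map[string]string{"status": a.Status, "fingerprint": a.Fingerprint}
		for k, v := range a.Annotations {
			vars[k] = v
		}
		out = append(out, Notification{Labels: a.Labels, Variables: vars})
	}
	return out
}

func main() {
	body := []byte(`{"version":"4","status":"firing","alerts":[{"status":"firing",
		"labels":{"severity":"critical"},"annotations":{"summary":"lag is high"},
		"startsAt":"2022-01-01T00:00:00Z","fingerprint":"abc"}]}`)
	p, err := ParseWebhook(body)
	if err != nil {
		panic(err)
	}
	for _, n := range ToNotifications(p) {
		fmt.Println(n.Labels["severity"], n.Variables["summary"])
	}
}
```

Keeping the provider-specific decoding separate from the mapping step means another provider would only need its own payload type and mapping function.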
176 | 177 | Although the approach is named `Notification-as-a-service`, it doesn't mean we will create a separate new service for it. The notification service could still live in Siren, but it is better for it to have little logical coupling to the other components in Siren. Therefore, the notification service itself could be covered in a separate RFC. 178 | 179 | The flow of notification would be like this: 180 | 181 | ```mermaid 182 | flowchart LR 183 | A[Webhook API]-->|write|B[DB] 184 | A-->|publish|D[Notification Service] 185 | C[Notify API]-->|write|B 186 | C-->|publish|D[Notification Service] 187 | D-->|send message|E[Receiver] 188 | ``` 189 | 190 | This approach requires the changes to be backward compatible with the existing Siren. The changes we need are: 191 | 192 | 1. Create a new webhook API as an entry point for notifications that is called only by Cortex for the new flow. The webhook `/v1beta1/alerts/cortex/{provider_id}` is still used for alert history (for backward compatibility). 193 | 2. Update `/v1beta1/Notify` (if not being used) to be the entry point for manually triggered notifications. 194 | 3. Create a new table to store triggered notifications (not the notifications that we send to specific channels; that part will be handled by the notification service later). This will store data that is triggered manually via `/v1beta1/Notify` and via the new webhook API. 195 | 4. We could introduce a new apiVersion v3 for the rule template to indicate that the rules should use the new flow (which won't trigger notifications through Alertmanager). Otherwise, the old flow is still used. 196 | We need to make the receiver part of the notification service and decouple it from the main Siren flow. 197 | 5. Use subscriptions to wire alerts and notifications. We could still match the labels of alerts against the match labels of subscriptions. For manual notifications, we could have a new label called `topic` to subscribe to a specific notification event.
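
The label matching in item 5 could look roughly like the Go sketch below. It assumes subset semantics, meaning a subscription matches when every one of its match labels is present with the same value in the incoming labels; the RFC does not spell out the exact rule, so treat this as an assumption. `Subscription` here is a trimmed, hypothetical type holding only the fields needed for matching, and `MatchSubscriptions` is an illustrative helper, not an existing Siren function.

```go
package main

import "fmt"

// Subscription is a trimmed, hypothetical version of the subscription
// model with only the fields needed for matching.
type Subscription struct {
	URN   string
	Match map[string]string
}

// Matches reports whether every key/value pair in the subscription's
// match labels is present in the given labels. A subscription with no
// match labels matches everything under this rule.
func (s Subscription) Matches(labels map[string]string) bool {
	for k, want := range s.Match {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// MatchSubscriptions returns the subscriptions whose match labels are a
// subset of the given labels.
func MatchSubscriptions(subs []Subscription, labels map[string]string) []Subscription {
	var matched []Subscription
	for _, s := range subs {
		if s.Matches(labels) {
			matched = append(matched, s)
		}
	}
	return matched
}

func main() {
	subs := []Subscription{
		{URN: "alert-critical", Match: map[string]string{"severity": "critical"}},
		{URN: "deploy-events", Match: map[string]string{"topic": "deployment"}},
	}
	// A manually triggered notification carries a `topic` label.
	manual := map[string]string{"topic": "deployment", "team": "data"}
	for _, s := range MatchSubscriptions(subs, manual) {
		fmt.Println(s.URN)
	}
}
```

With this rule, alert-based and manual notifications go through the same matching code; the `topic` label is just another key in the label set.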
198 | 199 | - **Pros** 200 | - Flexibility to add more notification channels. 201 | - Decoupling notification from alerting for scalability. 202 | - **Cons** 203 | - Requires more complex development for the notification service. 204 | -------------------------------------------------------------------------------- /rfcs/OIP-003-siren-as-notification-service.md: -------------------------------------------------------------------------------- 1 | # OIP-003 - Siren as Notification Service 2 | 3 | Siren is a tool used and developed by the data platform team to manage observability, alerting rules, and notification channels. Alert subscription and notification in Siren are currently handled by the provider (Cortex). Based on the previous [RFC](./OIP-002-alert-subscription-and-notification.md), we prefer that Siren handles notification subscription and distribution. This RFC gives a high-level explanation of the Notification Service that would be implemented. 4 | 5 | # Background 6 | 7 | **Architecture** 8 | 9 | Below is the existing architecture of Siren. Siren's responsibility is only proxying Cortex rules and alert config and sending notifications directly to Slack. Alert generation, silencing, and notifications are done by CortexMetrics. 10 | 11 | ![Existing siren](images/oip2-siren-1.png) 12 | 13 | **Siren Domain Model** 14 | 15 | ![Siren domain model](images/oip3-1.png) 16 | 17 | **Subscriptions** 18 | 19 | To subscribe to an alert, users could register a new subscription by calling the create subscription API with the subscription model data mentioned above. The subscription data will be transformed to Alertmanager config and uploaded to Cortex. In Cortex, if the labels in the triggered alert match the match labels in a subscription, its receivers will get notifications. 20 | 21 | **Notifications** 22 | 23 | Notifications are handled by Cortex Alertmanager, so the supported receivers/notification channels are limited to what Cortex Alertmanager supports now.
Siren currently supports Slack and PagerDuty notifications. 24 | 25 | **Alert History** 26 | 27 | Siren also uses a Cortex webhook to send alert notifications to Siren. Siren provides an API that acts as the webhook, and all alerts are ingested through it. Siren stores the notifications as alert history. 28 | 29 | ## Requirements 30 | 31 | The problems with the existing Siren architecture are: 32 | 33 | - Users could only subscribe to alerts generated by CortexMetrics. 34 | - The notification channels (vendors) are limited to what CortexMetrics (provider) supports. 35 | 36 | Meanwhile, what we want is: 37 | 38 | - Users could subscribe to any alerts or notifications. 39 | - We have more flexibility to support more notification channels. 40 | 41 | For notifications, this is what we expect: 42 | 43 | - Users could subscribe to more than one receiver. 44 | - Each receiver in a subscription could have a template. 45 | - Notification requests should be idempotent. 46 | - Similar to alerts, we could extend notifications with silencing and grouping/batching features. 47 | 48 | # Proposal 49 | 50 | The architecture proposed in the previous [RFC](./OIP-002-alert-subscription-and-notification.md) looks like this. 51 | 52 | ![Notification-as-a-service approach](images/oip2-siren-3.png) 53 | 54 | This architecture expects Cortex (the provider) to only send alert notifications to the Siren webhook, while the responsibility for notification is taken care of by Siren. This RFC focuses on the implementation details of the Notification Service. 55 | 56 | Here are the steps to generate notifications and their responsibilities. We will discuss the preferred approach for each step. 57 | 58 | ![Notification steps](images/oip3-2.png) 59 | 60 | 1. Notification Source 61 | 62 | - Communicate to Notification Dispatcher to publish a notification. 63 | 64 | 2. Notification Dispatcher 65 | 66 | - Notification Model. 67 | - Match notification with subscribers.
68 | - For each subscriber, generate a notification message and publish it to the queue. 69 | - Resolve the template and transform the message to a vendor-specific message. 70 | 71 | 3. Queue 72 | 73 | - Buffer notification messages to reduce the pressure. 74 | 75 | 4. Notification Handler 76 | 77 | - Subscribe to the queue for messages. 78 | - Send to the external vendor. 79 | 80 | ## Notification Source 81 | 82 | In the current plan, there will be 2 possible sources: **Alert webhook from provider** and **Manually triggered API**. Whatever the source is, its model should be transformed into a single model called `Notification`. There should be idempotency handling in this step. The details of how idempotency is implemented will be discussed in another RFC later. 83 | 84 | ## Notification Dispatcher 85 | 86 | The notification dispatcher's responsibility is to generate notification messages for all subscribers. Since a subscription can have multiple receivers, every dispatched notification produces one or more notification messages. The notification dispatcher sends messages asynchronously by publishing them to a queue. Some decisions about the dispatcher need to be discussed: 87 | 88 | 1. Subscription flow 89 | 2. Notification Model 90 | 3. Notification Message Model 91 | 92 | ### Subscription Flow 93 | 94 | ```go 95 | type Subscription struct { 96 | ID uint64 `json:"id"` 97 | URN string `json:"urn"` 98 | Namespace uint64 `json:"namespace"` 99 | Receivers []Receiver `json:"receivers"` 100 | Match map[string]string `json:"match"` 101 | CreatedAt time.Time `json:"created_at"` 102 | UpdatedAt time.Time `json:"updated_at"` 103 | } 104 | ``` 105 | 106 | Above are the details of the subscription model. The existing subscription matches its labels against the key-value labels in the alerts. We could still keep this behavior in Siren. To know which subscriptions should be notified, Siren could expect key-value labels in both the Notification model and the subscriptions.
For each match, Siren fetches the receivers, and for each receiver, Siren generates a message. 107 | 108 | ```go 109 | var n notification.Notification 110 | // ... 111 | receivers := subscription.GetReceiversByLabels(n.Labels) 112 | for _, rcv := range receivers { 113 | notificationMessage := n.BuildMessage(rcv) 114 | notification.Publish(notificationMessage) 115 | } 116 | ``` 117 | 118 | **Consideration** 119 | 120 | - We need to figure out the best way to filter label sets in Postgres. 121 | - Optimization could be done later by caching the match labels index of each subscription in memory. 122 | 123 | ### Notification Model & Notification Message Model 124 | 125 | The Notification Model contains the information that should be sent to each receiver. Each receiver could have specific requirements for the notification payload, and we assume the payload is customizable. 126 | One of the features that we expect in notifications is message templating, which will be resolved in the Notification Dispatcher. Siren already has a templating feature, so we could use it for this purpose. 127 | type Notification struct { 128 | ID uint64 129 | Variables map[string]string 130 | Labels map[string]string 131 | ExpiryDuration string 132 | CreatedAt time.Time 133 | } 134 | 135 | When registering a subscription for a notification, one could add a template key in the receiver. If no template key is found, the default template will be used. 136 | 137 | ```go 138 | sub := subscription.Subscription{ 139 | // ... 140 | Receivers: []receiver.Receiver{ 141 | { 142 | ID: 1, 143 | Configurations: map[string]interface{}{ 144 | "channel_name": "odpf-critical", 145 | "template": "alert-slack-details", 146 | }, 147 | }, 148 | }, 149 | // ... 150 | } 151 | ``` 152 | 153 | A Notification Message is a materialized view of a Notification for a specific receiver type (vendor). It carries a delivery status (FAILED/PUBLISHED) for tracking.
154 | 155 | ```go 156 | type Message struct { 157 | ID uint64 158 | ReceiverType Receiver 159 | ReceiverConfigs map[string]interface{} 160 | Details map[string]interface{} 161 | Status string 162 | ExpiredAt time.Time 163 | CreatedAt time.Time 164 | UpdatedAt time.Time 165 | } 166 | ``` 167 | 168 | The dispatcher converts a Notification to a Notification Message. If the template in the subscriber's receiver config is not empty, the dispatcher resolves the variables with the template, and the rendered YAML text is parsed into a vendor-specific details struct. 169 | 170 | ![Resolving template](images/oip3-3.png) 171 | 172 | ## Dispatch Execution 173 | 174 | The dispatcher performs a lot of tasks. Considering this factor, there are 2 possible approaches to the execution. 175 | 176 | ### Abandoned Ideas 177 | 178 | **Asynchronous** 179 | 180 | ![Asynchronous Execution](images/oip3-4.png) 181 | 182 | This approach adds a buffer between the notification source and the dispatcher. The queue used as the buffer would be similar to the queue used for publishing notification messages, which will be discussed in the next section. 183 | 184 | - **Pros** 185 | - Could afford relatively higher throughput than synchronous execution. 186 | - **Cons** 187 | - More complex interaction. 188 | - Introduces an additional point of failure (queue). 189 | 190 | ### Preferred Approach 191 | 192 | **Synchronous** 193 | 194 | This is the simplest approach. Notification sources just need to transform their model to a Notification model and call the Notification Dispatcher function. 195 | 196 | ![Synchronous Execution](images/oip3-5.png) 197 | 198 | - **Pros** 199 | - Easier to implement (as simple as calling a function).
200 | - **Cons** 201 | - Considering the relatively heavy tasks done in the Notification Dispatcher (label matching, template rendering), the dispatcher will come under more pressure if the rate of incoming notifications is higher than the rate of dispatching notifications (e.g. for bulk notifications). 202 | 203 | ## Message Queue 204 | 205 | Since the notification handler interacts with external parties, the interaction is less reliable and slower than invoking local functions or communicating within a local network. In this scenario, a queue is needed to buffer notification messages, especially since one Notification is transformed into one or more Notification Messages. There are a couple of possible approaches to implement this message queue. 206 | 207 | ### Abandoned Ideas 208 | 209 | **Message Queue Infrastructure (Kafka, RabbitMQ, etc)** 210 | 211 | Although this approach seems the obvious choice, it is less preferred since it adds a dependency on a new component and could make Siren less vendor-neutral. We might want Siren to be able to plug into any message queuing system, but that is out of scope for this RFC. 212 | 213 | - **Pros** 214 | - No need to implement queueing logic. 215 | - Could leverage features provided by the tool. 216 | - **Cons** 217 | - More infra to manage. 218 | 219 | **Redis-based Queue (e.g. gocraft/work)** 220 | 221 | There are several tools written in Go that could manage queues, such as gocraft/work. The tool provides the out-of-the-box features that we need. It supports managing dead jobs, scheduling jobs, and retrying dead jobs. 222 | 223 | - **Pros** 224 | - Out-of-the-box features to leverage. 225 | - Has a UI to check the jobs. 226 | - **Cons** 227 | - More infra to manage. 228 | 229 | ### Preferred Approach 230 | 231 | **PostgreSQL FOR UPDATE / SKIP LOCKED** 232 | 233 | We could leverage PostgreSQL to implement a queue with FOR UPDATE & SKIP LOCKED.
A notification handler goroutine could run periodically to fetch the rows and process the messages. With the row locks, we can be sure that a message is picked up by only one goroutine at a time. With this approach, we could have a new table like this. 234 | 235 | ```sql 236 | CREATE TABLE message_queue 237 | ( 238 | id bigserial NOT NULL, 239 | status integer DEFAULT 0 NOT NULL, 240 | try_count integer DEFAULT 0 NOT NULL, 241 | max_tries integer DEFAULT 5 NOT NULL, 242 | receiver_type text NOT NULL, 243 | receiver_configs jsonb, 244 | details jsonb, 245 | expired_at timestamptz, 246 | created_at timestamptz DEFAULT CURRENT_TIMESTAMP NOT NULL, 247 | updated_at timestamptz, 248 | priority integer DEFAULT 0 NOT NULL 249 | ); 250 | ``` 251 | 252 | - `status` could be 0 if unpublished, -1 if failed, and 1 if published. 253 | - `priority` is there just in case we want to have a priority-based queue. 254 | 255 | - **Pros** 256 | - No additional component (infra) required. 257 | - Relatively easy to implement. 258 | - **Cons** 259 | - The queue table might become bloated after some time (it needs periodic maintenance such as vacuum, or monitoring for table bloat). 260 | 261 | ## Notification Handler 262 | 263 | The notification handler's responsibility is to send notification messages outbound. It should have knowledge of all external notification vendors' contracts. The notification handler consumes a notification message and transforms it to a vendor-specific message. To support at-least-once delivery, the notification handler needs retry logic (probably with exponential backoff), or it needs to store dead messages in a DLQ and retry them. 264 | 265 | For each notification message, the validity depends on the `expired_at` field. An empty or null `expired_at` field indicates that the message never expires. When failed notification messages are retried, the messages that have passed their validity won't be retried.
266 | 267 | The existing Siren pre-defines Slack and PagerDuty receiver details in an Alertmanager config template file. For the notification service approach, we will only keep the webhook config for Alertmanager and extract the Slack and PagerDuty templates. Each receiver type expects a specific notification contract, and users could configure the contract with a YAML file generated from a template if needed. For example, the Slack details YAML could contain any field supported by the Slack `chat.postMessage` payload. Notification templates always have a `receiver_type` key that is used for validation. 268 | 269 | ```yaml 270 | apiVersion: v2 271 | type: template 272 | name: alert-slack-details 273 | body: 274 | receiver_type: slack 275 | attachments: 276 | - text: '[[.text]]' 277 | icon_emoji: ':eagle:' 278 | link_names: false 279 | color: '[[.color]]' 280 | title: '[[.title]]' 281 | pretext: '[[.pretext]]' 282 | text: '[[.text]]' 283 | actions: 284 | - type: button 285 | text: 'Runbook :books:' 286 | url: '[[.runbook]]' 287 | - type: button 288 | text: 'Dashboard :bar_chart:' 289 | url: '[[.dashboard]]' 290 | variables: 291 | - name: color 292 | type: string 293 | description: slack color 294 | default: '#2eb886' 295 | - name: text 296 | type: string 297 | default: This is an alert 298 | - name: title 299 | type: string 300 | default: Alert 301 | - name: pretext 302 | type: string 303 | description: Pre-text of slack alert 304 | default: Siren 305 | - name: runbook 306 | type: string 307 | description: url to runbook 308 | default: http://url 309 | - name: dashboard 310 | type: string 311 | description: url to dashboard 312 | default: http://url 313 | tags: 314 | - slack 315 | ``` 316 | --------------------------------------------------------------------------------