├── rfcs
    ├── images
    │   ├── oip3-1.png
    │   ├── oip3-2.png
    │   ├── oip3-3.png
    │   ├── oip3-4.png
    │   ├── oip3-5.png
    │   ├── oip2-siren-1.png
    │   ├── oip2-siren-2.png
    │   └── oip2-siren-3.png
    ├── OIP-000-template.md
    ├── OIP-001-unified-urn.md
    ├── OIP-002-alert-subscription-and-notification.md
    └── OIP-003-siren-as-notification-service.md
├── .gitignore
├── README.md
├── .github
    └── ISSUE_TEMPLATE
    │   └── feature_request.md
├── roadmap.md
└── LICENSE


/rfcs/images/oip3-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raystack/platform/HEAD/rfcs/images/oip3-1.png


--------------------------------------------------------------------------------
/rfcs/images/oip3-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raystack/platform/HEAD/rfcs/images/oip3-2.png


--------------------------------------------------------------------------------
/rfcs/images/oip3-3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raystack/platform/HEAD/rfcs/images/oip3-3.png


--------------------------------------------------------------------------------
/rfcs/images/oip3-4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raystack/platform/HEAD/rfcs/images/oip3-4.png


--------------------------------------------------------------------------------
/rfcs/images/oip3-5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raystack/platform/HEAD/rfcs/images/oip3-5.png


--------------------------------------------------------------------------------
/rfcs/images/oip2-siren-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raystack/platform/HEAD/rfcs/images/oip2-siren-1.png


--------------------------------------------------------------------------------
/rfcs/images/oip2-siren-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raystack/platform/HEAD/rfcs/images/oip2-siren-2.png


--------------------------------------------------------------------------------
/rfcs/images/oip2-siren-3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raystack/platform/HEAD/rfcs/images/oip2-siren-3.png


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | # Binaries for programs and plugins
 2 | *.exe
 3 | *.exe~
 4 | *.dll
 5 | *.so
 6 | *.dylib
 7 | .DS_Store
 8 | 
 9 | # Test binary, built with `go test -c`
10 | *.test
11 | 
12 | # Output of the go coverage tool, specifically when used with LiteIDE
13 | *.out
14 | 
15 | # Dependency directories (remove the comment below to include it)
16 | # vendor/
17 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Platform
 2 | 
 3 | ODPF is the next-gen collaborative Open Data Platform to power data workflows.
 4 | 
 5 | This repository contains
 6 | 
 7 | - ODPF's [roadmap](https://github.com/orgs/odpf/projects/10)
 8 | - A place to raise [issues](https://github.com/odpf/platform/issues)
 9 | - Have [discussions](https://github.com/orgs/odpf/discussions), ask questions and more
10 | - Join us on [Slack](https://bit.ly/2RzPbtn)
11 | 


--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | name: Feature request
 3 | about: Suggest an idea for this project
 4 | title: ''
 5 | labels: ''
 6 | assignees: ''
 7 | 
 8 | ---
 9 | 
10 | **Summary**
11 | A clear and concise description of what the feature is. Ex. I'm always frustrated when [...]
12 | 
13 | **Intended outcome**
14 | A clear and concise description of what you want to happen.
15 | 
16 | **How will it work?**
17 | A clear and concise description of how this feature will work.
18 | 
19 | **Additional context**
20 | Add any other context or screenshots about the feature request here.
21 | 


--------------------------------------------------------------------------------
/roadmap.md:
--------------------------------------------------------------------------------
 1 | # Roadmap
 2 | 
 3 | ❇️ View the official ODPF public roadmap
 4 | 
 5 | Our product [roadmap](https://github.com/orgs/odpf/projects/10) is where you can learn about what features we're working on, what stage they're in, and when we expect to bring them to you. The platform repository is for communicating ODPF’s roadmap. Have any questions or comments about items on the roadmap? Share your feedback via [ODPF public feedback discussions](https://github.com/orgs/odpf/discussions).
 6 | 
 7 | ## Guide to the roadmap
 8 | 
 9 | Every item on the roadmap is an issue, with a label that indicates each of the following:
10 | 
11 | - A **status** that indicates the expected planned timeline for the release.
12 | - A **progress** that indicates the stage of development of the feature.
13 | - A **feature** that indicates the feature or product to which the item belongs.
14 | - One or more **product** labels that indicate which ODPF product we expect the feature to be available in.
15 | - Once a feature is delivered, the **released** label will be applied to the roadmap issue and the issue will be closed with a comment linking to the relevant changelog post.
16 | 
17 | ## Release phases
18 | 
19 | Release phases indicate the stages that the product or feature goes through, from early testing to general availability.
20 | 
21 | - **alpha:** _Primarily for testing and feedback_
22 | 
23 |   Features are still under heavy development and subject to change. Not for production use; no documentation or support is provided.
24 | 
25 | - **beta:** _Released for public feedback_
26 | 
27 |   Features mostly complete and documented. Timeline and requirements for stable release usually published. Limited support provided.
28 | 
29 | - **stable:** _Released for production use_
30 | 
31 |   Ready for production use for all users. Approximately 1-2 months from beta.
32 | 
33 | ## Roadmap stages
34 | 
35 | The roadmap is arranged on a project board to give a sense for how far out each item is on the horizon. Every product or feature is added to a particular project board column according to the quarter in which it is expected to ship next. Be sure to read the disclaimer below since the roadmap is subject to change, especially further out on the timeline.
36 | 
37 | ## Disclaimer
38 | 
39 | Any statement in this repository that is not purely historical is considered a forward-looking statement. Forward-looking statements included in this repository are based on information available to ODPF as of the date they are made, and ODPF assumes no obligation to update any forward-looking statements. The forward-looking product roadmap does not represent a commitment, guarantee, obligation or promise to deliver any product or feature, or to deliver any product and feature by any particular date, and is intended to outline the general development plans. Customers should not rely on this roadmap to make any decision.
40 | 


--------------------------------------------------------------------------------
/rfcs/OIP-000-template.md:
--------------------------------------------------------------------------------
 1 | # OIP-000 - RFC Template
 2 | 
 3 | The RFC begins with a brief overview. This section should be one or two paragraphs that just explain what the goal of this RFC is going to be, without diving too deeply into the "why", "why now", "how", etc. Ensure anyone opening the document will form a clear understanding of the RFC's intent from reading these paragraph(s).
 4 | 
 5 | # Background
 6 | 
 7 | The next section is the "Background" section. This section should be at least two paragraphs and can take up to a whole page in some cases. The guiding goal of the background section is: as a newcomer to this project (new employee, team transfer), can I read the background section and follow any links to get the full context of why this change is necessary?
 8 | 
 9 | If you can't show a random engineer the background section and have them acquire nearly full context on the necessity for the RFC, then the background section is not full enough. To help achieve this, link to prior RFCs, discussions, and more here as necessary to provide context so you don't have to simply repeat yourself.
10 | 
11 | # Proposal
12 | 
13 | The next required section is "Proposal" or "Goal". Given the background above, this section proposes a solution. This should be an overview of the "how" for the solution, but for details further sections will be used.
14 | 
15 | ## Abandoned Ideas (Optional)
16 | 
17 | As RFCs evolve, it is common that there are ideas that are abandoned. Rather than simply deleting them from the document, you should try to organize them into sections that make it clear they're abandoned while explaining why they were abandoned.
18 | 
19 | When sharing your RFC with others or having someone look back on your RFC in the future, it is common to walk the same path and fall into the same pitfalls that we've since matured from. Abandoned ideas are a way to recognize that path and explain the pitfalls and why they were abandoned.
20 | 
21 | ## Sections (Heading 2)
22 | 
23 | From this point onwards, the sections and headers are generally freeform depending on the RFC. Sections are styled as "Heading 2". Try to organize your information into self-contained sections that answer some critical question, and organize your sections into an order that builds up knowledge necessary (rather than forcing a reader to jump around to gain context).
24 | 
25 | Sections often are split further into sub-sections styled "Heading 3". These sub-sections just further help to organize data to ease reading and discussion.
26 | 
27 | ### [Example] Implementation
28 | 
29 | Many RFCs have an "implementation" section which details how the implementation will work. This section should explain the rough API changes (internal and external), package changes, etc. The goal is to give reviewers an idea of the subsystems that require change and the surface area of those changes.
30 | 
31 | This knowledge can result in recommendations for alternate approaches that are perhaps more idiomatic to the project or result in fewer packages touched. Or, it may result in the realization that the proposed solution in this RFC is too complex given the problem.
32 | 
33 | For the RFC author, typing out the implementation at a high level often serves as "rubber duck debugging" and you can catch a lot of issues or unknown unknowns prior to writing any real code.
34 | 
35 | ### [Example] UX
36 | 
37 | If there are user-impacting changes by this RFC, it is important to have a "UI/UX" section. User-impacting changes include external API changes, configuration format changes, CLI output changes, etc.
38 | 
39 | This section is effectively the "implementation" section for the user experience. The goal is to explain the changes necessary, any impacts to backwards compatibility, any impacts to normal workflow, etc.
40 | 
41 | As a reviewer, this section should be checked to see if the proposed changes feel like the project in question. For example, if the UX changes are proposing a flag "-foo_bar" but all our flags use hyphens like "-foo-bar", then that is a noteworthy review comment. Further, if the breaking changes are intolerable or there is a way to make a change while preserving compatibility, that should be explored.
42 | 
43 | ### [Example] UI
44 | 
45 | Will this RFC have implications for the web UI? If so, be sure to collaborate with a frontend engineer and/or product designer. They can add UI design assets (user flows, wireframes, mockups or prototypes) to this document, and if changes are substantial, they may wish to create a separate RFC to dive further into details on the UI changes.
46 | 
47 | ## Style Notes
48 | 
49 | All RFCs should follow similar styling and structure to ease reading. "Beautiful is better" is a core principle of ODPF and we care about the details.
50 | 
51 | ### Heading Styles
52 | 
53 | "Heading 2" should be used for section titles. We do not use "Heading 1" because aesthetically the text is too large. Google Docs will use Heading 2 as the outermost headers in the generated outline.
54 | 
55 | "Heading 3" should be used for sub-sections.
56 | 
57 | Further heading styles can be used for nested sections; however, it is rare that an RFC goes beyond "Heading 4," and rare even that "Heading 4" is reached.
58 | 
59 | ### Lists
60 | 
61 | When making lists, it is common to bold the first phrase/sentence/word to bring some category or point to attention. For example, a list of API considerations:
62 | 
63 | - **Format** should be widgets
64 | - **Protocol** should be widgets-rpc
65 | - **Backwards** compatibility should be considered.
66 | 
67 | ### Typeface
68 | 
69 | Type size should use this template's default configuration (11pt for body text, larger for headings), and the type family should be Arial. No other typeface customization (eg color, highlight) should be made other than italics, bold, and underline.
70 | 
71 | ### Code Samples
72 | 
73 | Code samples should be indented (tab or spaces are fine as long as it is consistent) text using the Courier New font. Syntax highlighting can be included if possible but isn't necessary. Please ensure the highlighted syntax is the proper font size and using the font Courier New so non-highlighted samples don't appear out of place.
74 | 
75 | CLI output samples are similar to code samples but should be highlighted with the color they'll output if it is known so that the RFC could also cover formatting as part of the user experience.
76 | 
77 |     func example() {
78 |       <-make(chan struct{})
79 |     }
80 | 
81 | Note: This RFC template is heavily inspired by the HashiCorp RFC template.
82 | 


--------------------------------------------------------------------------------
/rfcs/OIP-001-unified-urn.md:
--------------------------------------------------------------------------------
  1 | # OIP-001 - Unifying URN format across tools
  2 | 
  3 | A URN (Uniform Resource Name) is the identifier we use across our tools and libraries in ODPF. A URN should be unambiguous and represent exactly one resource. A good URN format is therefore crucial, because it prevents conflicting or duplicated identifiers.
  4 | 
  5 | The goal of this RFC is to decide what is a good and persistent URN format that can be used across our tools.
  6 | 
  7 | ## Background
  8 | 
  9 | ### What is a resource?
 10 | 
 11 | Below is a list of things that we consider a resource:
 12 | 
 13 | - Table (BigQuery, Postgres, MySQL, Elasticsearch, etc)
 14 | - Topic (Kafka, RabbitMQ, etc)
 15 | - Job (Firehose, Optimus, Dagger, etc)
 16 | - Dashboard (Metabase, Tableau, etc)
 17 | 
 18 | ### Current formats
 19 | 
 20 | To understand the need for this initiative better, let's look at how each of our tools currently formats its URNs.
 21 | 
 22 | | Resource            | Format                                       | Example                                           |
 23 | | :------------------ | :------------------------------------------- | :------------------------------------------------ |
 24 | | Meteor's RDBMS      | `{service}::{host}/{database}/{table}`       | `postgres::10.283.86.19:5432/user_db/user_role`   |
 25 | | Meteor's BigQuery   | `{service}::{project}/{dataset}/{table}`     | `bigquery::odpf-prod/datamart/daily_booking`      |
 26 | | Meteor's Metabase   | `{service}::{host}/dashboards/{dashboardID}` | `metabase::my-metabase-server.com/dashboards/872` |
 27 | | Shield's Resource   | `{resource_type}/{namespace}/{resource_id}`  | `r/namespace-id/resource-name`                    |
 28 | | Guardian's BigQuery | `{resource_id}`                              | `metabase:293`                                    |
 29 | 
 30 | There are a few things that we can improve here:
 31 | 
 32 | - Using `{host}` as part of the URN harms persistence (mostly in Meteor's formats). A resource's location should be allowed to change without invalidating its generated URN.
 33 | - Using `/` as a separator has a few issues:
 34 |   1. When a resource URN is passed as a route parameter over HTTP, the URN needs to be encoded.
 35 |   2. Even if it is encoded, some **services** or **proxies** may not be able to route-match it properly (e.g. the default behaviour of [gorilla/mux](https://github.com/gorilla/mux/issues/639)).
 36 | 
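The encoding issue with `/` can be illustrated with a minimal Go sketch. It uses the standard library's `url.PathEscape` to encode an identifier as a single path segment. The helper name `escapeForPath` and the colon-separated variant of the BigQuery example are hypothetical, used only for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// escapeForPath percent-encodes an identifier so that it can be used as a
// single URL path segment. (Hypothetical helper, not part of any ODPF tool.)
func escapeForPath(id string) string {
	return url.PathEscape(id)
}

func main() {
	// Slash-separated identifier, taken from the table above.
	slashURN := "bigquery::odpf-prod/datamart/daily_booking"
	// Hypothetical colon-separated form of the same identifier.
	colonURN := "bigquery:odpf-prod:datamart:daily_booking"

	fmt.Println(escapeForPath(slashURN)) // every "/" becomes "%2F"
	fmt.Println(escapeForPath(colonURN)) // left unchanged
}
```

With `:` as the only separator, the identifier survives as a path segment without escaping, so route matching does not depend on how a proxy or router treats `%2F`.
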
 37 | ### Limited resource referencing between tools
 38 | 
 39 | Instead of each tool defining its own URN format, it would be better if, where possible, all tools and services used the same URN format when referring to a resource or asset.
 40 | 
 41 | Since different tools use different formats, resources cannot be referenced (or potentially shared) between tools without help from an extra mapping layer (either a service or a library).
 42 | 
 43 | ## Requirements
 44 | 
 45 | Our final unified URN should handle these cases:
 46 | 
 47 | 1. Persist through a change of resource location (e.g. the DB is moved to another server).
 48 | 2. Be easy to use in a URL without relying on services to handle the encoding/decoding.
 49 | 3. Be globally unique, or at least unique within an organization (AKAB / DKAB).
 50 | 4. SAMPLE CASE: If we somehow have two different Metabase instances, we should be able to tell which one a URN refers to without relying on `host`.
 51 | 
 52 | ## Proposals
 53 | 
 54 | **1.** `urn:{NID}:{NSS}:{project}:{kind}:{name}` - by [spy16](https://github.com/spy16)
 55 | 
 56 | I highly recommend we stick to the IETF standard definition of URN from [RFC8141](https://datatracker.ietf.org/doc/html/rfc8141) (even if we take only a subset of it).
 57 | 
 58 | [RFC 8141: Section 2](https://datatracker.ietf.org/doc/html/rfc8141#section-2) defines the syntax for URNs.
 59 | 
 60 | 1. For all ODPF products, we can use `odpf` (or `ODPF`) as the [Namespace Identifier (NID)](https://datatracker.ietf.org/doc/html/rfc8141#section-2.1).
 61 | 2. Every ODPF product should use the product name as [Namespace Specific String](https://datatracker.ietf.org/doc/html/rfc8141#section-2.2). For example, all resources managed by Entropy would have `entropy` as the NSS.
 62 | 
 63 | NID and NSS combined form the `assigned-name`: `urn:<NID>:<NSS>` --> `urn:odpf:entropy`. This assigned-name uniquely identifies every product within ODPF.
 64 | 
 65 | Optional components (which are defined by the entity that owns the NSS) can be appended to `assigned-name` to form resource-level identifiers. For example: `urn:odpf:siren:alert1`
 66 | 
 67 | Optional components can have some generic restrictions that we follow. For example:
 68 | 
 69 | - all optional components following the namespace should match the pattern `^[A-Za-z0-9-]+$`.
 70 | - no components in the URN are allowed to have `/` character.
 71 | - components must be ordered to match reducing scope (i.e., `urn` matches everything globally, `urn:odpf` matches everything within ODPF, `urn:odpf:entropy` matches everything within entropy product of odpf, `urn:odpf:entropy:project-foo` matches everything within `project-foo` and so on).
 72 | 
 73 | With all these combined: The URN for a "resource" of kind "firehose" in project "foo" with name "f1" managed by "entropy" will be `urn:odpf:entropy:foo:firehose:f1`
 74 | 
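The rules in proposal 1 can be sketched in Go as follows. This is only an illustration under stated assumptions: `buildURN` and `inScope` are hypothetical names, and applying the component pattern to the NID and NSS as well as to the optional components is an assumption, since the proposal states the pattern only for the optional components.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// componentPattern is the restriction quoted in the proposal. It also
// rules out "/" in any component.
var componentPattern = regexp.MustCompile(`^[A-Za-z0-9-]+$`)

// buildURN joins the NID, the NSS and the optional components into
// urn:{NID}:{NSS}:{component}:... and rejects any invalid component.
// Assumption: the NID and NSS are checked against the same pattern.
func buildURN(nid, nss string, components ...string) (string, error) {
	parts := append([]string{"urn", nid, nss}, components...)
	for _, p := range parts[1:] {
		if !componentPattern.MatchString(p) {
			return "", fmt.Errorf("invalid URN component %q", p)
		}
	}
	return strings.Join(parts, ":"), nil
}

// inScope reports whether urn is the scope itself or lies below it.
// It relies on components being ordered from widest to narrowest scope.
func inScope(urn, scope string) bool {
	return urn == scope || strings.HasPrefix(urn, scope+":")
}

func main() {
	urn, err := buildURN("odpf", "entropy", "foo", "firehose", "f1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(urn)
	fmt.Println(inScope(urn, "urn:odpf:entropy:foo"))
	fmt.Println(inScope(urn, "urn:odpf:siren"))
}
```

The scope check appends `:` before comparing prefixes so that a scope such as `urn:odpf:entropy` does not accidentally match a sibling whose name merely starts with `entropy`.
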
 75 | **2.** `{namespace}:{label}:{source}:{identifier}` - by [StewartJingga](https://github.com/StewartJingga)
 76 | 
 77 | - **namespace** represents which org (or even environment) the resource belongs to. This is especially useful if you are maintaining resources from different organizations or entities. Example: `odpf`, `odpf-prod`.
 78 | - **label** can be used when, for example, you have two different postgres servers in a namespace. The label differentiates the two; without labels, we could only use the server address. Example: `transaction_storage`, `optimus`, `main-database`, `production`.
 79 | - **source** is the service/tool/storage that generates the resource. Example: `postgres`, `bigquery`, `metabase`, `kafka`.
 80 | - **identifier** should be unique inside the `source`. The simplest approach is to just use the identifier generated by the `source` itself. In case of `metabase's collection`, we can use `collection:321` or `card:88` for representing a card.
 81 | 
 82 | Examples
 83 | 
 84 | - **metabase** - `odpf:main-dashboard:metabase:collection:321`
 85 | - **bigquery** - `odpf:default:bigquery:myproject:mydataset:mytable` - `default` is used for a URN that does not require a label for uniqueness
 86 | - **postgres** - `odpf:stencil-integration:postgres:descriptors` - this represents a postgres table that is used by the stencil integration
 87 | - **elasticsearch** - `odpf-prod:compass:elasticsearch:index:table` - this represents an elasticsearch index that is used by compass in production
 88 | - **hadoop** - `odpf:datalake:hadoop:index:table` - this represents a hadoop table that is used as a datalake
 89 | 
 90 | ## Accepted Proposal
 91 | 
 92 | ### URN for internal services
 93 | 
 94 | Format: `orn:{NSS}:{scope}:{kind}:{name}`
 95 | Example: `orn:entropy:foo:firehose:f1`
 96 | 
 97 | _Note: `orn` stands for `ODPF Resource Name`._
 98 | 
 99 | ### URN for external services
100 | 
101 | Format: `urn:{source}:{scope}:{kind}:{identifier}`
102 | 
103 | Examples:
104 | 
105 | - **metabase** - `urn:metabase:main-metabase:collection:321`
106 | - **bigquery** - `urn:bigquery:p-godata-id:table:p-godata-id:mydataset.mytable`
107 | - **postgres** - `urn:postgres:stencil-integration:table:schemas`
108 | - **elasticsearch** - `urn:elasticsearch:compass-prod:index:random-index-name`
109 | - **hadoop** - `urn:hadoop:datalake:table:raw-table`
110 | 
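A minimal Go sketch of parsing the accepted formats is shown below. The type `ResourceName` and the function `parseResourceName` are hypothetical names. Treating everything after the fourth `:` as the identifier is an assumption inferred from the BigQuery example above, whose identifier itself contains a `:`.

```go
package main

import (
	"fmt"
	"strings"
)

// ResourceName holds the five fields of the accepted formats:
// orn:{NSS}:{scope}:{kind}:{name} and urn:{source}:{scope}:{kind}:{identifier}.
type ResourceName struct {
	Scheme     string // "orn" or "urn"
	Source     string // {NSS} for internal, {source} for external services
	Scope      string
	Kind       string
	Identifier string // {name} or {identifier}; may itself contain ":"
}

// parseResourceName splits s into at most five fields, so any further ":"
// stays inside the identifier (an assumption based on the BigQuery example).
func parseResourceName(s string) (ResourceName, error) {
	parts := strings.SplitN(s, ":", 5)
	if len(parts) != 5 {
		return ResourceName{}, fmt.Errorf("expected 5 colon-separated fields in %q", s)
	}
	if parts[0] != "orn" && parts[0] != "urn" {
		return ResourceName{}, fmt.Errorf("unknown scheme %q", parts[0])
	}
	for _, p := range parts {
		if p == "" {
			return ResourceName{}, fmt.Errorf("empty field in %q", s)
		}
	}
	return ResourceName{
		Scheme:     parts[0],
		Source:     parts[1],
		Scope:      parts[2],
		Kind:       parts[3],
		Identifier: parts[4],
	}, nil
}

func main() {
	for _, s := range []string{
		"orn:entropy:foo:firehose:f1",
		"urn:bigquery:p-godata-id:table:p-godata-id:mydataset.mytable",
	} {
		rn, err := parseResourceName(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", rn)
	}
}
```
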


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 |                                  Apache License
  2 |                            Version 2.0, January 2004
  3 |                         http://www.apache.org/licenses/
  4 | 
  5 |    TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
  6 | 
  7 |    1. Definitions.
  8 | 
  9 |       "License" shall mean the terms and conditions for use, reproduction,
 10 |       and distribution as defined by Sections 1 through 9 of this document.
 11 | 
 12 |       "Licensor" shall mean the copyright owner or entity authorized by
 13 |       the copyright owner that is granting the License.
 14 | 
 15 |       "Legal Entity" shall mean the union of the acting entity and all
 16 |       other entities that control, are controlled by, or are under common
 17 |       control with that entity. For the purposes of this definition,
 18 |       "control" means (i) the power, direct or indirect, to cause the
 19 |       direction or management of such entity, whether by contract or
 20 |       otherwise, or (ii) ownership of fifty percent (50%) or more of the
 21 |       outstanding shares, or (iii) beneficial ownership of such entity.
 22 | 
 23 |       "You" (or "Your") shall mean an individual or Legal Entity
 24 |       exercising permissions granted by this License.
 25 | 
 26 |       "Source" form shall mean the preferred form for making modifications,
 27 |       including but not limited to software source code, documentation
 28 |       source, and configuration files.
 29 | 
 30 |       "Object" form shall mean any form resulting from mechanical
 31 |       transformation or translation of a Source form, including but
 32 |       not limited to compiled object code, generated documentation,
 33 |       and conversions to other media types.
 34 | 
 35 |       "Work" shall mean the work of authorship, whether in Source or
 36 |       Object form, made available under the License, as indicated by a
 37 |       copyright notice that is included in or attached to the work
 38 |       (an example is provided in the Appendix below).
 39 | 
 40 |       "Derivative Works" shall mean any work, whether in Source or Object
 41 |       form, that is based on (or derived from) the Work and for which the
 42 |       editorial revisions, annotations, elaborations, or other modifications
 43 |       represent, as a whole, an original work of authorship. For the purposes
 44 |       of this License, Derivative Works shall not include works that remain
 45 |       separable from, or merely link (or bind by name) to the interfaces of,
 46 |       the Work and Derivative Works thereof.
 47 | 
 48 |       "Contribution" shall mean any work of authorship, including
 49 |       the original version of the Work and any modifications or additions
 50 |       to that Work or Derivative Works thereof, that is intentionally
 51 |       submitted to Licensor for inclusion in the Work by the copyright owner
 52 |       or by an individual or Legal Entity authorized to submit on behalf of
 53 |       the copyright owner. For the purposes of this definition, "submitted"
 54 |       means any form of electronic, verbal, or written communication sent
 55 |       to the Licensor or its representatives, including but not limited to
 56 |       communication on electronic mailing lists, source code control systems,
 57 |       and issue tracking systems that are managed by, or on behalf of, the
 58 |       Licensor for the purpose of discussing and improving the Work, but
 59 |       excluding communication that is conspicuously marked or otherwise
 60 |       designated in writing by the copyright owner as "Not a Contribution."
 61 | 
 62 |       "Contributor" shall mean Licensor and any individual or Legal Entity
 63 |       on behalf of whom a Contribution has been received by Licensor and
 64 |       subsequently incorporated within the Work.
 65 | 
 66 |    2. Grant of Copyright License. Subject to the terms and conditions of
 67 |       this License, each Contributor hereby grants to You a perpetual,
 68 |       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 69 |       copyright license to reproduce, prepare Derivative Works of,
 70 |       publicly display, publicly perform, sublicense, and distribute the
 71 |       Work and such Derivative Works in Source or Object form.
 72 | 
 73 |    3. Grant of Patent License. Subject to the terms and conditions of
 74 |       this License, each Contributor hereby grants to You a perpetual,
 75 |       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 76 |       (except as stated in this section) patent license to make, have made,
 77 |       use, offer to sell, sell, import, and otherwise transfer the Work,
 78 |       where such license applies only to those patent claims licensable
 79 |       by such Contributor that are necessarily infringed by their
 80 |       Contribution(s) alone or by combination of their Contribution(s)
 81 |       with the Work to which such Contribution(s) was submitted. If You
 82 |       institute patent litigation against any entity (including a
 83 |       cross-claim or counterclaim in a lawsuit) alleging that the Work
 84 |       or a Contribution incorporated within the Work constitutes direct
 85 |       or contributory patent infringement, then any patent licenses
 86 |       granted to You under this License for that Work shall terminate
 87 |       as of the date such litigation is filed.
 88 | 
 89 |    4. Redistribution. You may reproduce and distribute copies of the
 90 |       Work or Derivative Works thereof in any medium, with or without
 91 |       modifications, and in Source or Object form, provided that You
 92 |       meet the following conditions:
 93 | 
 94 |       (a) You must give any other recipients of the Work or
 95 |           Derivative Works a copy of this License; and
 96 | 
 97 |       (b) You must cause any modified files to carry prominent notices
 98 |           stating that You changed the files; and
 99 | 
100 |       (c) You must retain, in the Source form of any Derivative Works
101 |           that You distribute, all copyright, patent, trademark, and
102 |           attribution notices from the Source form of the Work,
103 |           excluding those notices that do not pertain to any part of
104 |           the Derivative Works; and
105 | 
106 |       (d) If the Work includes a "NOTICE" text file as part of its
107 |           distribution, then any Derivative Works that You distribute must
108 |           include a readable copy of the attribution notices contained
109 |           within such NOTICE file, excluding those notices that do not
110 |           pertain to any part of the Derivative Works, in at least one
111 |           of the following places: within a NOTICE text file distributed
112 |           as part of the Derivative Works; within the Source form or
113 |           documentation, if provided along with the Derivative Works; or,
114 |           within a display generated by the Derivative Works, if and
115 |           wherever such third-party notices normally appear. The contents
116 |           of the NOTICE file are for informational purposes only and
117 |           do not modify the License. You may add Your own attribution
118 |           notices within Derivative Works that You distribute, alongside
119 |           or as an addendum to the NOTICE text from the Work, provided
120 |           that such additional attribution notices cannot be construed
121 |           as modifying the License.
122 | 
123 |       You may add Your own copyright statement to Your modifications and
124 |       may provide additional or different license terms and conditions
125 |       for use, reproduction, or distribution of Your modifications, or
126 |       for any such Derivative Works as a whole, provided Your use,
127 |       reproduction, and distribution of the Work otherwise complies with
128 |       the conditions stated in this License.
129 | 
130 |    5. Submission of Contributions. Unless You explicitly state otherwise,
131 |       any Contribution intentionally submitted for inclusion in the Work
132 |       by You to the Licensor shall be under the terms and conditions of
133 |       this License, without any additional terms or conditions.
134 |       Notwithstanding the above, nothing herein shall supersede or modify
135 |       the terms of any separate license agreement you may have executed
136 |       with Licensor regarding such Contributions.
137 | 
138 |    6. Trademarks. This License does not grant permission to use the trade
139 |       names, trademarks, service marks, or product names of the Licensor,
140 |       except as required for reasonable and customary use in describing the
141 |       origin of the Work and reproducing the content of the NOTICE file.
142 | 
143 |    7. Disclaimer of Warranty. Unless required by applicable law or
144 |       agreed to in writing, Licensor provides the Work (and each
145 |       Contributor provides its Contributions) on an "AS IS" BASIS,
146 |       WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 |       implied, including, without limitation, any warranties or conditions
148 |       of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 |       PARTICULAR PURPOSE. You are solely responsible for determining the
150 |       appropriateness of using or redistributing the Work and assume any
151 |       risks associated with Your exercise of permissions under this License.
152 | 
153 |    8. Limitation of Liability. In no event and under no legal theory,
154 |       whether in tort (including negligence), contract, or otherwise,
155 |       unless required by applicable law (such as deliberate and grossly
156 |       negligent acts) or agreed to in writing, shall any Contributor be
157 |       liable to You for damages, including any direct, indirect, special,
158 |       incidental, or consequential damages of any character arising as a
159 |       result of this License or out of the use or inability to use the
160 |       Work (including but not limited to damages for loss of goodwill,
161 |       work stoppage, computer failure or malfunction, or any and all
162 |       other commercial damages or losses), even if such Contributor
163 |       has been advised of the possibility of such damages.
164 | 
165 |    9. Accepting Warranty or Additional Liability. While redistributing
166 |       the Work or Derivative Works thereof, You may choose to offer,
167 |       and charge a fee for, acceptance of support, warranty, indemnity,
168 |       or other liability obligations and/or rights consistent with this
169 |       License. However, in accepting such obligations, You may act only
170 |       on Your own behalf and on Your sole responsibility, not on behalf
171 |       of any other Contributor, and only if You agree to indemnify,
172 |       defend, and hold each Contributor harmless for any liability
173 |       incurred by, or claims asserted against, such Contributor by reason
174 |       of your accepting any such warranty or additional liability.
175 | 
176 |    END OF TERMS AND CONDITIONS
177 | 
178 |    APPENDIX: How to apply the Apache License to your work.
179 | 
180 |       To apply the Apache License to your work, attach the following
181 |       boilerplate notice, with the fields enclosed by brackets "[]"
182 |       replaced with your own identifying information. (Don't include
183 |       the brackets!)  The text should be enclosed in the appropriate
184 |       comment syntax for the file format. We also recommend that a
185 |       file or class name and description of purpose be included on the
186 |       same "printed page" as the copyright notice for easier
187 |       identification within third-party archives.
188 | 
189 |    Copyright [yyyy] [name of copyright owner]
190 | 
191 |    Licensed under the Apache License, Version 2.0 (the "License");
192 |    you may not use this file except in compliance with the License.
193 |    You may obtain a copy of the License at
194 | 
195 |        http://www.apache.org/licenses/LICENSE-2.0
196 | 
197 |    Unless required by applicable law or agreed to in writing, software
198 |    distributed under the License is distributed on an "AS IS" BASIS,
199 |    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 |    See the License for the specific language governing permissions and
201 |    limitations under the License.
202 | 


--------------------------------------------------------------------------------
/rfcs/OIP-002-alert-subscription-and-notification.md:
--------------------------------------------------------------------------------
  1 | # OIP-002 - Alert Subscription and Notification
  2 | 
  3 | Siren is a tool used and developed by the data platform team to manage observability, alerting rules, and notification channels. Alert subscription and notification in Siren are currently handled by the provider (Cortex). This RFC explains how we move the responsibility for alert subscription and notification from the provider to Siren. In the current Siren version, the only provider that Siren supports is CortexMetrics, so this RFC uses CortexMetrics (Cortex) as the example provider.
  4 | 
  5 | # Background
  6 | 
  7 | Based on the [PRD](https://github.com/odpf/platform/discussions/15), users are expected to be able to:
  8 | 
  9 | - Create alert policy/rules
 10 | - Create SLO/SLI
 11 | - Create incident
 12 | - Send non-alert notification
 13 | - Silence alert
 14 | - Subscribe to an alert
 15 | - Receive notification
 16 | - Get alert/policy & subscription changes
 17 | - Get an incident
 18 | 
 19 | The use cases above can be summarized in the flow below. To identify which problems to tackle, we look at the gap between the flow of our existing system and the ideal flow.
 20 | 
 21 | ```mermaid
 22 | flowchart TD
 23 |     X([Start])-->A
 24 |     X-->H
 25 |     H[Notification Creation Request]-->|Request|C
 26 |     A[Source Signal] -->|Signal| B[Alert Generation & Silencing]
 27 |     B --> |Alert|C[Send Notification]
 28 |     B --> |Alert|E[Incident Generation]
 29 |     G[Incident Creation Request]-->|Request|E
 30 |     X-->G
 31 |     C -->|Notification|Z([End])
 32 |     E --> |Incident|Z
 33 | ```
 34 | 
 35 | **Source Signal**
 36 | 
 37 | - This is the process where the telemetry signal is being sent to the system.
 38 | - _Current state_
 39 |   - One of Siren's providers is CortexMetrics. In some setups, Siren uses the CortexMetrics backend, and upstream services/jobs send metrics to CortexMetrics with Prometheus.
 40 | - _Problems_
 41 |   - There is not much of a problem here. One small issue is that responsibility is scattered between the provider and Siren: Siren is not responsible for consuming observability data. The rationale for this decision was to avoid burdening Siren with high incoming traffic.
 42 |   - For now, we could keep this behavior and evaluate it.
 43 | 
 44 | **Alert Generation & Silencing**
 45 | 
 46 | - The incoming signal is processed and, based on a specific alert/policy and threshold, an alert with a certain severity is triggered. If there are silences for the alerts, the alerts won't be triggered.
 47 | - _Current state_
 48 |   - We rely heavily on Cortex Alertmanager in this process. Alert generation (trigger) and silencing are done in Cortex Alertmanager. Siren's role is just to proxy rule creation to the provider (Cortex Alertmanager). Siren provides templating and rule definitions, and Siren rules are converted to Cortex Alertmanager rules.
 49 |   - Siren does not currently have the capability to silence alerts. To silence alerts, one could call the Cortex Alertmanager API or open the Cortex Alertmanager UI and manually add a silence there.
 50 | - _Problems_
 51 |   - Alert generation and silencing are done on the provider side. This is not a big deal for now, but if we intend to support more providers, some providers might not have the same or similar capabilities.
 52 | 
 53 | **Notification Creation Request**
 54 | 
 55 | - This is the process of requesting that a notification be generated.
 56 | - _Current state_
 57 |   - Siren currently has a `/notify` endpoint, but it is only designed to send notifications via Slack.
 58 | - _Problems_
 59 |   - The notify endpoint in Siren is tightly coupled to Slack-specific configuration. Siren might need an abstraction on top of it.
 60 | 
 61 | **Send Notification**
 62 | 
 63 | - This is the process where a notification is generated and then sent to the subscribed receivers.
 64 | - _Current state_
 65 |   - There are currently two places that generate notifications in our system. Notifications for manual triggers are generated by Siren, but notifications based on alerts are generated by Cortex Alertmanager.
 66 |   - Siren supports adding a notification receiver (only for PagerDuty and Slack). The receiver can be used for alert subscription. The user needs to create a subscription (containing the receiver id, receiver configuration, and match labels). Which receiver is notified depends on whether the alert labels match the match labels. However, the actual sending of notifications to receivers is done in Cortex Alertmanager.
 67 |   - Siren itself has the capability to manually send Slack messages to individuals or groups.
 68 | - _Problems_
 69 |   - Responsibility for generating notifications is not centralized.
 70 |   - Cortex Alertmanager supports a limited number of receiver types. Relying on Cortex Alertmanager to generate notifications will block us from supporting more receiver types.
 71 |   - Alert subscription works by matching the subscription's match labels against alert labels. If the labels match, notifications are triggered.
 72 | 
 73 | **Incident Creation Request**
 74 | 
 75 | - This is the process of a request to generate an incident.
 76 | - This won't be covered in this RFC.
 77 | - _Current state_
 78 |   - We don't have this feature as of now.
 79 | 
 80 | **Incident Generation**
 81 | 
 82 | - This is the process of generating incidents. An incident could be generated by creating it manually or by converting an alert into an incident.
 83 | - This won't be covered in this RFC.
 84 | - _Current state_
 85 |   - We don't have this feature as of now.
 86 | 
 87 | ## Problems
 88 | 
 89 | Below is the existing architecture of Siren. Siren's responsibility is limited to proxying Cortex rules and alert config, and to sending notifications directly to Slack. Alert generation, silencing, and notifications are done by CortexMetrics.
 90 | 
 91 | ![Existing siren](images/oip2-siren-1.png)
 92 | 
 93 | To subscribe to an alert, a user can register a new subscription by calling the create subscription API with this data:
 94 | 
 95 | ```go
 96 | type Subscription struct {
 97 | 	ID        uint64            `json:"id"`
 98 | 	URN       string            `json:"urn"`
 99 | 	Namespace uint64            `json:"namespace"`
100 | 	Receivers []Receiver        `json:"receivers"`
101 | 	Match     map[string]string `json:"match"`
102 | 	CreatedAt time.Time         `json:"created_at"`
103 | 	UpdatedAt time.Time         `json:"updated_at"`
104 | }
105 | ```
106 | 
107 | If the labels of a triggered alert match the match labels of a subscription, its receivers will get notifications.
108 | 
109 | The problems with the existing Siren architecture are:
110 | - Users could only subscribe to alerts generated by CortexMetrics.
111 | - The notification channels (vendors) are limited to what CortexMetrics (provider) supports.
112 | 
113 | From the requirements, what we want is:
114 | - Users could subscribe to any alerts or notifications.
115 | - We have more flexibility to support more notification channels.
116 | 
117 | Therefore, there is a need to rethink Siren architecture to accommodate the requirements.
118 | 
119 | # Proposal
120 | 
121 | As we want to have flexibility to support more notification channels and capability to support subscriptions for generic use cases, there are several possible approaches to solve this.
122 | 
123 | ## Abandoned Ideas
124 | 
125 | **Proxy-to-provider**
126 | 
127 | ![Proxy-to-provider approach](images/oip2-siren-2.png)
128 | 
129 | This approach relies more heavily on the provider. Cortex Alertmanager normally consumes alerts only from the Cortex ruler (the component that generates the alerts), which calls a Cortex Alertmanager API to trigger notifications. We could use the same API to let Cortex send notifications even when the source is not the Cortex ruler.
130 | 
131 | Siren's responsibility here is only to proxy tasks to the provider. The notification flow is fully owned by the provider. When a manual notification is triggered, Siren will transform the notification into a Cortex alert and call the Alertmanager API.
132 | 
133 | With this approach, there won't be many changes to the subscription flow. Users could still use the existing flow to subscribe to an alert.
134 | 
135 | - **Pros**
136 |   - Relatively simpler to implement.
137 | - **Cons**
138 |   - The supported notification channels will be limited to what the provider supports.
139 |   - Some providers might not have the capability to send notifications the way CortexMetrics/Prometheus does.
140 | 
141 | ## Preferred Approach
142 | 
143 | **Notification-as-a-Service**
144 | 
145 | ![Notification-as-a-service approach](images/oip2-siren-3.png)
146 | 
147 | Cortex Alertmanager will no longer be responsible for sending notifications to external vendors (Slack, PagerDuty). Alerts are communicated from Cortex to Siren only through a webhook call to `/v1beta1/alerts/cortex/{provider_id}` with this information in the body.
148 | 
149 | ```json
150 | {
151 |   "version": "4",
152 |   "groupKey": <string>,              // key identifying the group of alerts (e.g. to deduplicate)
153 |   "truncatedAlerts": <int>,          // how many alerts have been truncated due to "max_alerts"
154 |   "status": "<resolved|firing>",
155 |   "receiver": <string>,
156 |   "groupLabels": <object>,
157 |   "commonLabels": <object>,
158 |   "commonAnnotations": <object>,
159 |   "externalURL": <string>,           // backlink to the Alertmanager.
160 |   "alerts": [
161 |     {
162 |       "status": "<resolved|firing>",
163 |       "labels": <object>,
164 |       "annotations": <object>,
165 |       "startsAt": "<rfc3339>",
166 |       "endsAt": "<rfc3339>",
167 |       "generatorURL": <string>,      // identifies the entity that caused the alert
168 |       "fingerprint": <string>        // fingerprint to identify the alert
169 |     },
170 |     ...
171 |   ]
172 | }
173 | ```
174 | 
175 | Once Siren receives an alert through the webhook, it forwards the alert to a Notification Service that is responsible for triggering notifications and routing them to the receivers. Each provider will have a different webhook API.
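
As an illustration of the webhook body described above, the following is a minimal Go sketch of how Siren could decode it. The type names `WebhookPayload` and `WebhookAlert`, the function `parseWebhook`, and the sample receiver name are assumptions made for this example and are not part of the existing Siren code; only the JSON field names come from the payload shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// WebhookAlert mirrors one entry of the "alerts" array in the payload above.
// The Go type name is hypothetical.
type WebhookAlert struct {
	Status       string            `json:"status"`
	Labels       map[string]string `json:"labels"`
	Annotations  map[string]string `json:"annotations"`
	StartsAt     time.Time         `json:"startsAt"`
	EndsAt       time.Time         `json:"endsAt"`
	GeneratorURL string            `json:"generatorURL"`
	Fingerprint  string            `json:"fingerprint"`
}

// WebhookPayload mirrors the top-level webhook body (version "4").
type WebhookPayload struct {
	Version           string            `json:"version"`
	GroupKey          string            `json:"groupKey"`
	TruncatedAlerts   int               `json:"truncatedAlerts"`
	Status            string            `json:"status"`
	Receiver          string            `json:"receiver"`
	GroupLabels       map[string]string `json:"groupLabels"`
	CommonLabels      map[string]string `json:"commonLabels"`
	CommonAnnotations map[string]string `json:"commonAnnotations"`
	ExternalURL       string            `json:"externalURL"`
	Alerts            []WebhookAlert    `json:"alerts"`
}

// parseWebhook decodes a raw webhook body into a WebhookPayload.
func parseWebhook(body []byte) (WebhookPayload, error) {
	var p WebhookPayload
	err := json.Unmarshal(body, &p)
	return p, err
}

func main() {
	// Sample body with made-up values, for illustration only.
	body := []byte(`{
	  "version": "4",
	  "status": "firing",
	  "receiver": "siren-webhook",
	  "commonLabels": {"severity": "critical"},
	  "alerts": [{
	    "status": "firing",
	    "labels": {"alertname": "HighCPU", "severity": "critical"},
	    "startsAt": "2022-01-01T00:00:00Z",
	    "endsAt": "0001-01-01T00:00:00Z",
	    "fingerprint": "abc123"
	  }]
	}`)

	p, err := parseWebhook(body)
	if err != nil {
		fmt.Println("invalid payload:", err)
		return
	}
	fmt.Println(p.Status, len(p.Alerts), p.Alerts[0].Labels["alertname"])
}
```

Each alert's `labels` map is what the subscription matching would later compare against the match labels of subscriptions.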
176 | 
177 | Although the approach is named `Notification-as-a-service`, it doesn't mean we will create a new, separate service for it. The notification service could still live in Siren, but it should have as little logical coupling to the other Siren components as possible. The notification service itself can therefore be designed in a separate RFC.
178 | 
179 | The flow of notification would be like this:
180 | 
181 | ```mermaid
182 | flowchart LR
183 |     A[Webhook API]-->|write|B[DB]
184 |     A-->|publish|D[Notification Service]
185 |     C[Notify API]-->|write|B
186 |     C-->|publish|D[Notification Service]
187 |     D-->|send message|E[Receiver]
188 | ```
189 | 
190 | This approach requires the changes to be backward compatible with our existing Siren. The changes we need are:
191 | 
192 | 1. Create a new webhook API as an entry point for notifications, called only by Cortex in the new flow. The webhook `/v1beta1/alerts/cortex/{provider_id}` is still used for alert history (for backward compatibility).
193 | 2. Update `/v1beta1/Notify` (if not being used) as the entry point for manually triggered notifications.
194 | 3. Create a new table to store triggered notifications (not the notifications that we send to specific channels; that part will be handled by the notification service later). This table will store notifications triggered manually via `/v1beta1/Notify` and via the new webhook API.
195 | 4. We could introduce a new apiVersion v3 for the rule template to indicate that the rules should use the new flow (which won't trigger notifications through Alertmanager). Otherwise the old flow is still used.
196 |    We need to make the receiver part of the notification service and decouple it from the Siren main flow.
197 | 5. Use subscriptions to wire alerts and notifications together. We could still match the labels of alerts against the match labels of subscriptions. For manual notifications, we could have a new label called `topic` to subscribe to a specific notification event.
198 | 
199 | - **Pros**
200 |   - Flexibility to add more notification channels.
201 |   - Decoupling notification from alerting for scalability.
202 | - **Cons**
203 |   - Requires more complex development for the notification service.
204 | 


--------------------------------------------------------------------------------
/rfcs/OIP-003-siren-as-notification-service.md:
--------------------------------------------------------------------------------
  1 | # OIP-003 - Siren as Notification Service
  2 | 
  3 | Siren is a tool used and developed by the data platform team to manage observability, alerting rules, and notification channels. Alert subscription and notification in Siren are currently handled by the provider (Cortex). Based on the previous [RFC](./OIP-002-alert-subscription-and-notification.md), we prefer that Siren handle notification subscription and distribution. This RFC explains the high-level design of the Notification Service that would be implemented.
  4 | 
  5 | # Background
  6 | 
  7 | **Architecture**
  8 | 
  9 | Below is the existing architecture of Siren. Siren's responsibility is limited to proxying Cortex rules and alert config, and to sending notifications directly to Slack. Alert generation, silencing, and notifications are done by CortexMetrics.
 10 | 
 11 | ![Existing siren](images/oip2-siren-1.png)
 12 | 
 13 | **Siren Domain Model**
 14 | 
 15 | ![Siren domain model](images/oip3-1.png)
 16 | 
 17 | **Subscriptions**
 18 | 
 19 | To subscribe to an alert, users can register a new subscription by calling the create subscription API with the subscription model data shown above. The subscription data is transformed into an Alertmanager config and uploaded to Cortex. In Cortex, if the labels of a triggered alert match the match labels of a subscription, its receivers get notifications.
 20 | 
 21 | **Notifications**
 22 | 
 23 | Notifications are handled by Cortex Alertmanager, so the supported receivers/notification channels are limited to what Cortex Alertmanager supports. Siren currently supports Slack and PagerDuty notifications.
 24 | 
 25 | **Alert History**
 26 | 
 27 | Siren also uses the Cortex webhook to send alert notifications to Siren. Siren provides an API that acts as the webhook, and all alerts are ingested through it. Siren stores these notifications as alert history.
 28 | 
 29 | ## Requirements
 30 | 
 31 | The problems with the existing Siren architecture are:
 32 | 
 33 | - Users could only subscribe to alerts generated by CortexMetrics.
 34 | - The notification channels (vendors) are limited to what CortexMetrics (provider) supports.
 35 | 
 36 | Meanwhile, what we want is:
 37 | 
 38 | - Users could subscribe to any alerts or notifications.
 39 | - We have more flexibility to support more notification channels.
 40 | 
 41 | For notifications, this is what we expect:
 42 | 
 43 | - A user could subscribe to more than one receiver.
 44 | - Each receiver in subscription could have a template.
 45 | - Notification request should be idempotent.
 46 | - Similar to alert, we could extend notification with notification silencing and grouping/batching feature.
 47 | 
 48 | # Proposal
 49 | 
 50 | The proposed architecture as part of the previous [RFC](./OIP-002-alert-subscription-and-notification.md) is like this.
 51 | 
 52 | ![Notification-as-a-service approach](images/oip2-siren-3.png)
 53 | 
 54 | This architecture expects Cortex (the provider) to only send alert notifications to the Siren webhook, while Siren takes over the responsibility for notification. This RFC focuses on the implementation details of the Notification Service.
 55 | 
 56 | Here are the steps to generate notifications and the responsibility of each step. We will discuss the preferred approach for each step.
 57 | 
 58 | ![Notification steps](images/oip3-2.png)
 59 | 
 60 | 1. Notification Source
 61 | 
 62 |    - Communicate to Notification Dispatcher to publish a notification.
 63 | 
 64 | 2. Notification Dispatcher
 65 | 
 66 |    - Notification Model.
 67 |    - Match notification with subscribers.
 68 |    - For each subscriber, generate a notification message and publish to queue.
 69 |    - Resolve template and transform message to vendor-specific message.
 70 | 
 71 | 3. Queue
 72 | 
 73 |    - Buffer of notification messages to reduce the pressure.
 74 | 
 75 | 4. Notification Handler
 76 | 
 77 |    - Subscribe to queue for a message.
 78 |    - Send to the external vendor.
 79 | 
 80 | ## Notification Source
 81 | 
 82 | Under the current plan, there will be 2 possible sources: **Alert webhook from provider** and **Manually triggered API**. Whatever the source is, its model should be transformed into a single model called `Notification`. There should be idempotency handling in this step. The details of how idempotency is implemented will be discussed in another RFC later.
 83 | 
 84 | ## Notification Dispatcher
 85 | 
 86 | The notification dispatcher's responsibility is to generate notification messages for all subscribers. Since a subscription can have multiple receivers, every dispatched notification generates one or more notification messages. The notification dispatcher sends messages asynchronously by publishing them to a queue. Some decisions about the dispatcher need to be discussed:
 87 | 
 88 | 1. Subscription flow
 89 | 2. Notification Model
 90 | 3. Notification Message Model
 91 | 
 92 | ### Subscription Flow
 93 | 
 94 | ```go
 95 | type Subscription struct {
 96 | 	ID        uint64            `json:"id"`
 97 | 	URN       string            `json:"urn"`
 98 | 	Namespace uint64            `json:"namespace"`
 99 | 	Receivers []Receiver        `json:"receivers"`
100 | 	Match     map[string]string `json:"match"`
101 | 	CreatedAt time.Time         `json:"created_at"`
102 | 	UpdatedAt time.Time         `json:"updated_at"`
103 | }
104 | ```
105 | 
106 | Above are the details of the subscription model. The existing subscription matches its labels against the kv-labels in the alerts, and we could keep this behavior in Siren. To know which subscriptions should be notified, Siren could expect kv-labels in the Notification model and kv-labels in subscriptions. For each match, Siren fetches the receivers and generates a message for each receiver.
107 | 
108 | ```go
109 | var n notification.Notification
110 | // ...
111 | receivers := subscription.GetReceiversByLabels(n.Labels)
112 | for _, rcv := range receivers {
113 |     notificationMessage := n.BuildMessage(rcv)
114 |     notification.Publish(notificationMessage)
115 | }
116 | ```
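
The matching step behind `GetReceiversByLabels` could look like the sketch below. It assumes that a subscription matches when every one of its match labels appears in the notification labels with the same value (extra notification labels are ignored); the RFC does not spell out the exact semantics, so this is an assumption. The trimmed-down `Subscription` and `Receiver` types and the function names are hypothetical and kept in memory here only to keep the example self-contained.

```go
package main

import "fmt"

// Receiver and Subscription are reduced to the fields needed for matching.
type Receiver struct {
	ID             uint64
	Configurations map[string]interface{}
}

type Subscription struct {
	ID        uint64
	Receivers []Receiver
	Match     map[string]string
}

// matches reports whether every match label is present in labels with the
// same value. An empty match set matches every notification.
func matches(match, labels map[string]string) bool {
	for k, v := range match {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// receiversByLabels collects the receivers of all subscriptions whose match
// labels are satisfied by the notification labels.
func receiversByLabels(subs []Subscription, labels map[string]string) []Receiver {
	var out []Receiver
	for _, s := range subs {
		if matches(s.Match, labels) {
			out = append(out, s.Receivers...)
		}
	}
	return out
}

func main() {
	subs := []Subscription{
		{ID: 1, Match: map[string]string{"severity": "critical"}, Receivers: []Receiver{{ID: 10}}},
		{ID: 2, Match: map[string]string{"team": "data"}, Receivers: []Receiver{{ID: 20}, {ID: 21}}},
	}
	labels := map[string]string{"severity": "critical", "team": "infra"}

	for _, rcv := range receiversByLabels(subs, labels) {
		fmt.Println("notify receiver", rcv.ID)
	}
}
```

In this example only the first subscription matches, because the second one requires `team=data` while the notification carries `team=infra`.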
117 | 
118 | **Consideration**
119 | 
120 | - Need to figure out the best way to filter label sets in Postgres.
121 | - Optimization could be done later by caching the match labels index of each subscription in memory.
122 | 
123 | ### Notification Model & Notification Message Model
124 | 
125 | The Notification Model contains the information that should be sent to each receiver. Each receiver could have specific requirements for the notification payload, and we assume the payload is customizable. One of the features we expect in notifications is message templating, which will be resolved in the Notification Dispatcher. Siren already has a templating feature that we could use for this purpose.
126 | ```go
127 | type Notification struct {
128 | 	ID             uint64
129 | 	Variables      map[string]string
130 | 	Labels         map[string]string
131 | 	ExpiryDuration string
132 | 	CreatedAt      time.Time
133 | }
134 | ```
135 | When registering a subscription of a notification, one could add a template key in the receiver. If no template key is found, the default template will be used.
136 | 
137 | ```go
138 | sub := Subscription{
139 |     // ...
140 |     Receivers: []receiver.Receiver{
141 |       {
142 |         ID: 1,
143 |         Configurations: map[string]interface{}{
144 |           "channel_name": "odpf-critical",
145 |           "template": "alert-slack-details",
146 |         },
147 |       },
148 |     },
149 |     // ...
150 | }
151 | ```
152 | 
153 | A Notification Message is a materialized view of a Notification for a specific receiver type (vendor). It has a delivery status (FAILED/PUBLISHED) so that delivery can be tracked.
154 | 
155 | ```go
156 | type Message struct {
157 | 	ID              uint64
158 | 	ReceiverType    Receiver
159 | 	ReceiverConfigs map[string]interface{}
160 | 	Details         map[string]interface{}
161 | 	Status          string
162 | 	ExpiredAt       time.Time
163 | 	CreatedAt       time.Time
164 | 	UpdatedAt       time.Time
165 | }
166 | ```
167 | 
168 | The dispatcher converts a Notification into Notification Messages. If the template in the subscriber's receiver config is not empty, the dispatcher resolves the variables against the template, and the rendered YAML text is parsed into a vendor-specific details struct.
169 | 
170 | ![Resolving template](images/oip3-3.png)
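
The variable-resolution step can be sketched with Go's standard `text/template` package, configured to use the `[[` and `]]` delimiters that appear in the template bodies of this RFC. This is an assumption about how the rendering could be done, not a description of Siren's existing templating code; the function name `render` is hypothetical, and parsing the rendered YAML into a vendor-specific struct is left out because it needs a YAML library.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render resolves [[.name]] placeholders in body using vars.
// A placeholder without a matching variable is reported as an error
// instead of being silently rendered as an empty string.
func render(body string, vars map[string]string) (string, error) {
	tmpl, err := template.New("details").
		Delims("[[", "]]").
		Option("missingkey=error").
		Parse(body)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, vars); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	body := "title: '[[.title]]'\ncolor: '[[.color]]'\n"
	vars := map[string]string{"title": "Alert", "color": "#2eb886"}

	out, err := render(body, vars)
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(out)
}
```

A real implementation would first merge the defaults declared in the template's `variables` section with the `Variables` of the Notification, so that a missing variable falls back to its default rather than failing.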
171 | 
172 | ## Dispatch Execution
173 | 
174 | The dispatcher does a lot of work. With that in mind, there are 2 possible approaches to its execution.
175 | 
176 | ### Abandoned Ideas
177 | 
178 | **Asynchronous**
179 | 
180 | ![Asynchronous Execution](images/oip3-4.png)
181 | 
182 | This approach adds a buffer between the notification source and the dispatcher. The queue used as the buffer would be similar to the queue used for publishing notification messages, which is discussed in the Message Queue section below.
183 | 
184 | - **Pros**
185 |   - Could sustain relatively higher throughput than synchronous execution.
186 | - **Cons**
187 |   - More complex interaction.
188 |   - Introduce an additional point of failure (queue).
189 | 
190 | ### Preferred Approach
191 | 
192 | **Synchronous**
193 | 
194 | This is the simplest approach. Notification sources just need to transform their model into a Notification model and call the Notification Dispatcher function.
195 | 
196 | ![Synchronous Execution](images/oip3-5.png)
197 | 
198 | - **Pros**
199 |   - Easier to implement (as simple as calling function).
200 | - **Cons**
201 |   - Considering the relatively heavy tasks that are being done in Notification dispatcher (label matching, template rendering), Notification dispatcher will get more pressure if the rate of incoming notifications is higher than the rate of dispatching notification (e.g. for bulk notifications).
202 | 
203 | ## Message Queue
204 | 
205 | Since the notification handler interacts with external parties, the interaction is less reliable and slower than invoking local functions or communicating within a local network. A queue to buffer notification messages is therefore needed, especially since one Notification can be transformed into one or more Notification Messages. There are a couple of possible approaches to implementing this message queue.
206 | 
207 | ### Abandoned Ideas
208 | 
209 | **Message Queue Infrastructure (Kafka, RabbitMQ, etc)**
210 | 
211 | Although this approach seems like the obvious choice, it is less preferred since it adds a dependency on a new component and could make Siren less vendor-neutral. We might want Siren to be able to plug in any message queuing system, but that is out of scope here.
212 | 
213 | - **Pros**
214 |   - No need to implement logic to queue.
215 |   - Could leverage features provided by the tool.
216 | - **Cons**
217 |   - More infra to manage.
218 | 
219 | **Redis-based Queue (e.g. gocraft/work)**
220 | 
221 | There are several tools written in Go that could manage queues like gocraft/work. The tool provides out-of-the-box features that we need. It supports managing dead jobs, scheduling jobs, and retrying the dead jobs.
222 | 
223 | - **Pros**
224 |   - Out-of-the-box features to leverage.
225 |   - Has a UI to check the jobs.
226 | - **Cons**
227 |   - More infra to manage.
228 | 
229 | ### Preferred Approach
230 | 
231 | **PostgreSQL FOR UPDATE / SKIP LOCKED**
232 | 
233 | We could leverage PostgreSQL to implement a queue with `FOR UPDATE` and `SKIP LOCKED`. A notification handler goroutine could run periodically to fetch rows and process the messages. This ensures that a message is picked up by only one goroutine at a time. With this approach, we could have a new table like this.
234 | 
235 | ```sql
236 | CREATE TABLE message_queue
237 | (
238 |    id               bigserial NOT NULL,
239 |    status           integer DEFAULT 0 NOT NULL,
240 |    try_count        integer DEFAULT 0 NOT NULL,
241 |    max_tries        integer DEFAULT 5 NOT NULL,
242 |    receiver_type    text NOT NULL,
243 |    receiver_configs jsonb,
244 |    details          jsonb,
245 |    expired_at       timestamptz,
246 |    created_at       timestamptz DEFAULT CURRENT_TIMESTAMP NOT NULL,
247 |    updated_at       timestamptz,
248 |    priority         integer DEFAULT 0 NOT NULL
249 | );
250 | ```
251 | 
252 | - `status` is 0 if unpublished, -1 if failed, and 1 if published.
253 | - `priority` is there in case we want a priority-based queue.
254 | 
255 | - **Pros**
256 |   - No additional component (infra) required.
257 |   - Relatively easy to implement.
258 | - **Cons**
259 |   - The queue table might become bloated over time (needs periodic maintenance such as vacuuming, or monitoring for table bloat).
260 | 
261 | ## Notification Handler
262 | 
263 | The notification handler's responsibility is to send notification messages outbound. It should know the contracts of all external notification vendors. The notification handler consumes a notification message and transforms it into a vendor-specific message. To support at-least-once delivery, the notification handler needs retry logic (probably with exponential backoff), or dead messages need to be stored in a DLQ and retried.
264 | 
265 | The validity of each notification message depends on its `expired_at` field. An empty or null `expired_at` means the message never expires. When failed notification messages are retried, messages past their validity won't be retried.
266 | 
267 | The existing Siren predefines the Slack and PagerDuty receiver details in an Alertmanager config template file. With the notification service approach, we will keep only the webhook config for Alertmanager and extract the Slack and PagerDuty templates. Each receiver type expects a specific notification contract, and users could configure the contract with a YAML file generated from a template if needed. For example, the Slack details YAML can contain any payload field supported by Slack's `chat.postMessage`. Notification templates always have a `receiver_type` key that is used for validation.
268 | 
269 | ```yaml
270 | apiVersion: v2
271 | type: template
272 | name: alert-slack-details
273 | body:
274 |   receiver_type: slack
275 |   attachments:
276 |     - text: '[[.text]]'
277 |       icon_emoji: ':eagle:'
278 |       link_names: false
279 |       color: '[[.color]]'
280 |       title: '[[.title]]'
281 |       pretext: '[[.pretext]]'
283 |       actions:
284 |         - type: button
285 |           text: 'Runbook :books:'
286 |           url: '[[.runbook]]'
287 |         - type: button
288 |           text: 'Dashboard :bar_chart:'
289 |           url: '[[.dashboard]]'
290 | variables:
291 |   - name: color
292 |     type: string
293 |     description: slack color
294 |     default: '#2eb886'
295 |   - name: text
296 |     type: string
297 |     default: This is an alert
298 |   - name: title
299 |     type: string
300 |     default: Alert
301 |   - name: pretext
302 |     type: string
303 |     description: Pre-text of slack alert
304 |     default: Siren
305 |   - name: runbook
306 |     type: string
307 |     description: url to runbook
308 |     default: http://url
309 |   - name: dashboard
310 |     type: string
311 |     description: url to dashboard
312 |     default: http://url
313 | tags:
314 |   - slack
315 | ```
316 | 


--------------------------------------------------------------------------------