├── rfc-139
│   ├── fb-h3.png
│   ├── youtube-h3.png
│   ├── google-search-h3.png
│   ├── visual-filmstrip.png
│   ├── visual-progress.png
│   ├── http3-browser-usage.png
│   ├── 5_H2_2files_multiplexed.png
│   └── QUIC-illustration-final.jpg
├── rfc-115
│   ├── clean-connections.png
│   ├── connection-view.png
│   ├── h2-dns-annotated.png
│   ├── cold-cache-summary.png
│   ├── the-impact-annotated.png
│   ├── connection-view-annotated.png
│   └── underutilised-connections-annotated.png
├── rfc-138
│   ├── total-bytes-search.png
│   ├── brotli-usage-caniuse.png
│   ├── total-bytes-homepage.png
│   ├── visual-progress-covid.png
│   ├── visual-progress-search.png
│   ├── total-bytes-coronavirus.png
│   ├── visual-progress-homepage.png
│   ├── total-bytes-bank-holidays.png
│   └── visual-progress-bank-holidays.png
├── rfc-171
│   ├── ie_usage_last_year.png
│   └── ie_usage_last_three_years.png
├── rfc-172
│   └── pgexplain-query-plan-visualisation.png
├── .github
│   └── workflows
│       └── actionlint.yml
├── rfc-000-template.md
├── rfc-066-use-github-for-rfcs.md
├── rfc-047-add-another-timestamp-to-content-items.md
├── rfc-045-consolidate-sidekiq-usage-into-shared-gem.md
├── rfc-046-break-ability-to-replay-publishing-api-event-log.md
├── rfc-085-special-route-publisher.md
├── rfc-053-terminology-for-migration-progress.md
├── rfc-028-keeping-gov-uk-s-software-current.md
├── rfc-067-tagging-to-organisations.md
├── rfc-103-merge-dependabot-pull-requests-with-a-single-review.md
├── rfc-030-customise-call-to-action-text-on-simple-smart-answers-start-page.md
├── rfc-015-environment-names.md
├── rfc-088-external-content.md
├── rfc-041-separate-document-type-from-format.md
├── rfc-083-case-insensitive-routing.md
├── rfc-063-naming-new-apps-gems-on-gov-uk.md
├── rfc-017-simpler-draft-stack.md
├── rfc-035-explicitly-make-the-details-hash-non-opaque.md
├── rfc-147-enable-speedcurve-http-protocol-capture.md
├── rfc-078-re-architect-signin-permissions-in-signon.md
├── rfc-075-managing-users.md
├── rfc-025-managing-special-snowflake-urls-through-the-publishing-api.md
├── rfc-043-content-items-without-a-base-path.md
├── rfc-057-default-values-for-application-secrets.md
├── rfc-086-draft-stack-rummager.md
├── rfc-058-publishing-api-events.md
├── rfc-091-sharing-assets.md
├── rfc-050-do-end-to-end-testing-of-gov-uk-applications.md
├── rfc-165-add-api-endpoints-to-transition.md
├── rfc-159-switch-off-whitehall-apis.md
├── rfc-023-putting-detailed-guides-paths-under-guidance.md
├── rfc-022-putting-elasticsearch-backups-in-s3.md
├── rfc-042-testing-backend-applications-against-production-traffic.md
├── rfc-036-stop-preserving-order-for-links.md
├── rfc-044-unpublishing-content-items.md
├── rfc-106-docker-for-local-development.md
├── rfc-097-verify-specific-start-pages.md
├── README.md
├── rfc-055-content-history.md
├── rfc-059-workflow-for-making-changes-to-the-schemas.md
├── rfc-016-how-to-prevent-published-live-frontends-from-reading-from-the-draft-content-store.md
├── rfc-087-dealing-with-errors.md
├── rfc-027-supporting-slug-changes-in-the-publishing-api.md
├── rfc-145-unarchive-govuk_admin_template.md
├── rfc-098-csp.md
├── rfc-100-linting.md
├── rfc-064-killing-router-data.md
├── rfc-013-thoughts-on-access-limiting-in-draft.md
├── rfc-052-pull-request-merging-process.md
├── rfc-056-ordered-link-types.md
├── rfc-163-cdn-rationalisation.md
├── rfc-186-replacement-of-email-based-fact-checking-process-in-publisher.md
├── rfc-146-production-deploy-access.md
├── rfc-108-including-gem-component-assets.md
├── rfc-093-retire-govuk-cdn-logs-monitor.md
├── rfc-142-add-interest-cohort-permssion-policy-header.md
├── rfc-138-enable-brotli-compression.md
├── rfc-143-split-database-instances.md
├── rfc-136-remove-our-backup-cdn-in-gcp.md
└── rfc-158-port-content-store-to-postgresql.md

/rfc-139/fb-h3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-139/fb-h3.png
--------------------------------------------------------------------------------

/rfc-139/youtube-h3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-139/youtube-h3.png
--------------------------------------------------------------------------------

/rfc-115/clean-connections.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-115/clean-connections.png
--------------------------------------------------------------------------------

/rfc-115/connection-view.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-115/connection-view.png
--------------------------------------------------------------------------------

/rfc-115/h2-dns-annotated.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-115/h2-dns-annotated.png
--------------------------------------------------------------------------------

/rfc-139/google-search-h3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-139/google-search-h3.png
--------------------------------------------------------------------------------

/rfc-139/visual-filmstrip.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-139/visual-filmstrip.png
--------------------------------------------------------------------------------

/rfc-139/visual-progress.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-139/visual-progress.png
--------------------------------------------------------------------------------

/rfc-115/cold-cache-summary.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-115/cold-cache-summary.png
--------------------------------------------------------------------------------

/rfc-138/total-bytes-search.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-138/total-bytes-search.png
--------------------------------------------------------------------------------

/rfc-139/http3-browser-usage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-139/http3-browser-usage.png
--------------------------------------------------------------------------------

/rfc-171/ie_usage_last_year.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-171/ie_usage_last_year.png
--------------------------------------------------------------------------------
/rfc-115/the-impact-annotated.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-115/the-impact-annotated.png
--------------------------------------------------------------------------------

/rfc-138/brotli-usage-caniuse.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-138/brotli-usage-caniuse.png
--------------------------------------------------------------------------------

/rfc-138/total-bytes-homepage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-138/total-bytes-homepage.png
--------------------------------------------------------------------------------

/rfc-138/visual-progress-covid.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-138/visual-progress-covid.png
--------------------------------------------------------------------------------

/rfc-138/visual-progress-search.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-138/visual-progress-search.png
--------------------------------------------------------------------------------

/rfc-138/total-bytes-coronavirus.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-138/total-bytes-coronavirus.png
--------------------------------------------------------------------------------

/rfc-138/visual-progress-homepage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-138/visual-progress-homepage.png
--------------------------------------------------------------------------------

/rfc-139/5_H2_2files_multiplexed.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-139/5_H2_2files_multiplexed.png
--------------------------------------------------------------------------------

/rfc-139/QUIC-illustration-final.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-139/QUIC-illustration-final.jpg
--------------------------------------------------------------------------------

/rfc-115/connection-view-annotated.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-115/connection-view-annotated.png
--------------------------------------------------------------------------------

/rfc-138/total-bytes-bank-holidays.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-138/total-bytes-bank-holidays.png
--------------------------------------------------------------------------------

/rfc-171/ie_usage_last_three_years.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-171/ie_usage_last_three_years.png
--------------------------------------------------------------------------------
/rfc-138/visual-progress-bank-holidays.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-138/visual-progress-bank-holidays.png
--------------------------------------------------------------------------------

/rfc-172/pgexplain-query-plan-visualisation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-172/pgexplain-query-plan-visualisation.png
--------------------------------------------------------------------------------

/rfc-115/underutilised-connections-annotated.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alphagov/govuk-rfcs/HEAD/rfc-115/underutilised-connections-annotated.png
--------------------------------------------------------------------------------

/.github/workflows/actionlint.yml:
--------------------------------------------------------------------------------
name: Lint GitHub Actions
on:
  push:
    paths: ['.github/**']
jobs:
  actionlint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          show-progress: false
      - uses: alphagov/govuk-infrastructure/.github/actions/actionlint@main
--------------------------------------------------------------------------------

/rfc-000-template.md:
--------------------------------------------------------------------------------
---
status: proposed
implementation: proposed
status_last_reviewed:
---

# My RFC Title

## Summary

An abstract, tl;dr or executive summary of your RFC.

## Problem

Describe the problem your RFC is trying to solve.

## Proposal

Describe your proposal, with a focus on clarity of meaning. You MAY use [RFC2119-style](https://www.ietf.org/rfc/rfc2119.txt) MUST, SHOULD and MAY language to help clarify your intentions.
--------------------------------------------------------------------------------

/rfc-066-use-github-for-rfcs.md:
--------------------------------------------------------------------------------
## Problem

- The confluence wiki is closed to people outside of GOV.UK
- Not everyone is a fan of the wiki's user interface
- It's hard to keep track of changes to RFCs during the proposal period

## Proposal

- Move the RFCs to GitHub
- RFCs are proposed as markdown documents and discussed in a pull request

## Implementation details

- We'll create a repository **alphagov/govuk-rfcs**
- We'll create pull requests for each existing RFC so that we keep the current numbering system
--------------------------------------------------------------------------------

/rfc-047-add-another-timestamp-to-content-items.md:
--------------------------------------------------------------------------------
## Problem

One of the goals of migration is to make publishing apps use the publishing api as their data store. Publishing apps contain index pages that list all of the content they manage. These index pages are sorted by "most recently updated first" to help users find content they're working on more easily. The publishing api doesn't have a timestamp for this. The closest it has is updated\_at, but this is affected when content is republished. We could insist that republishing must happen in the same order that content appears on index pages, but republishing is often done via sidekiq, which doesn't guarantee message ordering. We also display this timestamp on index pages and it could confuse users if it changes as a result of republishing.

## Proposal

Introduce a new timestamp for content items in publishing api that is updated whenever there is a major or minor update, but not on republishes. This field should be called something like 'edited\_at' or 'private\_updated\_at' and is for internal use only. It should not be sent downstream to the content store. Publishing apps then specify this sort order in the request to publishing api when requesting content for index pages.
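
To make this concrete, a minimal sketch of how the new column could behave in a Rails model follows; the column, method and scope names are illustrative, not an agreed API:

```ruby
# Sketch only: an edited_at column that moves on major/minor updates but
# never on republishes. All names here are illustrative.
class ContentItem < ActiveRecord::Base
  # Called when a publishing app makes a major or minor update.
  def record_edit!
    touch(:edited_at) # Rails' touch can update a named timestamp column
  end

  # Republishing updates updated_at as usual but leaves edited_at alone,
  # so index pages keep a stable "most recently edited first" order.
  scope :most_recently_edited_first, -> { order(edited_at: :desc) }
end
```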
--------------------------------------------------------------------------------

/rfc-045-consolidate-sidekiq-usage-into-shared-gem.md:
--------------------------------------------------------------------------------
## Problem

The Publishing Platform team is currently working on adding request tracing to all apps that use the Publishing API in an asynchronous way via Sidekiq - [https://github.com/alphagov/collections-publisher/pull/194](https://github.com/alphagov/collections-publisher/pull/194) and [https://github.com/alphagov/whitehall/pull/2567](https://github.com/alphagov/whitehall/pull/2567). This will need repeating across many apps. The implementation is sufficiently complex that a mass-change of all apps in the future is not unlikely.

A previous shotgun surgery on apps with Sidekiq was the adding of sidekiq-statsd - [https://trello.com/c/z2aHqwS8/48-add-sidekiq-statsd-to-apps-that-use-sidekiq](https://trello.com/c/z2aHqwS8/48-add-sidekiq-statsd-to-apps-that-use-sidekiq) which needed ~10 PRs.

There are ~15 GOV.UK apps that use Sidekiq, and they all use slightly different versions, logging and configuration.

## Proposal

Introduce a `govuk-sidekiq` gem that consolidates all GOV.UK Sidekiq conventions:

- Use automatic request tracing (based on [https://github.com/alphagov/whitehall/pull/2567](https://github.com/alphagov/whitehall/pull/2567))
- Use sidekiq-statsd (like [https://github.com/alphagov/imminence/pull/117](https://github.com/alphagov/imminence/pull/117))
- Use logging (`sidekiq-logging-json` is used very inconsistently)
- Perhaps setup & configuration

This will make all our apps easier to manage and upgrade.
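
As a sketch of the kind of thing the gem could centralise, here is how the request-tracing convention might be packaged as Sidekiq middleware. This is illustrative only; the `GovukRequestId` helper and job key are assumptions, not the gem's actual design:

```ruby
# Sketch: shared Sidekiq middleware a govuk-sidekiq gem could install.
require "sidekiq"

module GovukSidekiq
  # Client middleware: stamp every enqueued job with the current request id
  # so asynchronous work can be traced back to the originating request.
  class InjectRequestId
    def call(_worker_class, job, _queue, _redis_pool)
      job["govuk_request_id"] = GovukRequestId.value # illustrative helper
      yield
    end
  end

  # Server middleware: restore the request id before the job runs.
  class RestoreRequestId
    def call(_worker, job, _queue)
      GovukRequestId.value = job["govuk_request_id"]
      yield
    end
  end

  def self.install
    Sidekiq.configure_client do |config|
      config.client_middleware { |chain| chain.add InjectRequestId }
    end
    Sidekiq.configure_server do |config|
      config.server_middleware { |chain| chain.add RestoreRequestId }
      # Jobs enqueued from within jobs should also carry the id.
      config.client_middleware { |chain| chain.add InjectRequestId }
    end
  end
end
```

Each app would then call a single `GovukSidekiq.install`-style entry point instead of maintaining its own configuration.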
--------------------------------------------------------------------------------

/rfc-046-break-ability-to-replay-publishing-api-event-log.md:
--------------------------------------------------------------------------------
## Problem

The Event Log is a log of all requests received by the publishing api. Theoretically, we should be able to replay each of these events in turn against a blank database and end up with the same content as in the current live publishing api database.

In practice, this replaying of events takes too long to be useful for disaster recovery. It also prevents in situ changes to data via data migrations and the Rails console, as these changes are not recorded in the event log, and thus wouldn't be replayed. There have already been data migrations and instances of "devopsing" in production which have broken the integrity of the Event Log.

## Proposal

In order to allow developers more flexibility to make ad hoc changes to data in ways that are not supported by the available commands, we're proposing we remove the implied contract that the event log can be replayed at any time. In practice, it is unlikely we'll ever need to do this. The event log is still very important as an audit trail of actions that have occurred in the publishing API, so this change will enable us to change how the log is stored long term. There are already almost 1.5m events in the log, and PostgreSQL is not the most efficient solution for storing these.

The event log needs to be retained for now as the primary key for the table is used as an atomic counter for content items' lock versions. We could in future archive events into cold storage after a certain time period. Other necessary changes to data that would not be logged in the Event Log should be performed using Rails' migration functionality.
--------------------------------------------------------------------------------

/rfc-085-special-route-publisher.md:
--------------------------------------------------------------------------------
# Special Route Publisher app

## Summary

This RFC proposes the creation of a Special Route Publisher app to cater for routes that require registration but have no content.

## Problem

There are examples of certain routes that have no direct content and therefore do not live in any particular publishing app, but still require a route to be registered via the Publishing API -> Content Store. Examples of these can be seen in [Frontend](https://github.com/alphagov/frontend/blob/master/lib/special_route_publisher.rb) and [Rummager](https://github.com/alphagov/rummager/blob/master/lib/tasks/publishing_api.rake), where the former publishes many routes, such as site search, and the latter the site map. Each app then has a rake task which is triggered on deploy and updates the routes via the Publishing API.

This creates some inconsistencies/issues:

1. Frontend apps should ideally not talk directly to the Publishing API.
   This has become more relevant recently whilst moving site search into Finder Frontend.

2. No central location for the administration of special routes.

3. No consensus as to which app should take responsibility for
   publishing any future special routes.

4. Each app that publishes these routes does so in slightly different ways.

## Proposal

Create a Special Route Publisher app to handle the administration of any
special routes that have no direct pieces of content and therefore
cannot obviously live in any other publishing app.

The app could be used to publish other types of content items in the
future, but this is out of scope for now. For example, external links
are not currently sent to the Publishing API but are sent to search.
Tracking them as content items would allow them to be used elsewhere.
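
For illustration, registering a special route through the Publishing API might look roughly like this. This is a sketch based on the existing Frontend rake task; the exact adapter class and payload fields are assumptions, not a settled interface:

```ruby
# Sketch: publishing a content-less "special route" via the Publishing API.
require "securerandom"
require "gds_api/publishing_api" # gds-api-adapters gem; class name illustrative

publishing_api = GdsApi::PublishingApi.new(Plek.find("publishing-api"))
content_id = SecureRandom.uuid

payload = {
  base_path: "/search",
  document_type: "special_route",
  schema_name: "special_route",
  title: "GOV.UK site search",
  description: "Handles the display of search results across the site",
  publishing_app: "special-route-publisher", # the proposed app
  rendering_app: "finder-frontend",
  routes: [{ path: "/search", type: "prefix" }],
}

publishing_api.put_content(content_id, payload)
publishing_api.publish(content_id, "major")
```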
--------------------------------------------------------------------------------

/rfc-053-terminology-for-migration-progress.md:
--------------------------------------------------------------------------------
## Problem

So far, we've been describing the progress of Migration using the terms "phase 1" and "phase 2". These terms have no formal definition, have become overloaded and can incorrectly imply the intent of our work. They were born out of casual conversation when Migration started, but are now used formally in our roadmap. There is confusion about the definition of the terms, and they are mis-applied to applications rather than formats. The terms also mask additional complexity: "phase 1" is more complicated than a single stage of work.

## Proposal

Firstly, we should stop applying these states to entire applications. The only time we're able to refer to an application as being in a single state on the migration journey is when all of its formats are fully migrated, at which point we can stop talking about it altogether. A publishing application is responsible for a range of formats, and due to the process of migration these formats can and will be in different states.

Our proposed terminology is:

| Name | Description |
| --- | --- |
| **Pre-migration** | Publishing API is unaware of the existence of this format. The publishing app does not send this content to the API. |
| **Placeholder** | Placeholders are sent to the Publishing API to represent the existence of documents of this format. They have at least a base path (where appropriate) and a title. |
| **Content complete** | The Publishing API has everything needed for a frontend application to render the static content of documents of this format. The frontend application does not yet use this content to render documents. |
| **Rendered** | As above, but a frontend application makes a request to the content store in order to render the static content. The publishing application is still the source of truth for the content, not the Publishing API. |
| **Migrated** | The Publishing API is the canonical source of truth for the content for this format. The publishing application treats the Publishing API as its database, and the content can be removed from the publishing application's database. |
--------------------------------------------------------------------------------

/rfc-028-keeping-gov-uk-s-software-current.md:
--------------------------------------------------------------------------------
---
status: superseded
implementation: superseded
status_last_reviewed: 2024-03-06
status_notes: 'GOV.UK has a policy on keeping software current: https://docs.publishing.service.gov.uk/manual/keeping-software-current.html'
---

# Keeping GOV.UK's software current

One of our core values is to use secure and up to date software. This document lays out the recommendations for keeping our Ruby on Rails software current.

## Introduction

We run a lot of Rails applications. This means that we have dependencies on both Rails and Ruby versions.

## Upgrading Rails

It's very important that we're running a currently supported version of Rails for all applications, otherwise we **aren't covered** by [security fixes](http://rubyonrails.org/security/). We should:

- Be running on the current major version - this currently means `4.y.z`
- Maintain our applications at the latest current bugfix release for the minor version we're on (expressed in Gemfile syntax as: `~> X.Y.Z`) - this currently means `4.1.8` and `4.2.3`
- Keep abreast of breaking changes for the next major version (`5.y.z`), and have a plan to migrate our apps before `4.2.x` is deprecated
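
For example, pinning an app to bugfix releases of its current minor version looks like this in a Gemfile (version numbers as of the time of writing):

```ruby
# Gemfile: "~> 4.2.3" permits 4.2.4, 4.2.5, etc., but not 4.3.0 or 5.0.0,
# so the app picks up security patches without surprise minor upgrades.
source "https://rubygems.org"

gem "rails", "~> 4.2.3"
```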
## Upgrading Ruby

New versions of Ruby bring us improved performance and nicer syntax for certain things, but can also cause issues with the libraries etc. we use. We should:

- Be running on the current major version - this currently means `2.y.z`
- Maintain our applications at the current or next-to-current minor version - this means `2.2.z` or `2.1.z`, depending on your app's dependencies

## Current state

The current state of the Ruby and Rails versions is:

- [Listed in this versions spreadsheet](https://docs.google.com/spreadsheets/d/1FJmr39c9eXgpA-qHUU6GAbbJrnenc0P7JcyY2NB9PgU/edit#gid=1480786499) by alext
- [Another spreadsheet with team ownership](https://docs.google.com/a/digital.cabinet-office.gov.uk/spreadsheets/d/17SaFqFqVEMoabq-FjEeCHpUmA5yAjqLr_Vt-lDeXMsE/edit?usp=sharing) by alexandria.jackson.
--------------------------------------------------------------------------------

/rfc-067-tagging-to-organisations.md:
--------------------------------------------------------------------------------
# Tagging to organisations (not implemented)

---
**NOTE 2017/06/12**: This RFC has not been implemented because of difficulty getting consensus about the semantics of the organisations tags. The only tag that has been added at the time of writing is the `primary_publishing_organisation`, which contains the first "lead organisation" for Whitehall documents.
---

## Problem

Historically, we've had "organisation tagging" on GOV.UK, but it's a bit
muddled. In particular, the value of "organisations" in the content store means
different things for different publishing apps.

## Model

There will be three organisation tagging options for any page on GOV.UK (see the sketch after this list):

- **Primary publishing organisation** This is the publisher, or "owner", of the
  page.
- **Additional publishing organisations** In cases when there's more than 1
  organisation that has worked on the content.
- **Related organisations** The content is related to these organisations.
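
A sketch of how the three types might appear in a page's links hash. Note that only `primary_publishing_organisation` was ever implemented; the other two key names, and all content ids, are hypothetical:

```ruby
# Hypothetical links hash for a page tagged to organisations.
links = {
  # The publisher / "owner" of the page (exactly one).
  "primary_publishing_organisation" => ["hmrc-content-id"],
  # Other organisations that worked on the content (key name hypothetical).
  "additional_publishing_organisations" => ["dvla-content-id"],
  # Organisations the content is related to (key name hypothetical).
  "related_organisations" => ["dft-content-id"],
}
```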
## Implementation

- The content store will expose the three organisation link types separately
- The search API will expose the three organisation link types separately, but
  also expose a compound field that contains all organisations
- We'll send the three types as separate dimensions to analytics, but also a
  compound field that contains all the organisations

## Usage

- Survey team can use the primary publishing org from the search API to show a
  breakdown per org
- Content performance manager can use the primary publishing org to show
  ownership
- Content transformation will use the primary publishing org to assign content
- We'll use all the organisations combined to do facetted search

## Mapping

### Whitehall

- The first lead org is the primary publishing org
- The other lead orgs become the additional publishing orgs
- The supporting orgs become the related orgs

### Publisher

- The primary publishing org will be **Government Digital Service**
- The current "Organisations" become related orgs
--------------------------------------------------------------------------------

/rfc-103-merge-dependabot-pull-requests-with-a-single-review.md:
--------------------------------------------------------------------------------
---
status: accepted
implementation: done
status_last_reviewed: 2024-03-04
---

# Merge dependabot pull requests with a single review

## Problem

I think there are two somewhat related problems that I'd like to see
addressed.

Pull Requests by Dependabot, which the guidance describes as requiring
a review from two people, are being merged with a single review.

Pull Requests by Dependabot remain open for long periods without
being merged, including those that could be time sensitive, like
security fixes.

## Proposal

I think a step forward in addressing both problems would be to not
treat Dependabot as an external contributor when reviewing Pull
Requests, and amend the guidance on reviewing and merging Pull
Requests to permit merging Pull Requests by Dependabot with a single
review.

## Rationale

Currently, Pull Requests by Dependabot are viewed as coming from an
external contributor, and as such, should be reviewed by two people
employed by GDS, working on GOV.UK.

However, Dependabot is already a special case in the following ways:

- It's an automated service, not a person raising the Pull Requests
- Dependabot pushes changes directly to the repositories hosted on
  the alphagov GitHub organisation, rather than using a fork of the
  repository
- It should only be changing files in the repositories relating to
  the versions of dependencies (Gemfile and Gemfile.lock in the case
  of Rubygems/Bundler)

Therefore, to try and decrease the amount of time Pull Requests remain
open, and reduce the amount of time spent reviewing them, the proposal
is to amend the guidance to only require a single review on GitHub.

Since the introduction of Dependabot, merging Pull Requests with only
a single review has been happening, including when this doesn't adhere
to the guidance.
Having more approvals than necessary is good, but
having fewer doesn't match up with the current guidance, so this
change would make the guidance and practice line up better.
--------------------------------------------------------------------------------

/rfc-030-customise-call-to-action-text-on-simple-smart-answers-start-page.md:
--------------------------------------------------------------------------------
---
status: accepted
implementation: done
status_last_reviewed: 2024-03-06
---

# Customise call-to-action text on simple Smart Answer start pages

## Problem

According to a colleague, people get confused by 'Start now' on some Simple Smart Answers' start pages when it's not related to the thing they're doing, eg paying or contacting a department/agency.

For example, there are roughly 30 contacts per day for 1st line about the contact [dvla](https://www.gov.uk/contact-the-dvla) / [dvsa](https://www.gov.uk/contact-dvsa) Simple Smart Answers. See [feedex entries for /contact-dvsa](https://support.production.alphagov.co.uk/anonymous_feedback?path=%2Fcontact-dvsa) and [/contact-the-dvla](https://support.production.alphagov.co.uk/anonymous_feedback?path=%2Fcontact-the-dvla).

## Proposal

Allow customising the text value of the call-to-action button in the publisher application. Currently its value is [hard-coded to 'Start now'](https://github.com/alphagov/frontend/blob/d9e2852faf4d47a26c9e9c2192f3747f90a7ed3c/app/views/root/simple_smart_answer.html.erb#L9).

Introducing this change would affect at least three repos:

**1) govuk\_content\_models**

Simple Smart Answers would need an additional attribute to store the text value of the action button. Content designers suggest a free-text field limited to ~15 characters.

All existing Simple Smart Answers would need a data migration to set this value to 'Start now'.

**2) publisher**

Publisher would need to show the new attribute in the UI. I've mocked up what it could look like (Action button):

**3) frontend**

Frontend would need to show the variable attribute value on the call-to-action button.

Since the change would touch applications looked after by different teams, I would like to make sure it does not clash with the vision of the future of those applications.

I would also like to receive comments about whether this is the right solution to the problem and that I am not violating any user experience guidelines.
--------------------------------------------------------------------------------

/rfc-015-environment-names.md:
--------------------------------------------------------------------------------
---
status: accepted
implementation: done
status_last_reviewed: 2024-03-06
---

# Environment names

## Problem

The current naming scheme for our different environments is confusing and does not match the standard definitions. We need to come up with a set of definitive names that avoid confusion and allow us and our publishing users to distinctly identify particular environments.
## Proposal

| Environment name | Live URL | Preview URL | Existing environment name | Existing live URL |
| ---------------- | -------- | ----------- | ------------------------- | ----------------- |
| Production | www.gov.uk (www.publishing.service.gov.uk) | www-preview.publishing.service.gov.uk | Production | www.gov.uk |
| Staging | www.staging.publishing.service.gov.uk | www-preview.staging.publishing.service.gov.uk | Staging | production.alphagov.co.uk |
| Integration | www.integration.publishing.service.gov.uk | www-preview.integration.publishing.service.gov.uk | Preview | preview.alphagov.co.uk |

This involves reassigning some existing names for other purposes, and using some completely new names. In particular, what is currently known as the "Draft" stack becomes "Preview". This has two benefits:

- It matches what editors are expecting to do - that is, preview their content before making it live.
- It weans them off using the separate environment that we currently call "preview", allowing us to rename it to match its actual intended usage, which is for integration testing.

Note that the transition will not be as painful as it could be, as no editors are currently using "draft". The main re-education task will be to stop thinking of "preview" as a scratchpad that gets reset regularly, but as somewhere to actually preview content that is going to be live.

Also note that thanks to some reconfiguration that Infrastructure are doing as part of the move away from Skyscape, the domain name for all environments will change from "alphagov.co.uk" to "publishing.service.gov.uk". Since we retain control of alphagov, we will be able to redirect as appropriate.

In addition, this work will give a specific URL to the staging environment, rather than using the production domain and having to edit the hosts file.
--------------------------------------------------------------------------------

/rfc-088-external-content.md:
--------------------------------------------------------------------------------
---
status: accepted
implementation: done
status_last_reviewed: 2024-03-04
---

# External content

## Summary

Create an `external_content` document type to represent pages that aren't part
of GOV.UK but are relevant to our users.

## Problem

Not everything government related is on GOV.UK.

Search admin stores "External links", which can be returned for certain search terms.

For example:

- searches for "mp" can return the parliament.uk page http://www.parliament.uk/mps-lords-and-offices/mps/
- searches for "fit for work" can return http://fitforwork.org/
- searches for council names can return the council website

We are currently changing search indexing to source all its content from the publishing API, but external links
have never been part of the publishing API.

## Proposal

Create an `external_content` document type so we can store these links in the publishing API.

Create an `external_content` schema.

The schema MUST store in the details hash:

- `hidden_search_terms` - a set of search keywords/phrases that should route users to the page. [This field has already been added for smart answers](https://github.com/alphagov/govuk-content-schemas/pull/685/files).
- `url` - a URL for the external resource

The schema MAY store information about change history.

The standard fields `title` and `description` MUST be set.

In accordance with [RFC 43](https://github.com/alphagov/govuk-rfcs/blob/master/rfc-043-content-items-without-a-base-path.md), the `base_path`, `rendering_app`, `redirects` and `routes` MUST NOT be set.
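
Putting those rules together, an `external_content` item might look roughly like this (a sketch; the field values are invented for illustration, using an example link from above):

```ruby
# Sketch of an external_content item as it might be sent to the Publishing API.
payload = {
  document_type: "external_content",
  schema_name: "external_content",
  publishing_app: "search-admin", # illustrative owner
  title: "Fit for Work",
  description: "External service returned for 'fit for work' searches",
  details: {
    url: "http://fitforwork.org/",
    hidden_search_terms: ["fit for work", "fit note"],
  },
  # Deliberately no base_path, rendering_app, redirects or routes, per RFC 43.
}
```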
## Consequences

- External content can be defined centrally and reused across the platform
- The standard `links` hash can be used to refer to external content. The platform does not make any distinction between internal and external content.

This means that `external_related_links` in the details hash is technically redundant, and external links could be managed independently of the publishing workflow (for example, through content tagger).

This RFC acknowledges but does not address this duplication. The intention is only to move the search admin concept of an external link into the publishing API.
--------------------------------------------------------------------------------

/rfc-041-separate-document-type-from-format.md:
--------------------------------------------------------------------------------
## Problem

The publishing system uses the `format` element of a content item for three purposes:

- to identify the schema used to validate a content item
- to determine the way the item is displayed on the frontend
- (in phase 2) to filter lists of objects in publishing apps

Although these use cases are related, it is not necessarily the case that a schema maps directly to a document type: one example is in specialist-publisher, where there is one single schema for "specialist document" but many different types of document, eg CMA cases, AAIB reports, drug safety alerts, etc, which need to be displayed separately in the publishing app.

This is likely to become more significant when the work is done to consolidate formats; at that point we might only have a small selection of schemas and front-end templates, but potentially still need to distinguish sub-types in the publishing apps.

Currently, specialist documents define a "document type" field in the details hash; however, because it is not at the top level of the document, it is not available for use in filtering.

## Proposal

We will deprecate the current `format` field, and replace it with two new fields:

- `schema_name` - determines which file in govuk-content-schemas is used to validate the item.
- `document_type` - used for frontend display and for filtering in publishing apps

### Migration

We will support both naming types for a period to ease transition, and modify publishing-api to supply missing data where necessary. If only the old `format` field is supplied, its value will be copied to `schema_name` and `document_type`. If `format` is not supplied, its value will be copied from `schema_name`. It will be an error to supply only one of `schema_name` and `document_type`.

During the deprecation period, all three fields will be supplied to content-store, and both `document_type` and `format` will be passed from there to the frontends. There are only a small number of apps that use the `format` field in the frontend; as soon as they are updated to use `document_type`, the deprecated field will be dropped from content-store and the frontend representation, and the schemas updated.
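
For example, using the specialist-publisher case from above, a content item during the transition would carry all three fields (a sketch; values illustrative):

```ruby
# Sketch: one schema validating many document types.
content_item = {
  format: "specialist_document",      # deprecated, kept during migration
  schema_name: "specialist_document", # selects the govuk-content-schemas file
  document_type: "cma_case",          # drives frontend display and filtering
}
```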
--------------------------------------------------------------------------------

/rfc-083-case-insensitive-routing.md:
--------------------------------------------------------------------------------
# Case insensitive routing on GOV.UK

## Summary

Make base paths case insensitive for routing purposes, and only allow one case of a base path to be registered with the router and publishing-api.

## Problem

Our stack currently regards all base paths as case-sensitive. The router carries out routing on this basis. Although nginx has a configuration that redirects all-uppercase base paths to their lowercase equivalent, there are no rules for mixed-case base paths.

In addition, publishing-api allows multiple content items where the base paths are only differentiated by case.

This has the potential to cause confusion for end users if they try to visit a page using a mixed-case path. In most circumstances, it results in a 404 error, but in the case of prefix routes, the result can be unpredictable since the routing is handled by the backend application.

It can be argued that most end users do not understand and do not care about case sensitivity, and would be surprised to learn that there is the potential for www.gov.uk/education to lead to a navigation page whereas www.gov.uk/Education leads to a 404 error, or even worse, can be claimed by a completely different app displaying a different page.

Case sensitivity as a general rule made sense when all requests were for files stored on a file system which was itself case sensitive. However, given that most of our base paths are virtual and routed to apps, this argument does not apply.

## Proposal

* The router should only allow one case (uppercase, lowercase or mixed case) of a base path to be registered as a route (so `/education` can be registered but `/Education` will be regarded as a duplicate)
* The router should match requested base paths to routes in a case insensitive manner (so a request for `/EDUCATION` will match `/education`)
* The router should redirect requested base paths that match a route lexically but not in case with a 301 redirect to the route in the case it is registered with (so a request for `/EDUCAtion` will result in a 301 redirect to `/education`)
* publishing-api should be audited to find any existing content that differs only in the case of the base path, and these issues should be resolved
* publishing-api should only allow one case of a base path to be registered
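
The matching rule being proposed can be illustrated as follows. This is purely illustrative Ruby; the real router is a Go application and its internals are not shown here:

```ruby
# Illustrative sketch of case-insensitive lookup with redirect-to-canonical.
Route = Struct.new(:base_path, :backend)

# Registered routes, keyed on the lowercased base path (only one case allowed).
ROUTES = {
  "/education" => Route.new("/education", "collections"),
}

# Returns [status, location-or-backend] for a requested path.
def dispatch(requested_path)
  route = ROUTES[requested_path.downcase]
  return [404, nil] if route.nil?
  # Lexical match but wrong case: redirect to the registered case.
  return [301, route.base_path] if requested_path != route.base_path

  [200, route.backend]
end

dispatch("/EDUCAtion") # => [301, "/education"]
dispatch("/education") # => [200, "collections"]
```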
--------------------------------------------------------------------------------

/rfc-063-naming-new-apps-gems-on-gov-uk.md:
--------------------------------------------------------------------------------
## Problem

Apps and gems on GOV.UK have historically followed different naming schemes.
Recently we've established a pattern we're quite happy with, but we've never
written that down.

This RFC is an attempt to write down the current conventions.

## Naming applications

Firstly, the [service manual has good guidance on naming
things](https://www.gov.uk/service-manual/design/naming-your-service).

The most important rules:

- The name should be self-descriptive. No branding or puns (like Rummager,
  Needotron and Maslow)
- Use **dashes** for the URL and GitHub repo
- The name of the app should be the same on GitHub, Puppet and hostname

### Publishing applications

Applications that publish things are named **x-publisher**.

Good:

- specialist-publisher
- manuals-publisher

Not so good:

- publisher (too generic)
- contacts-admin (could be contacts-publisher)

### Frontend applications

Applications that render content to end users on GOV.UK are named
**x-frontend**.

Good:

- government-frontend
- email-alert-frontend

Not so good:

- collections (could be collections-frontend)
- frontend (too generic)

### APIs

Applications that just expose an API are named **x-api**.

Good:

- publishing-api
- email-alert-api
- router-api

Not so good:

- rummager (should be search-api)

### Admin applications

Applications that "manage" things can be called **x-manager** or **x-admin** or
**thing-doer**.

Good:

- search-admin
- local-links-manager
- content-tagger

Not so good:

- signonotron2000
- maslow (needs-manager)

## Naming gems

- Use the official [Rubygems naming
  convention](http://guides.rubygems.org/name-your-gem/)
- Use underscores for multiple words
- Use the `govuk_` prefix if the gem is only interesting to projects within GOV.UK

Good:

- govuk\_sidekiq
- govuk\_content\_models
- govuk\_admin\_template
- vcloud-edge\_gateway

Not so good:

- slimmer
- plek
- gds-sso (should be gds\_sso, or govuk\_sso)
--------------------------------------------------------------------------------

/rfc-017-simpler-draft-stack.md:
--------------------------------------------------------------------------------
---
status: superseded
implementation: superseded
status_last_reviewed: 2024-03-06
status_notes: We no longer use these infrastructure components.
---

# Simpler draft stack

## Problem

The current draft stack design is laid out as an exact copy of the `frontend`, `API`, and `management` VDCs in the GOV.UK vCloud Organisation. Because we've replicated all our data store clusters (Mongo and ElasticSearch), as well as our monitoring stack, we have over 30 machines in this VDC.

Additionally, because we've hit the limit of VDCs in our current vCloud Org, we've had to build this VDC in a new org.
Creating a new org has some extra implications for our setup, namely:

- It creates deployment complexity - we need extra Jenkins instances to maintain a privileged connection to the new Org, and chain Jenkins jobs across Orgs together to deploy
- It creates operational complexity
  - we have to create extra SSH configuration and deploy this to all clients to allow us to easily get on the machines
  - we now have at least 2 monitoring stacks to watch (in different orgs)
  - we now have at least 2 Kibana instances, so logs are no longer all in one place
- Data synchronisation is a problem because we have to create VPNs between Orgs

## Proposal

To reduce the complexity, we propose, within the main GOV.UK Production vCloud Org, to:

- Add 2 machines to the cache VDC: `draft-cache-{1,2}`
- Add 2 machines to the frontend VDC: `draft-frontend-{1,2}`
- Add 2 machines to the API VDC: `draft-content-store-{1,2}`

The applications that depend on data stores (`router`, `content-store`, `router-api`) will use the existing clusters, but use different database names (eg `draft_content_store_production`).

This has a few advantages over the previous design:

- It requires significantly fewer machines (6 vs 30+)
- It simplifies deployments - we're deploying into a single Org so there is no need for extra Jenkins machines
- It simplifies operations - we can use the existing Errbit, Icinga, and Kibana setups; and for 2nd line staff no change is required to SSH configuration

It also has one main negative - the security model previously offered (total separation of content and data/requests) is no longer as simple to come by. We plan to address this using vShield Edge firewalling (see ).
--------------------------------------------------------------------------------

/rfc-035-explicitly-make-the-details-hash-non-opaque.md:
--------------------------------------------------------------------------------
## Problem

In the JSON schemas we use to send data between our systems, we've traditionally tried to treat the `details` part of the payload as entirely opaque (eg no special knowledge should be required of anything between the publishing tool sending and the frontend rendering the payload).

The reality is that in the [govuk-content-schemas](https://github.com/alphagov/govuk-content-schemas) repository, almost all formats have required fields in the details hash. Examples:

- [case studies](https://github.com/alphagov/govuk-content-schemas/blob/b02afaac06ddd965e114b3ff577faf1952c628e0/formats/case_study/publisher/details.json#L6-L7)
- [finders](https://github.com/alphagov/govuk-content-schemas/blob/b02afaac06ddd965e114b3ff577faf1952c628e0/formats/finder/publisher/details.json#L6-L7)
- [specialist documents](https://github.com/alphagov/govuk-content-schemas/blob/b02afaac06ddd965e114b3ff577faf1952c628e0/formats/specialist_document/publisher/details.json#L6-L8)

All these required fields are there so that the rendering apps have enough data to fulfil the business logic required of the format.

In order to provide bespoke validation of the schemas within the publishing API, we need to inspect the JSON payloads and validate them using JSON schema, and provide feedback on specific errors.
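
As an illustration of the kind of validation this enables, checking a details hash against one of the format schemas above might look like this (a sketch using the json-schema gem; the file path and details values are illustrative):

```ruby
# Sketch: validate an incoming details hash against a format's schema from
# govuk-content-schemas and surface specific errors.
require "json"
require "json-schema" # json-schema gem

schema = JSON.parse(File.read("formats/case_study/publisher/details.json"))
details = { "body" => "A details hash missing one of its required fields" }

errors = JSON::Validator.fully_validate(schema, details)
errors.each { |e| puts e } # human-readable messages for each violation
```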
In the future we may also want to provide additional parsing and understanding (eg a future dependency resolver will want to know which fields are `govspeak` vs plain text in order to rewrite links and dependencies, or when attachments get out of virus scanning).

## Proposal

We therefore propose to no longer treat the `details` part of the payload as entirely opaque, but instead allow for inspection and enhancement where required by the format.

This requires no work as we're already doing this in code in some places - but we would change our expectations about certain fields (eg datetime fields, govspeak fields, attachments).

These fields will be marked as custom types in the content schemas (code TBD), and documented, with their addition to be discussed as part of code review. They **will not** be added on a per-format basis - any special fields added should be re-usable across the entire dataset.
--------------------------------------------------------------------------------

/rfc-147-enable-speedcurve-http-protocol-capture.md:
--------------------------------------------------------------------------------
---
status: accepted
implementation: done
status_last_reviewed: 2024-03-04
---

# Allow SpeedCurve to capture user HTTP protocol data

## Summary

We use SpeedCurve RUM to capture detailed user performance data on GOV.UK. Fastly announced that [HTTP/3 + QUIC](https://twitter.com/fastly/status/1520139864032874497) is now available for all customers for free, as it is now out of beta. We already have an [RFC in draft](https://github.com/alphagov/govuk-rfcs/pull/139) about enabling this on GOV.UK.

But before we enable HTTP/3 + QUIC on GOV.UK I'd like to be able to capture what protocol a user is using via SpeedCurve. This will allow us to quantify whether the change has made any difference for users (especially those on unreliable connections). I feel this could be a great bit of research / PR for how we are improving performance for all users of GOV.UK, no matter what their connection or device.

## Problem

Data capture to quantify the change from HTTP/2 + TCP to HTTP/3 + QUIC.

## Proposal

I'd like to include a small piece of JavaScript in the current SpeedCurve RUM implementation that will only fire when a user accepts the cookie banner. This JavaScript will look at the HTTP protocol the user is currently using and push this anonymous data into SpeedCurve using their API.

This will then give us an additional dimension in the SpeedCurve GUI with which to compare before / after the change is made.

The actual JavaScript is small and will have a negligible effect on page performance, as it simply reads a string value from an object in supporting browsers' [Navigation Timing Level 2 API](https://www.w3.org/TR/navigation-timing-2/).

The actual code to include looks like this:

```js
// use the LUX.addData method to capture HTTP protocol information.
LUX.addData("http-protocol", performance.getEntriesByType('navigation')[0].nextHopProtocol);
```

We should report the contents of `performance.getEntriesByType('navigation')[0].nextHopProtocol` using the `LUX.addData()` function - SpeedCurve have some good examples of this [in their recipes][1].
Once this code is added, we should then have information about what protocol the user used and the difference it made to performance, allowing us to compare aggregate data for HTTP/2 and HTTP/3 users over a set period of time (likely 1-2 months).

[1]: https://support.speedcurve.com/recipes/track-size-for-a-single-resource
--------------------------------------------------------------------------------

/rfc-078-re-architect-signin-permissions-in-signon.md:
--------------------------------------------------------------------------------
# Re-architect signin permission in signon

## Summary

Change the 'signin' permission in signon to be one that signon itself checks for as part of the OAuth handshake, rejecting users who try to log in to applications they don't have access to.

## Problem

In signon all apps have a 'signin' permission, and the UI for managing permissions on a user strongly suggests that if the user is not granted this permission for an application they will not be able to access that application. The reality is that it is up to the applications themselves to care about the 'signin' permission via the gds-sso gem and the `require_signin_permission!` controller method.

This means that for some applications unchecking the "has access to?" box for a user will block them from logging in, but for others it won't. There's nothing in signon that can explain this to an admin, because signon doesn't know which applications are coded to use `require_signin_permission!` and which allow any signon user. Only applications that a user has explicit 'signin' permission for are listed on their dashboard, so signon users may not even know they have access to an application that allows all users, because it won't be listed if they haven't been granted 'signin' permission.

More importantly, for those apps that allow any signon user we have no mechanism to stop a user from accessing that application other than suspending their entire account and blocking them from using all applications that use signon.

## Proposal

The proposed solution is that signon handles the 'signin' permission during the OAuth handshake and applications cannot circumvent this. This means that the existing signon UI represents the truth of the situation and all applications a user has access to will be listed in their dashboard. To cater for those applications that do want to allow all users to access them, signon will provide bulk permission granting and default permission granting functionality.

* Signon MUST check for 'signin' permission during the OAuth handshake with an application and reject users that do not have this permission.
* Signon MUST allow a super admin to grant permissions to all users in one go.
* Signon SHOULD allow super admins to mark permissions for an application as 'default', meaning they are added to all new users.
* gds-sso SHOULD deprecate `require_signin_permission!`.
* Applications using signon SHOULD be audited to make sure they will continue to function after these changes are made.
* Applications using signon SHOULD be audited to collect the set of default permissions that should be created and bulk granted to all existing users.
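
In signon terms, the handshake check might look something like the sketch below. Signon is built on Doorkeeper, but the controller shape and the `has_permission?` method are assumptions about its internals, not actual signon code:

```ruby
# Hypothetical sketch of signon enforcing 'signin' at the authorisation step.
class AuthorizationsController < ApplicationController
  def new
    # The OAuth client application the user is trying to sign in to.
    application = Doorkeeper::Application.find_by(uid: params[:client_id])

    unless current_user.has_permission?(application, "signin") # illustrative
      render "insufficient_permissions", status: :forbidden
      return
    end

    # ...continue the normal OAuth authorisation flow...
  end
end
```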
--------------------------------------------------------------------------------

/rfc-075-managing-users.md:
--------------------------------------------------------------------------------
# Managing users on GOV.UK

## Summary

Create a central repository of users in GOV.UK. Use that to verify that the
correct people have access.

## Context

Tech staff on GOV.UK can have access to dozens of things, like SSH access, GitHub organisation membership, and AWS accounts. When someone leaves, access is revoked [using a check list][leaver] in a leavers card on the 2nd line Trello.

## Problem

1. The leavers process is slow and manual (and a bit boring). This means sometimes leavers aren't removed as quickly as we want.
2. The process of removing people from access lists is based around making _changes_. There is no system in place that verifies that everyone who currently has access is actually allowed to. This means that if someone processes a leaver ticket and forgets a step, a user account can linger around forever.
3. We'd like to use [GitHub teams for authentication to our Jenkins][jenk]. There are some concerns around that because it's very easy to add people to teams, and there is little visibility of membership changes.

[jenk]: https://github.com/alphagov/govuk-puppet/pull/5910

## Proposal

We make a central list of GOV.UK users. We use this list to periodically verify that only the correct people have access. We do this by creating a Jenkins job that alerts Slack and Icinga if there are unexpected users with access to something.

This will make the leavers process more deterministic. It also prevents people from being added to the GitHub team outside our normal process.

There are a number of potential extensions to this, like adding an expiry date to the user accounts, which means we would get alerted when to remove a user (a concept that bob found in Gitlab).

## Implementation

This is a proof of concept:

It defines a list of users like this:

```yaml
- username: johndoe
  github_username: john-code
```

The script in the repo will check:

1. SSH access to (old) CI machines
1. SSH access to mirrors
1. SSH access to backup machines
1. SSH access to integration
1. Access to Jenkins
1. Access to AWS in integration, staging & production (via Terraform)
1. GitHub team membership (the GOV.UK team)

These aren't all the places access lists are defined (see the [rest of the leaver ticket template][leaver]), but it's a start. Once our credential repos are moved to public GitHub, production access can also be checked. Possibly we could create an endpoint for Signon to allow us to verify the admins/superadmins.
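
For the GitHub team check, the verification could be as simple as diffing the YAML list against the live membership. A sketch using the octokit gem; the file name, team id and token handling are illustrative:

```ruby
# Sketch: alert if anyone is in the GOV.UK GitHub team but not in the
# central users list.
require "yaml"
require "octokit" # octokit gem

expected = YAML.load_file("users.yml").map { |u| u["github_username"] }.compact

client = Octokit::Client.new(access_token: ENV.fetch("GITHUB_TOKEN"))
client.auto_paginate = true
govuk_team_id = 123_456 # illustrative team id
actual = client.team_members(govuk_team_id).map(&:login)

unexpected = actual - expected
abort "Unexpected GitHub team members: #{unexpected.join(', ')}" if unexpected.any?
```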
52 | 53 | [leaver]: https://trello.com/c/PmVyofn8/3-template-leaver-dev-webops 54 | -------------------------------------------------------------------------------- /rfc-025-managing-special-snowflake-urls-through-the-publishing-api.md: -------------------------------------------------------------------------------- 1 | --- 2 | status: superseded 3 | implementation: superseded 4 | status_last_reviewed: 2024-03-06 5 | status_notes: Content Store content is kept in sync with Publishing API so there is no longer a choice. 6 | --- 7 | 8 | # Managing special snowflake URLs through the Publishing API 9 | 10 | ## Problem 11 | 12 | There are a [number of URLs on GOV.UK](https://docs.google.com/spreadsheets/d/1LUpym0SVeOkom-k6qnqUic1tyIRhnHgKrAsXwr1kX5w/edit#gid=0) which do not fall into the category of traditional leaf-node published content, e.g. robots.txt, search, or the homepage. 13 | 14 | These URLs are frequently registered with the router directly. Since they aren't entered into the URL arbiter, this sometimes causes clashes and/or [downtime if important routes are overwritten](https://docs.google.com/document/d/1Ev_axmMdvsg3WTnYpdYVBciSAzM3q64QklB8MOZl7Go). 15 | 16 | ## Proposal 17 | 18 | The minimum required to add safety to this process is to have the routes pass through the URL arbiter. We could have all applications which register these snowflake routes directly instead check with the arbiter first, but this is an untidy option and open to error. We'd also like to reduce the number of applications which talk directly to the router API as much as possible. This suggests we use the publishing API as the registration mechanism. 19 | 20 | So far there are two proposals on how to do this: 21 | 22 | 1. **Use the publishing API as an endpoint but don't write to the content store**. This would require a new endpoint on the API which engages in URL arbitration but does not push to either live or draft content stores. It would also require the API to speak to the router API directly, something which it currently delegates to the content store. 23 | 2. **Add content store entries**. This would require no changes to the publishing API. The owning applications would publish a "snowflake" format document to the publishing API on deploy (or whenever else they normally do this, e.g. via rake tasks) containing some human-readable information about the path and why it exists (e.g. "`/search` handles the display of search results across the site" or "`/info` is a prefix to provide statistics about other routes on GOV.UK") and allow the standard publishing pipeline to deal with URL arbitration and route registration. This could later allow some of the routes to be not only registered via the publishing pipeline but also rendered out of the content store, for example in the case of `robots.txt`. 24 | 25 | My personal preference is for proposal #2. Here's an example of what the document might look like for `robots.txt`: [https://gist.github.com/elliotcm/cddd6ea1f4e3989009bd](https://gist.github.com/elliotcm/cddd6ea1f4e3989009bd) 26 | 27 | -------------------------------------------------------------------------------- /rfc-043-content-items-without-a-base-path.md: -------------------------------------------------------------------------------- 1 | ## Problem 2 | 3 | There's a need for "unaddressable content" on GOV.UK. These are content items without a URL (base path) that won't be visible on GOV.UK. They would be used in the links hash and for dependency resolution.
4 | 5 | The need for having these first surfaced when talking about modelling governments in the publishing-api. Having content-items of document type "government" in the publishing-api would allow us to use the links hash to specify to which government a piece of content belongs. With dependency resolution we can then put everything related to one government in history mode. 6 | 7 | Other use cases for this are: 8 | 9 | - **The tags for the new single taxonomy**. Linking to things without a base path means we don't need a page for each "tag", but we can have a more flexible architecture where we display multiple tags on one page. 10 | - **Adding external links to the search index**. These currently live in [recommended links repo](https://github.com/alphagov/recommended-links) and are put directly into rummager. In the future we'll populate the search index from the message queue exclusively. This means that everything that should be in search also needs to be in the publishing-api. 11 | - **Councils** may be added as part of the work the Custom team is doing to [rebuild local transactions](https://gov-uk.atlassian.net/wiki/display/GOVUK/RFC+33+Local+transactions+migration+approach). 12 | 13 | There has been some work done on this: [https://trello.com/c/b77KFGgc/523-add-support-for-nil-base-paths-in-publishing-api](https://trello.com/c/b77KFGgc/523-add-support-for-nil-base-paths-in-publishing-api). 14 | 15 | This RFC is intended to give us the opportunity to really think this through, as this is an important architectural feature. 16 | 17 | ## Proposed requirements 18 | 19 | 1. Publishing API should support these content items without knowing about the formats. Instead, whether or not the format is addressable should be set in the content-schemas. 20 | 2. The content item needs to be testable with [govuk-content-schemas](https://github.com/alphagov/govuk-content-schemas) (the current build process assumes that all schemas will have a required `base_path`) 21 | 3. Publishing API validates that the items don't have a `base_path`, `rendering_app`, `redirects` and `routes` when writing 22 | 4. Publishing API doesn't include `base_path`, `rendering_app`, `redirects` and `routes` in the GET responses and message queue payload 23 | 24 | 25 | ## Open questions 26 | 27 | - Do we add an explicit attribute on the presented content item that it's addressable/non-addressable? 28 | - Can all message queue consumers cope with non-addressable content? 29 | - Are there any other places where we assume that content items have a base path? 30 | -------------------------------------------------------------------------------- /rfc-057-default-values-for-application-secrets.md: -------------------------------------------------------------------------------- 1 | ## Problem 2 | 3 | We discovered that the EFG application had 2 Devise configuration files in production. This is because it uses alphagov-deployment to copy configuration files into place during deployment. Each of these Devise configuration files had a `secret_key` value (one was in the public repo and used for development, the other was from alphagov-deployment). By luck, we were using the value of the `secret_key` which was not public (this was due to the order that Rails loaded the initializers). 4 | 5 | If we had been using the public value of the secret key, this would have been a security incident for GOV.UK, which may have resulted in all 3000 EFG users having their passwords reset.
6 | 7 | ## Proposal 8 | 9 | "Secrets" are defined as any values which would result in a security incident if disclosed. These can include, but are not limited to, cookie encryption seeds and password encryption seeds. 10 | 11 | ### Configuration 12 | 13 | This is a reinforcement of [RFC 26: 12-Factor Rails Apps](https://gov-uk.atlassian.net/wiki/display/GOVUK/RFC+26%3A+12-Factor+Rails+Apps). Applications must retrieve their secrets from the environment, not from files. 14 | 15 | ### Default values for secrets 16 | 17 | Application code must not contain default values for secrets. For example: 18 | 19 | - 20 | ```
Bad: GovernmentApp::Application.config.secret_token = 'ohqu1iejohTh9oophieFah9UeDik0neixizeeVooSuush1xu'
``` 23 | - 24 | ```
Bad: GovernmentApp::Application.config.secret_token = ENV['SECRET_TOKEN'] || 'ohqu1iejohTh9oophieFah9UeDik0neixizeeVooSuush1xu'
``` 27 | - 28 | ```
Good: GovernmentApp::Application.config.secret_token = ENV['SECRET_TOKEN']
``` 31 | 32 | ### Maintaining dev-prod parity for secret configuration 33 | 34 | Secrets must be configured in the environment for development as well as in production. An acceptable alternative to using environment variables in development is to use the [`config/secrets.yml` functionality available in Rails 4.1 and later](http://edgeguides.rubyonrails.org/4_1_release_notes.html#config-secrets-yml): 35 | 36 | ```
development:
  secret_key_base: development_example_secret_key
production:
  secret_key_base: ENV['SECRET_KEY_BASE']
``` 39 | 40 | This maintains close enough configuration between development and production without needing environment variables in every environment. 41 | 42 | ### Default behaviour when secrets are not present 43 | 44 | If a missing piece of secret configuration could result in a security incident, applications must fail to start if it is not present at boot or is set to an empty string. A good example of this is an empty cookie token, which in a bad framework could result in unencrypted cookies. For example: 45 | 46 | ```
if Rails.application.secrets[:secret_key_base].blank?
  raise 'Required setting for secret_key_base is not set'
end
``` 49 | 50 | -------------------------------------------------------------------------------- /rfc-086-draft-stack-rummager.md: -------------------------------------------------------------------------------- 1 | # Draft Stack Rummager 2 | 3 | ## Summary 4 | 5 | RFC 86 proposes setting up an instance of [Rummager][rummager] for the [draft stack][draft-stack-docs]. 7 | 8 | ## Problem 9 | 10 | The [draft stack (content preview)][draft-stack-docs] part of GOV.UK relies in places on the [rummager service (Search API)][rummager]. 12 | 13 | There is only one instance of this service, which is primarily used for the live stack, but is also currently used by some services on the draft stack (see [govuk-puppet][puppet-frontend-rummager-config]). 16 | 17 | The current Rummager instance may expose some associations between live content and draft content (e.g. draft mainstream browse pages or taxons), depending on the internal index which is used for the live content. However, the current trend is that associations with draft content will be supported for less live content, as more types of content are migrated to the `govuk` index (see [ADR 4][rummager-adr-4]), as the process of populating this index does not involve fetching draft content from the Publishing API.
25 | 26 | Having some way of including draft content in the responses from Rummager on the draft stack would make the draft stack more useful. Previewing pages that use Rummager for some of the data won't give an accurate representation of what the page would look like if the relevant content is published. 31 | 32 | ## Proposal 33 | 34 | The proposal made here, to improve the user experience on the draft stack, is to set up a draft instance of Rummager, populated with the appropriate content via the Publishing API, and then switch to using this from draft services. 38 | 39 | ## Action Plan 40 | 41 | 1. Deploy rummager to the draft stack 42 | - Involves changing `govuk-puppet` 43 | - Possibly requires some new machines 44 | 2. Send draft content to a new `draft_documents` RabbitMQ Exchange 45 | - From the Publishing API, for at least the PutContent and Unpublish commands 47 | - Don't send any access limited content, to avoid Rummager indexing it 49 | 3. Subscribe the draft instance of Rummager to the `draft_documents` exchange, as well as the existing `published_documents` exchange 51 | 52 | Once the above work is done, there will be a draft Rummager service. To switch the draft frontend apps to use this instead, the next step is to make an assessment regarding whether it will work better than the live instance. This will probably depend on how much content has been migrated to the `govuk` index. 57 | 58 | [rummager]: https://github.com/alphagov/rummager 59 | [draft-stack-docs]: https://docs.publishing.service.gov.uk/manual/content-preview.html 60 | [puppet-frontend-rummager-config]: https://github.com/alphagov/govuk-puppet/blob/3a874ba1afec98c0aeb7f34c9fe34128340e7363/modules/govuk/manifests/node/s_draft_frontend.pp#L24 61 | [rummager-adr-4]: https://github.com/alphagov/rummager/blob/master/doc/arch/adr-004-transition-mainstream-to-publishing-api-index.md 62 | -------------------------------------------------------------------------------- /rfc-058-publishing-api-events.md: -------------------------------------------------------------------------------- 1 | ## Problem 2 | 3 | An event is created for every request that comes into the publishing API. These events are used as a form of Event Sourcing to track the application state as a series of events over time, and can be used to reconstruct past states. 6 | 7 | One of the problems is the sheer number of events that go through the publishing API: the events table is growing at roughly 600MB a month and currently sits at 6GB (4.2 million events), which has implications for replication, and for developers copying that data to their local environments. 11 | 12 | The other problem is that the event information, as it stands, loses its effectiveness over time: with code changes, schema changes and API version changes, the ability to replay the events from scratch gets lost. 15 | 16 | ## Proposal 17 | 18 | The event information is useful for replaying events and debugging. We would like to archive the events in the events table, while still having the ability to: 21 | 22 | 1. Easily search through events for debugging 23 | 2. Retrieve events to replay in the case of failure 24 | 3. Store future events 25 | 26 | ### Option A: 27 | Start storing events in Elasticsearch.
28 | 29 | This would imply that we no longer store events in the Publishing API, but rather every time an event occurs, we asynchronously store it in Elasticsearch. We would need to decouple events from any command, and store the event in the same transaction. This has further implications for sending versioned data downstream to the content store, which is currently managed by the event id. 33 | 34 | pros: 35 | 36 | - Ease of storing and retrieving events; a rich query language makes this useful for debugging 38 | - No development time needed to build an interface (Kibana) 39 | - Already part of our infrastructure 40 | - Easily retrieve events to replay 41 | 42 | cons: 43 | 44 | - Reliability, if someone accidentally deletes the index 45 | - Effectively becomes the primary data source of events 46 | - Might also need archiving as a backup 47 | 48 | ### Option B: 49 | Log payload params directly to Logstash 51 | 52 | This would mean simply logging the request params in the log, losing any concept of an event. 54 | 55 | pros: 56 | 57 | - Very little work initially 58 | - No other Elasticsearch clusters 59 | 60 | cons: 61 | 62 | - Difficult to backdate events 63 | - More difficult to replay events 64 | - Not as easy to query since it will be interspersed with other data 65 | 66 | ### Option C: 68 | 69 | Another Postgres DB in the Publishing API 70 | 71 | By establishing a different connection in the Events model ([http://api.rubyonrails.org/classes/ActiveRecord/Base.html](http://api.rubyonrails.org/classes/ActiveRecord/Base.html)) 73 | 74 | pros: 75 | 76 | - Same infrastructure as current Publishing API 77 | - Little to change 78 | 79 | cons: 80 | 81 | - Doesn't solve the problem of data replication 82 | - Increases complexity 83 | 84 | ### Other Options: 86 | 87 | Archive to S3 and carry on as normal 88 | -------------------------------------------------------------------------------- /rfc-091-sharing-assets.md: -------------------------------------------------------------------------------- 1 | --- 2 | status: unclear 3 | implementation: abandoned 4 | status_last_reviewed: 2024-03-04 5 | status_notes: We're currently thinking about consolidating apps as a better alternative to changing the way we share assets. 6 | --- 7 | 8 | # Sharing assets 9 | 10 | ## Summary 11 | 12 | We need to share assets to make sure users only have to download a thing once. To do this, we'll upload each application's assets to the same location. 13 | 14 | ## Problem 15 | 16 | In [RFC84 we proposed replacing Static with a gem](https://github.com/alphagov/govuk-rfcs/pull/84). One of the architectural challenges with this approach is that by moving assets (like CSS, JS and images) into the gem, the assets won't be shared for users. This means that users visiting GOV.UK would have to download assets multiple times, depending on which app renders the page. 17 | 18 | ## Proposal 19 | 20 | We upload the assets to a central location. For example, if both `collections` and `government-frontend` have a file named `foo.css`, they will upload the file to `assets.publishing.service.gov.uk/shared/foo.css`.
21 | 22 | Because [Rails fingerprints assets](http://guides.rubyonrails.org/asset_pipeline.html#what-is-fingerprinting-and-why-should-i-care-questionmark), the asset actually uploaded will be `assets.publishing.service.gov.uk/shared/foo-8d811b8c3badbc0b0e2f6e25d3660a96cc0cca7993e6f32e98785f205fc40907.css`, unless the files are different, in which case they'll have different fingerprints and won't overwrite. 23 | 24 | As part of this solution we'd need a way to ensure or encourage different frontend apps to use assets in a way that is shareable. For example, applications could include a separate stylesheet with all the component styles (`components.css`), which would be shared across apps. 25 | 26 | How to achieve this: 27 | 28 | * In govuk-app-deployment, we configure Capistrano to precompile the assets and then upload the resulting files to a central location. 29 | * Proxy assets.publishing.service.gov.uk/shared to this location 30 | * Point applications to use this URL as asset host so that it's used in production 31 | 32 | Upsides 33 | 34 | * We remove Nginx config to proxy to individual applications, perhaps removing some hops 35 | * Whenever there's an asset to share, it will be shared automatically. If an application copies an image from another app, on production they'll have the same URL. 36 | * Each application ensures that its own assets are present on the shared bucket. 37 | * Applications don't serve files anymore. 38 | 39 | Downsides 40 | 41 | * We'll end up with a directory full of assets which are of unknown origin. 42 | 43 | ## Alternatives 44 | 45 | ### 1. Do nothing 46 | 47 | The simplest solution is to not share assets between applications. This would have significant impact on the user, especially if the font is distributed in the gem. 48 | 49 | ### 2. Deploy the gem 50 | 51 | This would work roughly as follows: every time we publish a new gem version, we'll deploy the resulting assets to a central place. In production, apps point to that shared asset URL instead. This has downsides: 52 | 53 | * Disparity between development and production 54 | * Inevitable deploy dependency - the gem deploy needs to happen before the application is deployed 55 | -------------------------------------------------------------------------------- /rfc-050-do-end-to-end-testing-of-gov-uk-applications.md: -------------------------------------------------------------------------------- 5 | --- 6 | status: "OPEN" 7 | notes: "Open for comments and questions" 8 | --- 9 | 10 | ## Background 11 | 12 | We have almost finished migrating Specialist Publisher onto the Publishing Platform. 13 | 14 | In doing so, we've written lots of tests to ensure the application meets its requirements. 15 | 16 | The majority of our tests follow this pattern: 17 | 18 | 1. Stub external services to return a canned response 19 | 2. Perform some action in the publishing application 20 | 3. Assert that the correct requests were made to external services 21 | 22 | ## Problems 23 | 24 | **Unable to test user-journeys** 25 | 26 | Because we're returning canned responses, we're unable to test user-journeys within the publishing app. We **can't** write tests like this: 27 | 28 | 1. Visit the new document page 29 | 2. Create a new document 30 | 3. Assert that the document is visible in the interface 31 | 32 | We can't write these tests because no state is preserved after we have created the document.
The next time a request is made to get that document, it will return the original response that the stub set up and won't reflect the change that would have happened in the Publishing API if we were interacting with the application for real. 33 | 34 | The same is true for all user-journeys in the application. There are no tests that span more than a single page of the publishing app. 35 | 36 | **Unable to test side-effects** 37 | 38 | Similarly, we are unable to test that side-effects actually happen. We have no way to test that an email has actually been sent. Side-effects include: 39 | 40 | 1. Checking that draft content is visible 41 | 2. Checking that published content is visible 42 | 3. Checking that an email has been sent 43 | 4. Checking that documents appear in the search index 44 | 45 | The closest we can get is to assert that we have sent requests to the appropriate services, but this doesn't actually check that these things have happened. 46 | 47 | **Interface brittleness** 48 | 49 | When interfaces change on those external services, all of our tests will continue to pass, but the application will be broken. 50 | 51 | We may be able to do lots of manual testing to plug the holes that we have with end-to-end testing, but that doesn't account for changes to interfaces that might happen in the future. 52 | 53 | ## Proposal 54 | 55 | I propose that we consider how to do end-to-end testing of applications on GOV.UK. 56 | 57 | I think that we should have a high-level test-suite that can orchestrate more than one application and make assertions at external boundaries of the system, such as user-interfaces or email inboxes. 58 | 59 | Ideally, it would treat the system as a black-box and only care about the user-facing behaviour of the system, giving us the freedom to change the internals of the system independently of this end-to-end testing process. 60 | 61 | When developing applications, writing these high-level tests should be an inherent part of that process, rather than an after-thought. 62 | 63 | ## Solution 64 | 65 | I don't want to prescribe what the technical solution should be. 66 | 67 | If this RFC is accepted, I think we should have a separate forum to figure out what to do. 68 | -------------------------------------------------------------------------------- /rfc-165-add-api-endpoints-to-transition.md: -------------------------------------------------------------------------------- 1 | --- 2 | status: to_be_deleted 3 | implementation: not_planned 4 | status_last_reviewed: 2024-03-04 5 | status_notes: > 6 | People aren't clear whether the value of doing this outweighs the cost of implementing it. 7 | Given that disagreement, it seems unlikely that any team is going to pick this up. 8 | --- 9 | 10 | # Add API endpoints to Transition for consumption by Bouncer 11 | 12 | ## Summary 13 | 14 | This proposes that the Bouncer application should access transition data via 15 | API endpoints on Transition instead of via direct database access. 16 | 17 | ## Problem 18 | 19 | The [Transition 20 | System](https://docs.publishing.service.gov.uk/manual/transition-architecture.html) 21 | is built to transition government websites to GOV.UK. It includes: 22 | - Transition - Ruby on Rails application used by admins to create mappings from 23 | old URLs to pages on GOV.UK. 24 | - Bouncer - Rack-based application that uses the mappings created by Transition 25 | and handles requests to those old domains.
26 | 27 | Currently, Bouncer retrieves the mappings written by Transition through direct read-only access to the database shared by the two applications. 29 | 30 | This is problematic as any code related to database access must be kept in sync between Transition and Bouncer. This includes the use of ActiveRecord; we must make the effort to keep ActiveRecord subclasses and dependency versions consistent. 34 | 35 | As this approach is unusual in the wider context of GOV.UK, it also has the potential to make this system more difficult for software developers to grasp. The additional cognitive load that this places on individuals and teams may hurt productivity and make tasks such as onboarding new developers more difficult. 40 | 41 | ## Proposal 42 | 43 | Bouncer should retrieve data from Transition via an API. This involves: 44 | - Adding API endpoints that allow us to retrieve the minimum amount of data for Bouncer to function. 46 | - Adding adapters to GDS API Adapters. 47 | - Removing database dependencies from Bouncer. 48 | 49 | There has been a push recently to fully separate our frontend and backend apps. This approach would fit in well with this theme, essentially turning Transition into a publishing app and Bouncer into a platform concern. 52 | 53 | This approach is an alternative to merging the Bouncer codebase into Transition entirely. Merging the applications is a more complex task that involves: 56 | - Rewriting large parts of Bouncer as it's currently not a Rails app. 57 | - Handling of routes that are the same in Bouncer and Transition. 58 | - Dealing with infrastructure complexity - references to Bouncer are littered through the infrastructure. 60 | 61 | The documentation in the Transition and Bouncer repositories must be updated to reflect these changes. 63 | 64 | ## Consequences 65 | 66 | - Bouncer's dependency on the database shared with Transition is removed. 67 | - Bouncer's codebase will be less complex. 68 | - This approach will likely be less performant than the current approach. We plan to take this into consideration during development by identifying and measuring relevant performance metrics. 71 | - Bouncer depends on the availability of Transition's API. 72 | -------------------------------------------------------------------------------- /rfc-159-switch-off-whitehall-apis.md: -------------------------------------------------------------------------------- 1 | --- 2 | status: accepted 3 | implementation: done 4 | status_last_reviewed: 2024-03-04 5 | --- 6 | 7 | # Switch off Whitehall's public APIs 8 | 9 | ## Summary 10 | 11 | Whitehall has a number of public APIs, as listed below: 12 | 13 | - `/api/governments` 14 | - `/api/governments/{slug}` 15 | - `/api/world-locations` 16 | - `/api/world-locations/{slug}` 17 | - `/api/world-locations/{slug}/organisations` 18 | - `/api/worldwide-organisations/{slug}` 19 | 20 | These APIs are not advertised publicly, except in [some documentation in Whitehall’s repository](https://docs.publishing.service.gov.uk/repos/whitehall/api.html) and in an [‘alpha’ API catalogue](https://www.api.gov.uk/gds/gov-uk-governments/#gov-uk-governments). Only one of these endpoints (`/api/world-locations`) is used internally by GOV.UK applications (via the `gds-api-adapters` gem). 21 | 22 | Usage is low, with the vast majority of traffic from robots.
Over a 14 day period in April/May 2023, the number of hits to origin machines was as follows (after excluding those from our team whilst investigating these APIs): 23 | 24 | - `/api/governments`: 114 hits on index page 25 | - `/api/governments/{slug}`: 0 hits 26 | - `/api/world-locations`: 290 hits (excluding requests from other GOV.UK apps) 27 | - `/api/world-locations/{slug}/organisations`: 0 hits 28 | - `/api/worldwide-organisations/{slug}`: 34 hits 29 | 30 | ## Problem 31 | 32 | Publishing Platform are aiming to switch off Whitehall Frontend (and remove all of the associated rendering code from the application) before the end of Q1 2023. The rendering of these APIs would need to be migrated into other applications, which would involve development cost. The traffic is very low, so the development cost would outweigh the benefit of migrating these to another application. 33 | 34 | However, users are already able to retrieve the same data through alternative means, except for one of these endpoints: 35 | 36 | - `/api/governments` → not available (we could probably add this into a content item, if there’s a user need to maintain this without having a specific API for it) 37 | - `/api/governments/{slug}` → `/api/content/government/{slug}` 38 | - `/api/world-locations` → `/api/content/world` (available soon, once we’ve migrated the page out of Whitehall) 39 | - `/api/world-locations/{slug}` → `/api/content/world/{slug}` 40 | - `/api/world-locations/{slug}/organisations` → `/api/search.json?filter_format=worldwide_organisation&fields=title,format,updated_at,link,slug,world_locations` (although this doesn’t filter by World Location, so the user will need to filter this themselves once they’ve got the results) 41 | 42 | ## Proposal 43 | 44 | Update gds-api-adapters to not need the `/api/world-locations` endpoint of the Whitehall API and return the world locations (and their organisations) by using the relevant alternatives listed above instead. This will end our internal reliance on these APIs. 45 | 46 | Switch off the APIs without migrating to another application in a two-step process: 47 | 48 | 1. Replace the existing APIs with a ‘gone’ response and provide content that states how users are able to access the same information (as detailed in the ‘Problem’ section above). This will be in place for one month. 49 | 2. After one month, switch off the APIs and return a ‘gone’ response with no content to any requests. 50 | -------------------------------------------------------------------------------- /rfc-023-putting-detailed-guides-paths-under-guidance.md: -------------------------------------------------------------------------------- 1 | --- 2 | status: accepted 3 | implementation: done 4 | status_last_reviewed: 2024-03-06 5 | --- 6 | 7 | # Putting Detailed Guide paths under /guidance 8 | 9 | ## Problem 10 | 11 | Currently detailed guides live at URLs like: [https://www.gov.uk/british-forces-overseas-posting-cyprus](https://www.gov.uk/british-forces-overseas-posting-cyprus) 12 | This can cause problems because editors are able to create documents and claim high-profile slugs which shouldn't really be used for guidance. 13 | Instead, they should be published as https://www.gov.uk/guidance/[thing] 14 | 15 | Note that currently manuals are also published under `/guidance`, so we will need to check using the URL arbiter or equivalent to ensure the path can be claimed.
16 | 17 | ## Proposal 18 | 19 | ### Ensuring new detailed guides are served under guidance/ 20 | 21 | In Whitehall, add 'guidance/' in front of the detailed\_guides#show route in routes.rb, keeping the old route live to cover deploy time (to be removed 30+ minutes after deploy). The old route will be the same as the current route, without 'as: detailed\_guides'. 22 | 23 | In Whitehall, update the [presenter](https://github.com/alphagov/whitehall/blob/master/app/models/registerable_edition.rb#L26-L32) for sending the paths to Panopticon to reflect the changes in the paths. 24 | 25 | In Panopticon, we might also need to [update the slug validation code](https://github.com/alphagov/govuk_content_models/blob/master/app/validators/slug_validator.rb) as it may not accept detailed\_guide artefacts with a `/` in the slug. 26 | 27 | NB: in the routes, /specialist/detailed-guide-slug redirects to root/detailed-guide-slug, so as another story we should make it redirect to /guidance/detailed-guide-slug directly. 28 | NB: also, in govuk\_content\_models, we will need to disallow URLs at the root once the migration of old guides is complete. 29 | 30 | ### Ensuring the existing detailed guides are served under guidance/ 31 | 32 | Run a Panopticon migration to reslug all detailed guides, avoiding creating duplicates. 33 | 34 | We need to create a rake task to republish to publishing-api. There is already a [PublishingApiRepublisher](https://github.com/alphagov/whitehall/blob/master/lib/data_hygiene/publishing_api_republisher.rb) class in lib/data\_hygiene that takes an edition scope (eg DetailedGuide.published) and republishes them - we would need to call that from the rake task. 35 | 36 | In Whitehall, run the [rummager::reset::detailed](https://github.com/alphagov/whitehall/blob/master/lib/tasks/rummager.rake#L44) rake task to reindex the detailed guides in search. 37 | 38 | In collections-publisher, run a data migration to update all api-urls for when detailed guidance is curated into topics. 39 | 40 | ### Redirecting old paths of existing detailed guides 41 | 42 | Several options: 43 | 44 | - extract the URLs to a CSV and add them into router-data 45 | - in Whitehall, add a "was\_previously\_under\_root" boolean attribute to the DetailedGuide model; for each detailed guide which has this attribute as true, push a redirect to the Publishing API 46 | - add redirects as a model so that a detailed guide has a redirects association, and the redirects are pushed to the Publishing API when a detailed guide is published. 47 | 48 | ## To do 49 | 50 | Arrange the above steps in such a way that nothing breaks. 51 | -------------------------------------------------------------------------------- /rfc-022-putting-elasticsearch-backups-in-s3.md: -------------------------------------------------------------------------------- 1 | --- 2 | status: accepted 3 | implementation: done 4 | status_last_reviewed: 2024-03-06 5 | --- 6 | 7 | # Putting Elasticsearch backups into S3 8 | 9 | ## Problem 10 | 11 | Currently, we don't have good (any?) backups for our elasticsearch indexes. This is particularly critical for the index which powers the main site search, since it takes at least 2 hours to rebuild the indexes for the "/government" content from scratch, and requires lots of separate apps to be prodded to rebuild the indexes for the mainstream index.
12 | 13 | Our current mechanism for syncing the search indexes to other environments, and to developer machines, is: 14 | 15 | - slow (can take 90 minutes to perform the sync, because it rebuilds the indexes from JSON, and we now perform lots of complicated analysis in elasticsearch) 16 | - frequently fails (lots of dependencies and moving parts) 17 | - hard to maintain 18 | - requires ssh access to preview machines to get developer builds 19 | - developers can't get an up-to-date copy of the search index - latency of at least a day (more when nightly jobs have failed) 20 | 21 | ## Proposal 22 | 23 | Since elasticsearch 1.0, elasticsearch has supported ["Snapshot and Restore"](https://www.elastic.co/guide/en/elasticsearch/reference/1.4/modules-snapshots.html), which allows a snapshot of a search index to be copied to either a shared filesystem, or to Amazon S3. I propose that we use S3, for its ease of setup, and because it will allow us to share the resulting backups easily. 24 | 25 | In detail: 28 | 29 | - snapshots are configured per-index. Each index should have a snapshot set up to copy it from production to S3 (see the sketch at the end of this section). 30 | - the snapshot should be updated frequently (eg, every 5 minutes) 31 | - the snapshots should be made readable to everyone (ie, no access control needed to access the snapshots). This is much more convenient than having to give all developers access to a special set of S3 keys. All the indexes which currently exist contain only publicly available content, so there is no confidentiality issue with this right now. If draft ("preview") content is put into the search system, it should be put into a separate elasticsearch index, which will **not** be made publicly readable - a separate S3 bucket would be used for this, and this might require accreditation approval since it would not be public data. 32 | - the sync process from production to staging and preview environments will be replaced with a restore process. This can be made "atomic" (I'm not sure if the restore process is atomic by default, but if not we can make it restore to a new index, and use aliases to switch it in place once the restore completes, much as we currently do). 33 | - we should also ensure that the scripts for restoring to staging and preview work for restoring to production, for disaster recovery. 34 | - sync to developer machines should also be replaced with using the restore process 35 | 36 | As well as fixing existing problems with syncing, this proposal has the following benefits: 37 | 38 | - We would have working backups of the search index, and since we'd be using them for data sync, we'd notice quickly if the backups stopped being viable. 39 | - Being open about this should enable other uses to be made of the data (for example, I know academics who would love to have a local copy of the GOV.UK search index so that they can perform experiments on it). 40 | - Future upgrades of elasticsearch should become much easier and safer, since we could easily revert an upgrade (and go back to an earlier index snapshot), and could also perform an upgrade by building a new cluster and switching over easily, if desired.
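To make the proposed workflow concrete, here is a rough sketch of the Elasticsearch API calls involved (repository and bucket names are illustrative, and the `s3` repository type needs the AWS plugin installed):

```sh
# Register an S3 repository to hold snapshots (bucket name illustrative)
curl -XPUT 'http://localhost:9200/_snapshot/govuk_backup' -d '{
  "type": "s3",
  "settings": { "bucket": "govuk-search-snapshots", "region": "eu-west-1" }
}'

# Take a snapshot of a single index (run frequently, eg every 5 minutes)
curl -XPUT 'http://localhost:9200/_snapshot/govuk_backup/snapshot_1' -d '{
  "indices": "government"
}'

# In another environment, restore the snapshot, then switch aliases as described above
curl -XPOST 'http://localhost:9200/_snapshot/govuk_backup/snapshot_1/_restore'
```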
41 | -------------------------------------------------------------------------------- /rfc-042-testing-backend-applications-against-production-traffic.md: -------------------------------------------------------------------------------- 5 | --- 6 | status: "OPEN" 7 | notes: "Open for review" 8 | --- 9 | 10 | ## Problem 11 | 12 | We replay frontend traffic from production to staging to expose problems with new code during deploy. We do not replay backend traffic of any kind, and we only replay frontend traffic for GETs. 13 | 14 | We would like a way to expose new code to backend traffic from production. 15 | 16 | Traffic which is not currently replayed falls into 2 groups: 17 | 18 | 1. Backend GET requests 19 | 2. Frontend and backend non-GET requests 20 | 21 | Category 1 is easy to fix as the GET requests do not change state or cause any side-effects. Category 2 is more difficult. 22 | 23 | Staging's data is not kept consistent with production because: 24 | 25 | 1. Data is sync'd nightly from production to staging 26 | 2. Developers often make small modifications in staging when testing a deploy. 27 | 28 | Replaying requests if the data doesn't match could cause errors or other behaviour which would not happen in production. 29 | 30 | Floods of errors could also make life tougher during deploys, masking more important problems with staging. 31 | 32 | We could just turn on the traffic replay and see what happens, but we'd rather have something more structured. 33 | 34 | ## Proposal 35 | 36 | GitHub has a gem called [Scientist](https://github.com/github/scientist) for A/B testing critical code paths. This is in contrast to other libraries which aim to increase conversion. 37 | 38 | The gem checks the result of the old code against the new, finding inconsistencies and performance problems. 39 | 40 | The catch is that the code has to be free of side effects like writing to the database. Running code with side effects twice would lead to unexpected behaviour and bad data. 41 | 42 | If we wanted to run two paths of code that has side effects, we would need separate environments for each path. Our small- and micro-services can help us here, as we can run separate instances of applications which are changing and use the GDS API adapters as our point of test. 43 | 44 | Another problem is that we don't want performance regressions to slow down our A-branch. Currently Scientist runs both paths in the same thread and checks both results at the end. We'd want to offload the B-branch to a worker and have the A-branch upload its results to a shared database (like Redis) so the worker can do the science outside the main request/response cycle. 45 | 46 | This also provides time to do the more sophisticated checks that might be required to make sure a change hasn't caused problems. Have the right number of emails been sent (via a sandbox GovDelivery account)? What does the database look like? etc. 47 | 48 | #### Required work 49 | 50 | - We would need to make spinning up applications in standalone environments easier and more automated. We discussed changing Puppet to easily spin up a "B-branch" version of an app and its dependencies, but that seems to be as much or more work than being able to spin up in AWS or via Docker, so we'll likely take the latter approach. 51 | - We would need to write some helper code to wrap Scientist in a way that enables the testing of side-effect code.
This would likely take the form of a standard worker, which can be expanded upon, and some entry-point code for the API adapters. 52 | 53 | The strong benefit of this whole approach is to safely test alternate branches from the production environment. Since we'll be able to target our testing more carefully for both side-effect and functional code, it might also remove the need for a fully-duplicated staging environment. 54 | -------------------------------------------------------------------------------- /rfc-036-stop-preserving-order-for-links.md: -------------------------------------------------------------------------------- 1 | ## Problem 2 | 3 | A links hash in a content item looks like this: 4 | 5 | ```
"links": {
  "lead_organisation": ['ORG-CONTENT-ID'],
  "organisations": ['ORG-CONTENT-ID', 'ANOTHER-ORG-CONTENT-ID'],
  "topics": ['TOPIC-CONTENT-ID'],
  "available_translations": [... automatically generated ...]
}
``` 11 | 12 | The array of links is currently guaranteed to preserve its order when sent to the publishing-api. From the content-store documentation: 13 | 14 | > The list\_of\_links is an array of content items, order is preserved. 15 | > 16 | > [https://github.com/alphagov/content-store/blob/4b4a82a279de11a2af27b05dfc61d5f18a250c75/doc/content\_item\_fields.md#links](https://github.com/alphagov/content-store/blob/4b4a82a279de11a2af27b05dfc61d5f18a250c75/doc/content_item_fields.md#links) 17 | 18 | This order preservation complicates the following things: 19 | 20 | 1. **Complicates tagging tools**. We are building a generic tagging tool that defines the relationships between content items. It shouldn't be concerned with how these relationships are presented on the site. 21 | 2. **Complicates implementation.** The publishing-api currently saves the links array and forwards it without manipulating it. To build a more flexible system, the links are being extracted into its own table ([https://trello.com/c/zppxFP6p](https://trello.com/c/zppxFP6p)). We'll lose the "free" preservation with that change and will have to add code specifically to preserve the ordering. 22 | 3. In most cases, the ordering of the links should be a **presentation concern** anyway. For example, the [collections-publisher app sorts the related topics by title](https://github.com/alphagov/collections-publisher/blob/37830fd561b9cd8c212a9c63b126ed93bb655dc1/app/presenters/mainstream_browse_page_presenter.rb#L15) before sending the links to the publishing-api, which effectively reserves the `related_topics` for this use. Contrived example: if we were to use the related\_topics on a prototype sorted chronologically, it would need to "override" the ordering specified in collections-publisher. It would be confusing that sometimes the ordering is defined on the frontend, and sometimes by the publisher. 23 | 4. It's **easily abused to add meaning**. We use the first item in the `sections` tag in govuk\_content\_models for the breadcrumb ([code](https://github.com/alphagov/govuk_content_models/blob/master/app/traits/taggable.rb#L29-L48)). This means we can't easily query "pages that have x as their breadcrumb". 24 | 5. It may make **bulk tagging** more difficult. (we don't have a specific plan for that, but I can imagine a case where a bulk-action of "remove mainstream browse page tag x from these content items" would change the breadcrumb for some items, but not others) 25 | 26 | 27 | ## Proposal 28 | 29 | - We stop guaranteeing the order of the links.
30 | - During the tagging migration, we get rid of the usage of the first item as breadcrumb. 31 | 32 | ## Impact 33 | 34 | - Audit pages using links to make sure nothing is relying on their order. 35 | 36 | ## Impact on breadcrumbs 37 | 38 | Our proposal is to add a `parent` tag-type and populate it [during the tagging migration](https://github.com/alphagov/panopticon/blob/8d0c3bf8fe013ad06a61a6adb4f773ee6b3e60f5/lib/tagging_migrator.rb#L31) with the primary mainstream browse page. We also make sure this tag is merged back into the section tags ([in the TagUpdater](https://github.com/alphagov/panopticon/blob/893857e2eb7c1f21e7382f761dde806fdd2cd8b0/app/queue_consumers/tagging_updater.rb#L46)) to keep current breadcrumbs intact. 39 | 40 | This `parent` type would then be usable by all apps to populate a breadcrumb. Content-store would recursively resolve all the parents to return this in the item: 41 | -------------------------------------------------------------------------------- /rfc-044-unpublishing-content-items.md: -------------------------------------------------------------------------------- 5 | --- 6 | status: "IN DEV" 7 | notes: "Development has started (2nd May, 2016)" 8 | --- 9 | 10 | ## Problem 11 | 12 | The workflow around unpublishing and withdrawing content items in the publishing-api is unclear and does not cover all the use cases of the different publishing apps. We need to simplify the processes, while making sure that all current actions are supported. 13 | 14 | There are various different types of withdrawal action. They can be grouped as follows: 15 | 16 | 1. **Withdrawing with a message**: This is used for content that was previously valid but is no longer current. The content continues to display, with the addition of a large message block at the top stating that the content is withdrawn, and it is no longer findable in search. (Whitehall) 17 | 2. **Unpublishing with a message**: The content is removed from the site, and an explanatory message is displayed at that URL instead. The message can optionally contain an alternative URL which is displayed as a link, but does not redirect automatically. (Whitehall) 18 | 3. **Unpublishing with a redirect**: The page is replaced by an automatic redirect to a different URL. (Whitehall, Publisher) 19 | 4. **Unpublishing with a Gone**: The page is replaced by a 410 Gone page. (Publisher, Specialist Publisher) 20 | 5. In addition, Whitehall sends a Gone for items that were scheduled to be published but then unscheduled, as long as there is no previously published version, and when non-edition items such as organisations or persons are deleted. 21 | 22 | ## Proposal 25 | 26 | Currently it is up to the publishing apps to send the relevant formats to publishing-api in each of the cases above. This involves sending a new draft item - of the original format updated to include the message for case 1, an Unpublishing item for case 2, a Redirect for case 3 and a Gone for cases 4 and 5 - and then a second call to publish. 27 | 28 | We recommend the following steps: 29 | 30 | 1. **Clarify our language**. We adopted "withdrawing" as a verb in the publishing API when really that should be constrained to occasions when the content remains on the site but is removed from the canon of the current government. Everything else is "unpublishing". 31 | 2.
**Flag documents as withdrawn** from the publishing app rather than supporting "withdrawn" as a first-class state, so the front-ends can render the appropriate banner. This should be handled the same way when we do "historic" documents. 32 | 3. **Refuse to clobber existing documents**. We currently allow gones/redirects to take the place (base\_path) of documents, and vice versa. This provides an in-app way to implement unpublishing (and republishing). We should remove the first case of this functionality and require the existing content to be unpublished via the new endpoint `/v2/content/:id/unpublish` or moved by altering its base\_path and publishing (which automatically creates a redirect). Allowing gones/redirects to be clobbered still makes sense as it is a reclamation of an unused path. 33 | 4. **Create `/v2/content/:id/unpublish`**, accepting a POST specifying the type of unpublishing, which returns the content item to draft and creates & publishes the resulting format (gone/unpublishing/redirect). Question: should we autodiscard an existing draft if there is one, or refuse to take action? 34 | 5. **Populate gone items** with enough information to render the page seen after unpublishing, which in the case of Whitehall is the reason for unpublishing and/or alternate URL. This page presentation can then be tidied up and standardised across publishers / frontends. 35 | 6. **Indicate unpublished state** back to the user. We need to be able to indicate that a piece of content was unpublished. More thought is warranted on this. 36 | -------------------------------------------------------------------------------- /rfc-106-docker-for-local-development.md: -------------------------------------------------------------------------------- 1 | --- 2 | status: accepted 3 | implementation: done 4 | status_last_reviewed: 2024-03-04 5 | --- 6 | 7 | # RFC 106: Use Docker for local development 8 | 9 | ## Summary 10 | 11 | Adopt a Docker-based approach for local development instead of the Vagrant VM. 12 | 13 | ## Problem 14 | 15 | We currently rely on the Vagrant development VM for the day to day dev cycle. 16 | 17 | - It has [many documented issues](https://docs.publishing.service.gov.uk/manual.html#development-vm) 18 | - Changes to puppet often result in a broken VM, because the infrastructure team don't use it. 19 | - We are adopting more cloud-based services into our stack, like S3 and Amazon's Elasticsearch. This means that our development environment doesn't look like production anymore, but we still need code in govuk-puppet. 20 | - It's hard to keep updated. Running the puppet command will often error and take a long time. 21 | - GOV.UK is moving to a containerised infrastructure (likely the [GDS Supported Platform](https://github.com/alphagov/gsp) (GSP)) 22 | 23 | ## Possible approaches to local development 24 | 25 | - An approach based on **Docker and docker-compose**. Ben Thorner has a working prototype called [govuk-docker](https://github.com/benthorner/govuk-docker) and we have the [end-to-end tests](https://github.com/alphagov/publishing-e2e-tests) running in the same way. Other GDS programmes like GOV.UK Pay use this approach. 26 | - An approach based on the **GNU Guix** package manager, which Chris Baines uses via govuk-guix. 27 | - Because they are read-only, frontend applications on GOV.UK can be **run against production** using the `startup.sh --live` flag.
28 | - Developers on the GOV.UK PaaS work by developing against a **remote set of services** - we’d run many versions of GOV.UK in the cloud, which developers can use in development. 29 | - Applications can be run locally by **installing dependencies manually**. This is what some of GOV.UK’s Linux users do. 30 | - The RE team has been working on local development tooling called **[gsp-local](https://github.com/alphagov/gsp/blob/master/docs/gds-supported-platform/getting-started-gsp-local.md)**. 31 | 32 | Approach | Pros | Cons 33 | -- | -- | -- 34 | Vagrant VM | Can look like production | Resource intensive, brittle 35 | Docker | Aligns with future hosting. Reproducible (with caveats). | Slow because on a Mac it needs a VM. Development won’t match production/CI exactly until we migrate onto a container-based platform. 36 | Guix | Is almost feature complete (runs Signon, TLS) | Technology unknown in GOV.UK 37 | Against production | Low setup, fast | Only works for frontend applications. Still needs local install. Limits you to testing one application at a time. 38 | Remote dependencies | Low setup. Can match Production exactly in terms of technologies used | Expensive, no offline development 39 | Manual install | Fast in development | Lots of setup, brittle 40 | gsp-local | GOV.UK will move to the GSP at some point | Multiple new layers of technology, quite new 41 | 42 | ## Proposal 43 | 44 | We'll adopt a Docker-based local development environment. Docker is widely supported in the community and is most aligned with our tech strategy. In the long term we might replace it with a local environment of the GDS Supported Platform. 45 | 46 | ## Impact and follow up work 47 | 48 | We'll commit to doing these things: 49 | 50 | - Document how to test puppet locally using Vagrant now that the development VM is going away. 51 | - Come up with a process of ownership so that the tooling improves over time. 52 | 53 | Things we'll need to look out for: 54 | 55 | - Make sure we can limit the number of dependencies we run, to avoid the environment being too resource intensive. 56 | - Our local development will diverge more from our production environment. This might lead to more testing on integration. We should be sure this is viable and doesn't become a blocker if lots of people are testing things at the same time. 57 | -------------------------------------------------------------------------------- /rfc-097-verify-specific-start-pages.md: -------------------------------------------------------------------------------- 1 | --- 2 | status: accepted 3 | implementation: done 4 | status_last_reviewed: 2024-03-04 5 | --- 6 | 7 | # Providing alternative GOV.UK service start pages for the GOV.UK Verify single IDP journey 8 | 9 | ## Summary 10 | 11 | This document describes an approach for providing alternative service start pages 12 | that will be displayed to users on a GOV.UK Verify single IDP journey. 13 | 14 | ## Terms 15 | 16 | * IDP - Identity Provider (also known as a Certified Company) 17 | * RP - Relying Party (also known as a transaction or service, e.g. view your driving licence or check your state pension) 18 | 19 | ## Problem 20 | 21 | We have recently introduced a new type of journey to Verify. This journey type allows Identity Providers (IDPs) to target their existing customers in order to drive them into using their identity service with GOV.UK services.
Unlike traditional Verify journeys, the user will actually start on a page at, or an e-mail from, a specific IDP where they will choose which service to use. During this journey Verify restricts their ability to choose an alternative IDP. This is known as the Single IDP journey and is described in Verify [RFC-041](https://github.com/alphagov/verify-architecture/blob/master/rfcs/rfc-041-single-idp-journey.md). 22 | 23 | When following a single IDP journey, the user needs to be redirected via the RP (service) in order to pick up a valid SAML request. Initial thoughts were to direct the user to the RP's headless start page, which generates the SAML request and results in an immediate redirect back to the Verify hub. 24 | This approach, however, is not desirable as there is often important information given to the user on the service's normal start page. Similarly, showing the normal start page is not desirable as it may give service sign-in options that would direct the user away from Verify and the user's chosen IDP (for example, if the user chooses to log in with Government Gateway). 28 | 29 | The approach described in this document is to implement alternate service start pages that will display all important information to the user but only give them the option of using Verify and, hence, their chosen IDP. 32 | 33 | ### Considerations 34 | 35 | #### Discoverability 36 | 37 | Any customised content is designed to be solely reachable, indirectly via the hub, using hyperlinks IDPs publish to their users. 38 | 39 | Therefore the new content MUST NOT be indexed or reachable by performing, for example, a Google search or GOV.UK search. 40 | 41 | #### Content Schema 42 | 43 | The GOV.UK content schema used for service start pages is the `transaction` schema; the changes proposed will require changes to this schema which will, potentially, require changes to be made to existing data that conforms to this schema. 44 | 45 | ## Proposal 46 | 47 | The solution would be for GOV.UK to support 'variants' of content in the transaction content schema. This should allow a variant of the page to be requested in the URL, as shown: `http://www.gov.uk/<base-path>/<variant>`. For example, `http://www.gov.uk/my-rp-transaction-page/verify`. 48 | 49 | This would require multiple changes to GOV.UK including schemas, publisher and government frontend. 50 | 51 | Publisher already supports the concept of a 'parted' content item, so the publisher app should be modified to add the 'parted' functionality to the transaction editor. Corresponding changes would also need to be made to the `transaction` content schema to ensure that the publishing API accepts multiple parts. Government frontend would also have to be modified to render the pages correctly. 52 | 53 | #### Searching 54 | 55 | Even though the new alternate parts will not be linked from anywhere, they must not be discoverable by crawlers; note that GOV.UK does publish a `sitemap.xml`. We need to ensure the variant pages include the `ROBOTS` meta tag with a `NOINDEX` value. 56 | 57 | This will be achieved by allowing a variant/part to be flagged with an "Exclude from Index" option. If present on a variant, when Government frontend renders the page it should include the NOINDEX value in the ROBOTS meta tag.
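For illustration, when the "Exclude from Index" flag is set on a variant, the rendered page would include something like this in its `<head>`:

```html
<meta name="ROBOTS" content="NOINDEX">
```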
58 | 59 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # GOV.UK Request For Comments 2 | 3 | GOV.UK staff use this repository as a forum to discuss and make technical decisions. The outcomes of these discussions can be either an action plan, or a new standard that GOV.UK should follow. This repository is open as a reference for other teams within GDS and wider government. 4 | 5 | ## Process 6 | 7 | 1. Create a new branch on this repo and copy `rfc-000-template.md` to `rfc-000-my-proposal.md` and edit. 8 | 2. Include any images etc in a separate directory named `rfc-000` and link to them. 9 | 3. Make a Pull Request (PR) for your branch and open as a draft. 10 | 4. Rename your file and directory with the number of the PR and push changes. 11 | 5. When your RFC is ready to be commented on, mark your PR as "Ready for review", set a deadline for comments (2 weeks is a common choice) and record that in the PR description. 12 | 6. Post a link to your PR in #govuk-developers on Slack and to the [govuk-tech-members][govuk-tech-members] Google Group. 13 | 7. GOV.UK members discuss your proposal using both inline comments against your RFC document and the general PR comments section. Non-technical staff will need to create a free GitHub account in order to comment. 14 | 8. As changes are requested and agreed in comments, make the changes in your RFC and push them as new commits. 15 | 9. Stay active in the discussion and encourage other relevant people to participate. If you’re unsure who should be involved in a discussion, ask your Tech Lead or a Lead Developer. If you start an RFC it’s up to you to push it through the process and engage people. 16 | 10. Once the deadline is reached you are able to merge if the PR has been approved by a member of GOV.UK Senior Tech (if they haven't, or you're not sure, you can contact them on Slack with @govuk-senior-tech-people) - merging is the act of accepting an RFC. 17 | 18 | ### If consensus isn't reached 19 | 20 | Should consensus not be reached by the deadline, an RFC can either have its deadline extended or be closed as something not appropriate to accept now. For a deadline extension, you should make an effort to inform #govuk-developers about this status and request extra input. If the RFC is being closed, leave a comment explaining the status and close the PR. 21 | 22 | ### If the proposal is no longer needed 23 | 24 | If the proposal is no longer applicable the RFC PR can be closed. Please include a comment explaining why, to help anyone considering the problem in future. 25 | 26 | ## Editing past RFCs 27 | 28 | RFCs should not be substantially altered after they are accepted, as they are intended to be kept as a point-in-time record of a decision. There are, however, a few reasons why you may change one that has been accepted: 29 | 30 | - to fix typos and other minor mistakes 31 | - to record a status change of the RFC in the YAML frontmatter (remember to update the status_last_reviewed date) 32 | - to mark an RFC as being superseded with a link to the RFC that supersedes it 33 | - to add any relevant post-implementation, or post-abandonment, supplementary details that would be useful for someone interested in the area. 34 | 35 | ## Historical RFCs 36 | 37 | Some RFCs in this repository were migrated from Confluence. They’ve been automatically converted to Markdown, so some formatting might be incorrect.
Please fix any issues as you find them in new PRs. 38 | 39 | ## RFC metadata as YAML frontmatter 40 | 41 | Some RFCs have YAML frontmatter which allows us to track their status / implementation etc. 42 | 43 | <details>
44 | <summary>Script to list all RFC metadata</summary> 45 | 46 | ```ruby 47 | #!/usr/bin/env ruby 48 | 49 | require "csv" 50 | require "yaml" 51 | 52 | frontmatter_columns = %w[status implementation status_last_reviewed status_notes] # frontmatter keys to report 53 | CSV do |csv| # writes CSV to stdout 54 | csv << ["filename", *frontmatter_columns] 55 | Dir.glob("rfc-*.md") do |filename| 56 | first_line = File.readlines(filename).first 57 | frontmatter = {} 58 | frontmatter = YAML.load_file(filename, permitted_classes: [Date]) if first_line =~ /^---$/ # only parse files that open a YAML frontmatter block 59 | csv << [filename, *frontmatter.values_at(*frontmatter_columns)] 60 | end 61 | end 62 | ``` 63 | 64 | </details>
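Run from the root of the repository, the script writes CSV to stdout, so for example `ruby rfc_metadata.rb > rfc-status.csv` gives a spreadsheet-friendly summary of every RFC's status (the `rfc_metadata.rb` filename is illustrative - save the script wherever is convenient).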
65 | 66 | [govuk-tech-members]: https://groups.google.com/a/digital.cabinet-office.gov.uk/forum/#!forum/govuk-tech-members 67 | -------------------------------------------------------------------------------- /rfc-055-content-history.md: -------------------------------------------------------------------------------- 1 | ## Problem 2 | 3 | Publishing apps store change history for content items in a variety of different ways. As we move to a more centralised workflow, we need a consistent way of recording this data in the publishing-api and providing it for display to editors in publishing apps and to users in the frontends. 4 | 5 | ## Current behaviour 6 | 7 | Content history is usefully split into two mostly separate types. Public change history is exposed to users on the front end, and usually consists of a list of major update dates and manually entered change notes. Version history is shown to users in the publishing apps, and consists of dates when the content was changed or versioned; it might also include notes to other editors or fact checkers. 8 | 9 | The existing apps support a mixture of these history types: 10 | 11 | **Whitehall** 12 | 13 | - Public change note: note, public timestamp - one per edition 14 | - Fact check requests: instructions (i.e. note from editor), comments (i.e. response from fact-checker) 15 | - Editorial remarks: body, author 16 | - Versions: event (create, update), state (draft, published, submitted), user 17 | 18 | **Travel Advice Publisher** 19 | 20 | - Version history and public change notes only 21 | 22 | **Publisher** 23 | 24 | - Public change note is available as a field to edit but it is not displayed anywhere 25 | - Important note, edition note and action are all the same model: recipient, requester, comment, request type (new\_version, assign, request\_review, important\_note...) 26 | - Fact check responses are stored as actions but requests are not currently stored 27 | 28 | **Specialist Publisher** 29 | 30 | - Public change history only: date, change note (currently being sent embedded in details, built up manually) 31 | 32 | **Service Manual Publisher** 33 | 34 | - Public change history: date, change summary, reason for change 35 | - State change events: new draft, assign author, add comment, request review, approve, publish 36 | 37 | **Content tagger** 38 | 39 | - No history 40 | 41 | ## Proposal 42 | 43 | ### Public change history 44 | 45 | 46 | The `change_history` element will be deprecated in the publisher schemas; apps will stop building up this history themselves, and instead just send the current public change note. 47 | 48 | Publishing API will record, in a separate table, the set of **change notes** associated with a specific content ID. The `publish` command will, for major versions only, add an entry in this table with the contents of the `change_note` in the ContentItem to be published and the publish time. To support Service Manual Publisher, `reason_for_change` will be accepted in the schemas directly at the same level as `change_note` and recorded alongside it in the change notes table. 49 | 50 | The downstream presenters will assemble the public content history from the list of change notes. 51 | 52 | ### Version history 53 | 54 | Publishing API should support the concept of **actions**.
An action will link to a ContentItem or LinkSet and record all the activity that happens to that version - create, update, publish, unpublish etc - along with the ID of the user who performed that action, the email address of any recipient, and the text of any note/remark. Actions could also link to the Event that caused the change. 55 | 56 | The existing commands will each create an action when they are called. In addition, we could store the diff between the current and previous versions on each change; a [spike into this](https://github.com/alphagov/publishing-api/compare/content_history) already exists. 57 | 58 | The list of action types will be a superset of all those supported by the publishing apps, and no extra validation will be carried out to ensure that the action makes sense given the current state; at this point we are only recording history, we are not providing workflow or a state machine. 59 | 60 | Whitehall (and Publisher once we start migrating it) will also need to start sending data specifically for those actions that do not result from existing commands - eg add note/remark, send for fact check, etc. This will probably need to be on a new `action` endpoint; we might later decide to split these out into separate endpoints when we start implementing the workflow itself, but it will be helpful to start storing the data now. 61 | 62 | -------------------------------------------------------------------------------- /rfc-059-workflow-for-making-changes-to-the-schemas.md: -------------------------------------------------------------------------------- 1 | ## Problem 2 | 3 | Recently the publishing-api has begun to validate incoming payloads against the 4 | [govuk-content-schemas][]. 5 | 6 | How it works now: 7 | 8 | - The schemas are manually "deployed" using a [task in Jenkins][jenkins-task] 9 | that [copies][deploy-script] over the schemas to the publishing-api 10 | - Pull requests on publishing apps are tested against the master branch of 11 | govuk-content-schemas ([example][example-1]) 12 | - Pull requests on govuk-content-schemas are tested against the master branch 13 | of the downstream applications ([example][example-2]) 14 | 15 | [govuk-content-schemas]: https://github.com/alphagov/govuk-content-schemas 16 | [jenkins-task]: https://deploy.integration.publishing.service.gov.uk/job/Deploy_GOVUK_Content_Schemas/ 17 | [deploy-script]: https://github.com/alphagov/govuk-content-schemas/blob/master/deploy.sh 18 | [example-1]: https://github.com/alphagov/calendars/blob/51a9583b4de80aeca53c9f3762f6412c24a3c951/jenkins.sh#L45 19 | [example-2]: https://ci.dev.publishing.service.gov.uk/job/govuk_business_support_finder_schema_tests/configure 20 | 21 | This opens up two issues that could cause the publishing-api to reject valid 22 | content, causing errors or delays for editors. 23 | 24 | ### 1. Undeployed changes in schemas 25 | 26 | Example: 27 | 28 | - You add a field to govuk-content-schemas, but don't deploy 29 | - You add the field to the content item payload in the publisher application. 30 | The PR will be green because the new payload is valid against the content schema 31 | on master 32 | - PR gets merged and deployed 33 | - Now the app will fail on production because publishing-api doesn't know about 34 | your new attribute yet 35 | 36 | ### 2. Making schema changes with undeployed apps 37 | 38 | Example: 39 | 40 | - You want to remove an attribute from govuk-content-schemas. You remove it 41 | from the payload in the publisher application.
42 | - The PR is merged, but the application is not yet deployed to production. 43 | - Raise a PR on content-schemas to remove the payload. Downstream apps pass 44 | because the publisher app master isn't sending the attribute anymore. 45 | - When you deploy govuk-content-schemas the publisher app will be sending 46 | invalid content. 47 | 48 | ## Proposal 49 | 50 | This RFC proposes: 51 | 52 | - Application tests are to be run against the **deployed** version of 53 | govuk-content-schemas 54 | - Pull requests on govuk-content-schemas are to be tested against the 55 | **deployed** version of the downstream applications 56 | 57 | The scenarios above now can't happen: 58 | 59 | ### 1. Undeployed changes in schemas 60 | 61 | Example: 62 | 63 | - You add a field to govuk-content-schemas, but don't deploy 64 | - You add the field to the content item payload in the publisher application. 65 | The PR will **not pass** because the new payload is invalid against the 66 | released-to-production branch of content-schemas 67 | 68 | ### 2. Making schema changes with undeployed apps 69 | 70 | Example: 71 | 72 | - You want to remove an attribute from govuk-content-schemas. You remove it 73 | from the payload in the publisher application. 74 | - The PR is merged, but the application is not yet deployed to production. 75 | - Raise a PR on content-schemas to remove the payload. Downstream apps **do not 76 | pass** because the publisher app still has the attribute in 77 | released-to-production 78 | 79 | Because we don't deploy automatically to production, there's a situation that 80 | will cause a PR to get merged that will fail on production: 81 | 82 | 1. Add an attribute to govuk-content-schemas. Merge the PR & deploy to 83 | production. 84 | 2. Raise a PR to send this attribute from the publisher application. Your tests 85 | pass (because it's testing against released-to-production schemas) 86 | 3. Merge the PR on the publisher application (but do not deploy) 87 | 4. Now raise a PR to remove the attribute from content schemas. Because the change 88 | hasn't been deployed, the tests **will pass**. Merge it. 89 | 5. If you deploy the PR on the publisher application now, the publisher app 90 | will start sending invalid data to the publishing-api and fail 91 | 92 | To illustrate how schema changes take place, [we've made some diagrams 93 | describing the process][diagrams]. 94 | 95 | [diagrams]: https://gov-uk.atlassian.net/wiki/display/GOVUK/Illustration+of+schema+development+workflow 96 | -------------------------------------------------------------------------------- /rfc-016-how-to-prevent-published-live-frontends-from-reading-from-the-draft-content-store.md: -------------------------------------------------------------------------------- 1 | --- 2 | status: superseded 3 | implementation: superseded 4 | status_last_reviewed: 2024-03-06 5 | status_notes: We no longer use this infrastructure. 6 | --- 7 | 8 | # How to prevent published frontends from reading from the draft content store 9 | 10 | ## Problem 11 | 12 | The old draft stack used firewall rules to ensure that instances of frontend applications intended to serve published content were prevented from inadvertently reading from the instance of the content-store application used to serve draft (unpublished) content.
13 | 14 | This is an operational risk for GOV.UK, since unpublished content may be [embargoed](https://en.wikipedia.org/wiki/News_embargo) for publication or otherwise contain sensitive information that should not become public until published. 15 | 16 | Now that applications serving draft content are hosted in the same vCloud organisation as their counterpart instances used for published content (see ), we must identify any vectors through which the published frontends could read from the draft content store. 17 | 18 | ## Proposal 19 | 20 | ### Risk vectors 21 | 22 | We have identified two ways that a published frontend application might inadvertently read from the draft instance of the content-store application: 23 | 24 | 1. The instance of the content-store used for published content could inadvertently read from the draft instance of the content-store database if its database settings were misconfigured. 25 | 2. Published instances of frontend applications could inadvertently read from the draft instance of the content-store application if their configuration was incorrect. 26 | 27 | ### Mitigation for vector (1) 28 | 29 | Vector (1) has been mitigated by [avoiding the use of a default](https://github.gds/gds/puppet/commit/3fa80cdceb7138dc2f1a7e4aba90976274a3ce65#diff-317c81f58cd20afec981e1e6c339703f) for the name of the content-store database when configuring the content-store application, meaning that each machine class (i.e. [content\_store](https://github.gds/gds/puppet/blob/effc4c0cab1/hieradata/class/content_store.yaml#L3) or [draft\_content\_store](https://github.gds/gds/puppet/blob/effc4c0cab1/hieradata/class/draft_content_store.yaml#L3)) must explicitly set the database name. 30 | 31 | ### Mitigation for vector (2) 32 | 33 | The draft instance of the content-store application is fronted by Nginx on the existing api\_lb machines, in much the same way that the published instance of content-store is fronted by Nginx. The decision to share the api\_lb for requests for both draft and published content was made to avoid the additional expense of creating two new draft\_api\_lb machines for serving draft content. 34 | 35 | We initially explored access limiting by IP address within the Nginx virtual host for requests going to `draft-content-store.production.alphagov.co.uk` on the api\_lb machines so as to block requests coming from published instances of frontend applications; however, this was not possible because the client IP address in the HTTP request is presented as originating from the API vDC network's gateway IP address. This makes it impossible, in this configuration, to differentiate between requests from published and draft instances of frontend applications. 36 | 37 | The only other possibility for implementing access control by IP address would be in the vShield Edge Gateway's firewall rules. Applying a firewall rule for the [existing API vSE load balancer](https://github.gds/gds/govuk-provisioning/blob/c33df9b/vcloud-edge_gateway/rules/lb.yaml.mustache#L169-L175) would not work as both published and draft applications would make requests through the same path and there would be no way to distinguish between requests intended for the published versus draft instances of content-store. 38 | 39 | To make that distinction possible, we propose to: 40 | 41 | - Instantiate a new vShield Edge Gateway (vSE) load balancer named 'DraftAPI', which will use the existing api\_lb machines as its pool members on port 8443.
42 | - Add a new virtual host to Nginx on the api\_lb machines that listens on port 8443 and have it serve requests destined for `draft-content-store.production.alphagov.co.uk` 43 | - Add a firewall rule to the vShield Edge Gateway that prevents IP addresses other than those used by the draft\_frontend machines from connecting to the new 'DraftAPI' vSE load balancer. 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | -------------------------------------------------------------------------------- /rfc-087-dealing-with-errors.md: -------------------------------------------------------------------------------- 1 | # Dealing with errors 2 | 3 | ## Summary 4 | 5 | This describes how we treat errors on GOV.UK. 6 | 7 | ## Problem 8 | 9 | We've recently migrated to a new error tracking service, Sentry. This provides us with an opportunity to rethink how we treat errors. 10 | 11 | ## Proposal 12 | 13 | There are 3 principles: 14 | 15 | ### 1. When something goes wrong, we should be notified 16 | 17 | Applications should report exceptions to Sentry. Applications must not swallow errors. 18 | 19 | ### 2. Notifications should be actionable 20 | 21 | Sentry notifications should be something that requires a developer of the app to do something about it. Not just a piece of information. 22 | 23 | ### 3. Applications should not error 24 | 25 | The goal of GOV.UK is that applications should not error. When something goes wrong it should be fixed. 26 | 27 | ## Classifying errors 28 | 29 | ### Bug 30 | 31 | A code change makes the application crash. 32 | 33 | Desired behaviour: error is sent to Sentry, developers are notified and fix the error. Developers mark the error in Sentry as `Resolved`. This means a recurrence of the error will alert developers again. 34 | 35 | ### Intermittent errors without user impact 36 | 37 | Frontend applications often see timeouts when talking to the content-store. 38 | 39 | There's no or little user impact because the request will be answered by the caching layer. 40 | 41 | Example: 42 | 43 | Desired behaviour: error is not sent to Sentry. Instead, we rely on Smokey and Icinga checks to make sure the site functions. 44 | 45 | ### Intermittent errors with user impact 46 | 47 | Publishing applications sometimes see timeouts when talking to publishing-api. This results in the publisher seeing an error page and possibly losing data. 48 | 49 | Example: 50 | 51 | Desired behaviour: apps handle these errors better, for example by offloading the work to a Sidekiq worker. Since these errors aren't actionable, they should not be reported to Sentry. They should be tracked in Graphite. 52 | 53 | ### Intermittent retryable errors 54 | 55 | A Sidekiq worker sends something to the publishing-api, which times out. Sidekiq retries, and the next time it works. 56 | 57 | Desired behaviour: errors are not reported to Sentry until retries are exhausted. See [this PR for an example](https://github.com/alphagov/content-performance-manager/pull/353). 58 | 59 | Relevant: https://github.com/getsentry/raven-ruby/pull/784 60 | 61 | ### Expected environment-based errors 62 | 63 | MySQL errors on staging while data sync happens. 64 | 65 | Example: 66 | 67 | Desired behaviour: our environment is set up such that these errors do not occur. 68 | 69 | ### Bad request errors 70 | 71 | User makes a request the application can't handle ([example][bad-request]). 72 | 73 | Often happens in [security checks](https://sentry.io/govuk/app-frontend/issues/400074979).
74 | 75 | Example: 76 | 77 | Desired behaviour: user gets feedback, error is not reported to Sentry 78 | 79 | [bad-request]: https://sentry.io/govuk/app-service-manual-frontend/issues/400074003 80 | 81 | ### Incorrect bubbling up of errors 82 | 83 | Rummager crashes on date parsing, returns `422`, which raises an error in finder-frontend. 84 | 85 | Example: 86 | 87 | Desired behaviour: a 4XX response is returned to the browser, including an error message. Nothing is ever sent to Sentry. 88 | 89 | ### Manually logged errors 90 | 91 | Something goes wrong and we need to let developers know. 92 | 93 | Example: [Slimmer's old behaviour](https://github.com/alphagov/slimmer/pull/203/files#diff-e5615a250f587cf4e2147f6163616a1a) 94 | 95 | Desired behaviour: developers do not use Sentry for logging. The app either raises the actual error (which causes the user to see the error) or logs the error to Kibana. 96 | 97 | ### IP spoof errors 98 | 99 | Rails reports `ActionDispatch::RemoteIp::IpSpoofAttackError`. 100 | 101 | Example: 102 | 103 | Desired behaviour: HTTP 400 is returned, error is not reported to Sentry. 104 | 105 | ### Database entry not found 106 | 107 | Often a controller will do something like `Thing.find(params[:id])` and rely on Rails to show a 404 page for the `ActiveRecord::RecordNotFound` it raises ([context](https://stackoverflow.com/questions/27925282/activerecordrecordnotfound-raises-404-instead-of-500)). 108 | 109 | Desired behaviour: errors are not reported to Sentry 110 | -------------------------------------------------------------------------------- /rfc-027-supporting-slug-changes-in-the-publishing-api.md: -------------------------------------------------------------------------------- 1 | --- 2 | status: accepted 3 | implementation: done 4 | status_last_reviewed: 2024-03-06 5 | --- 6 | 7 | # Supporting slug changes in the Publishing API 8 | 9 | ## Problem 10 | 11 | - slug changes are **costly and increasingly common** for live content 12 | - we also need to handle slug changes for **draft content** 13 | 14 | ### Slug changes in live content 15 | 16 | GOV.UK's existing publishing tools were built upon the assumption that document slugs do not change. This was a helpful simplifying assumption in the early days of GOV.UK and mostly held true. In the rare cases where a published slug did need to change, the cost of manual intervention from a developer to correct the issue was reasonable. 17 | 18 | However, as GOV.UK has matured, the factors in this trade-off have changed: 19 | 20 | - we now have vastly more content on GOV.UK, so although the chance of a particular document needing a slug change remains low, slug change requests are still much more frequent 21 | - our systems have become more complex over time and the effort involved in performing a slug change manually without introducing errors has increased 22 | 23 | If we need evidence of the cost and the frequency of slug changes, we could review the workload of 2nd line support. 24 | 25 | ### Slug changes in draft content 26 | 27 | We have always allowed slugs of draft content to change. This has never been an issue because draft items were contained within a single system, so there was no requirement to maintain consistency between multiple systems representing the same content item. 28 | 29 | With the introduction of the 'content preview' system in Publishing API, we now handle draft items whose slugs can change. 30 | 31 | The primary identifier used for content items in the publishing API is the `base_path`.
Using `base_path` as the primary identifier is based on the assumption that it does not change. 32 | 33 | If the slug of a draft content item changes, our only option at present would be to require the publishing application to notify the publishing API of this change so that it can remove the document at the previous slug (publishing API currently does not support deletion of content items). 34 | 35 | ## Proposal 36 | 37 | ### Proposal 1: we should assume that slug changes will happen and incorporate this into the design of the publishing API 38 | 39 | The simplifying assumption that slug changes do not happen is no longer serving us. 40 | 41 | ### Proposal 2: use content\_id as the primary identifier of content items 42 | 43 | In order to cater for the above change in assumptions, we should use a persistent abstract identifier for content items. We already have such an identifier in the systems, in the form of `content_id`. 44 | 45 | Since it's a GUID it can be generated independently and asynchronously by the publishing applications (no need for a central coordinating authority). 46 | 47 | All publishing API endpoints should accept `content_id` rather than `base_path`, i.e. instead of: 48 | 49 | This implies that `content_id` would be **required** for all content items (not an onerous requirement). 50 | 51 | In order to transition to this approach there are a few options: 52 | 53 | - introduce a set of publishing API endpoints which accept content by guid (e.g. `PUT /content_by_guid/,` `PUT /draft_content_by_guid/` or something similar) 54 | - allow the existing endpoints to detect a slug which looks like a guid and treat it as such (see the sketch at the end of this RFC). Although slightly hacky, the chance that a normal slug would match the pattern for a guid is extremely low. 55 | 56 | #### Benefits of using `content_id` as primary identifier 57 | 58 | This will allow the publishing API to understand when the slug of a content item posted has changed. It will then allow publishing API to either: 59 | 60 | - disallow the change 61 | - gracefully handle the change by propagating the change to any downstream systems (e.g. router, url arbiter etc). It could even put in place a redirect from the old url to the new url. 62 | 63 | Further down the line, if we move to a system where publishing API keeps some kind of 'transaction log' record, this API will allow us to keep a record of the changes of a document's slug over time. Having this data in a single transaction log will mean that we have all the information in one place to verify and enforce consistency in downstream systems. 64 | 65 | ## Status of this RFC 66 | 67 | This is an early draft; there are probably many things I have missed or not thought about. 68 | 69 | - Do you agree with the end goal? 70 | - Do you see any issues with migrating to this? 71 | - Can you see any problems or risks I haven't identified? 72 | - Does anything need fleshing out further? 73 | 74 | Thanks for reading and for your input!
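To illustrate the guid-detection transition option above, here is a minimal Ruby sketch (the pattern and method name are illustrative, not part of the proposal):

```ruby
# Treat an identifier as a content_id if it has the 8-4-4-4-12 hexadecimal
# shape of a GUID; otherwise assume it is a base_path-style slug.
GUID_PATTERN = /\A[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\z/i

def identifier_type(identifier)
  identifier =~ GUID_PATTERN ? :content_id : :base_path
end

identifier_type("9af7cbca-7034-4676-a4bd-740bb260b179") # => :content_id
identifier_type("register-to-vote")                     # => :base_path
```

Because ordinary slugs never take that shape, the risk of misclassifying a real slug as a guid is negligible.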
75 | -------------------------------------------------------------------------------- /rfc-145-unarchive-govuk_admin_template.md: -------------------------------------------------------------------------------- 1 | --- 2 | status: accepted 3 | implementation: done 4 | status_last_reviewed: 2024-03-04 5 | status_notes: "We're getting close to being able to archive this again, with the publishing apps removing their dependency on Bootstrap" 6 | --- 7 | 8 | # Unarchive govuk_admin_template Bootstrap project 9 | ## Summary 10 | 11 | The [govuk-admin-template](https://github.com/alphagov/govuk_admin_template) gem has been deprecated for a while, and new GOV.UK admin applications should be built using the [layout component in govuk_publishing_components](https://govuk-publishing-components.herokuapp.com/component-guide/layout_for_admin). Ideally, we would have migrated all our applications to use the new Design System and publishing components; however, several applications still in active use rely on govuk-admin-template. The GitHub project is archived, and is therefore read-only. 12 | 13 | This RFC proposes un-archiving the project to allow iterative accessibility fixes to be applied to live applications. 14 | 15 | Importantly, this RFC _does not_ propose removing the deprecation notice, or advocate for significant development of govuk_admin_template. It is intended as a stop-gap that acknowledges the slow pace of migration to the Design System, and its intention is to yield accessibility fixes for users prior to those migrations taking place. 16 | 17 | ## Problem 18 | 19 | When [govuk-admin-template](https://github.com/alphagov/govuk_admin_template) was deprecated and made read-only back in July 2018, we anticipated projects would be swiftly migrated to the GOV.UK Design System. There are many benefits to this, and projects should still aim to be migrated. 20 | 21 | Three and a half years on, we still have [18](#projects-using-govuk_admin_template) projects referencing govuk-admin-template. This includes many publishing apps, most prominently [Whitehall](https://github.com/alphagov/whitehall). 22 | 23 | Several accessibility issues have been identified in our publishing applications which stem from govuk-admin-template. Some of these are relatively easy fixes; however, we can't apply them because the repository is read-only. 24 | 25 | ## Proposal 26 | 27 | With the intention of enabling quick wins for some perennial accessibility issues, we propose: 28 | 29 | 1. We will un-archive [govuk-admin-template](https://github.com/alphagov/govuk_admin_template) on GitHub, making it possible to release new versions of the gem with accessibility fixes. 30 | 31 | 2. GOV.UK developers may release new versions of the gem to address **critical** accessibility, usability, and security issues. 32 | 33 | 3. Teams **MUST** endeavour to migrate pages and applications to the GOV.UK Design System where practical. 34 | 35 | 4. (Unchanged) Teams **MUST NOT** create new applications, user journeys or significant new features using `govuk-admin-template`. 36 | 37 | ## Consequences 38 | 39 | - By improving accessibility in govuk-admin-template, there may be less impetus to migrate to the GOV.UK Design System. We believe the immediate benefits to users outweigh this concern. All applications are still expected to be migrated, and it's important that we retain senior management buy-in for the migration.
40 | 41 | - By enabling fixes in the underlying gems, we can start iteratively improving some of our oldest and most painful systems. 42 | 43 | ## Appendices 44 | 45 | ### Projects using govuk_admin_template 46 | 47 | > It's not possible to filter archived repositories in GitHub search yet, but manually filtering [this search](https://github.com/search?p=3&q=org%3Aalphagov+%22gem+govuk_admin_template%22&type=Code) yields: 48 | 49 | - [collections-publisher](https://github.com/alphagov/collections-publisher) 50 | - [contacts-admin](https://github.com/alphagov/contacts-admin) 51 | - [content-tagger](https://github.com/alphagov/content-tagger) 52 | - [imminence](https://github.com/alphagov/imminence) 53 | - [local-links-manager](https://github.com/alphagov/local-links-manager) 54 | - [manuals-publisher](https://github.com/alphagov/manuals-publisher) 55 | - [maslow](https://github.com/alphagov/maslow) 56 | - [publisher](https://github.com/alphagov/publisher) 57 | - [search-admin](https://github.com/alphagov/search-admin) 58 | - [search-performance-explorer](https://github.com/alphagov/search-performance-explorer) 59 | - [service-manual-publisher](https://github.com/alphagov/service-manual-publisher) 60 | - [short-url-manager](https://github.com/alphagov/short-url-manager) 61 | - [signon](https://github.com/alphagov/signon) 62 | - [specialist-publisher](https://github.com/alphagov/specialist-publisher) 63 | - [support](https://github.com/alphagov/support) 64 | - [transition](https://github.com/alphagov/transition) 65 | - [travel-advice-publisher](https://github.com/alphagov/travel-advice-publisher) 66 | - [whitehall](https://github.com/alphagov/whitehall) 67 | -------------------------------------------------------------------------------- /rfc-098-csp.md: -------------------------------------------------------------------------------- 1 | --- 2 | status: accepted 3 | implementation: done 4 | status_last_reviewed: 2024-03-04 5 | status_notes: RFC may not be accurate in the implementation details. 6 | --- 7 | 8 | # RFC 98: Implement Content Security Policy 9 | 10 | ## Summary 11 | 12 | This RFC proposes to configure a Content Security Policy for GOV.UK. 13 | 14 | ## Background 15 | 16 | We'd like to implement a Content Security Policy (CSP) on www.gov.uk. 17 | 18 | > The HTTP Content-Security-Policy response header allows web site administrators to control resources the user agent is allowed to load for a given page. With a few exceptions, policies mostly involve specifying server origins and script endpoints. This helps guard against cross-site scripting attacks (XSS). 19 | 20 | https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy 21 | 22 | CSP works by sending a header with each HTTP response. It looks something like this: 23 | 24 | ``` 25 | Content-Security-Policy: default-src 'self' assets.publishing.service.gov.uk; 26 | ``` 27 | 28 | The above will cause the browser to reject any scripts that aren't on the current domain (`self`) or `assets.publishing.service.gov.uk`. It will also reject Javascript loaded via `