├── .github └── workflows │ └── build.yml ├── .gitignore ├── LICENSE.md ├── Makefile ├── README.md ├── traffic-advice.bs └── traffic-advice.md /.github/workflows/build.yml: -------------------------------------------------------------------------------- 1 | name: Build 2 | on: 3 | pull_request: 4 | branches: 5 | - main 6 | push: 7 | branches: 8 | - main 9 | workflow_dispatch: 10 | jobs: 11 | build: 12 | name: Build 13 | runs-on: ubuntu-latest 14 | steps: 15 | - uses: actions/checkout@v2 16 | - name: Build 17 | run: make ci 18 | - name: Deploy 19 | if: ${{ (github.event_name == 'push' || github.event_name == 'workflow_dispatch') && github.ref == 'refs/heads/main' }} 20 | uses: peaceiris/actions-gh-pages@v3 21 | with: 22 | github_token: ${{ secrets.GITHUB_TOKEN }} 23 | publish_dir: ./out 24 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | out 2 | traffic-advice.html 3 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | Copyright 2020 Google LLC. 
SPDX-License-Identifier: Apache-2.0 2 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | SHELL=/bin/bash 2 | 3 | bikeshed_files = traffic-advice.bs 4 | 5 | .PHONY: ci clean local remote 6 | 7 | local: $(bikeshed_files) 8 | $(foreach source,$(bikeshed_files),bikeshed --die-on=warning spec $(source) $(source:.bs=.html);) 9 | 10 | remote: $(bikeshed_files:.bs=.html) 11 | 12 | ci: $(bikeshed_files:.bs=.html) 13 | mkdir -p out 14 | cp $^ out/ 15 | 16 | clean: 17 | rm -f $(bikeshed_files:.bs=.html) 18 | 19 | %.html: %.bs 20 | @ (HTTP_STATUS=$$(curl https://api.csswg.org/bikeshed/ \ 21 | --output $@ \ 22 | --write-out "%{http_code}" \ 23 | --header "Accept: text/plain, text/html" \ 24 | -F die-on=warning \ 25 | -F file=@$<) && \ 26 | [[ "$$HTTP_STATUS" -eq "200" ]]) || ( \ 27 | echo ""; cat $@; echo ""; \ 28 | rm -f $@; \ 29 | exit 22 \ 30 | ); 31 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Private Prefetch Proxy Explained 2 | 3 | Work-in-progress draft ([feedback & questions welcomed](https://github.com/buettner/private-prefetch-proxy/issues)) 4 | 5 | ## Problem and motivation 6 | There is [renewed interest](https://github.com/jeremyroman/alternate-loading-modes) in prefetching/prerendering as a way to improve loading performance on the web. However, prefetching cross-origin links reveals potentially identifiable information with the destination, e.g., the user’s cookies and IP address, before the user has explicitly signaled their interest in the site or content. 7 | 8 | Most of these concerns can be addressed with changes at the browser, e.g,. not sending cookies on prefetch requests, but hiding the client IP from the destination until the user navigates requires changes at the network level. 
While [technologies exist](https://developers.google.com/web/updates/2018/11/signed-exchanges) that enable prefetching without revealing the client IP to destinations, they require some [work to set up](https://developers.google.com/web/updates/2018/11/signed-exchanges#trying_out_signed_exchanges) which isn’t yet always [trivial](https://blog.amp.dev/2019/06/17/introducing-cloudflare-amp-real-url/). 9 | 10 | To make it easier to achieve instant experiences on the web, we’re exploring an alternative design for privacy-preserving prefetch as described [in this blog post](https://blog.chromium.org/2020/12/continuing-our-journey-to-bring-instant.html). Key to the proposal is the use of an HTTP/2 CONNECT proxy (or potentially in the future, a [QUIC proxy](https://tools.ietf.org/html/draft-pauly-masque-quic-proxy-00)) to obfuscate the IP address from the destination site during prefetching, along with rules governing its usage and additional measures to ensure that the prefetches cannot be linked to the user. 11 | 12 | ## Goals 13 | We want to make it easy for sites to take advantage of cross-origin prefetching without revealing the user’s IP address to the destination. We propose using an HTTP/2 CONNECT proxy to obfuscate the user’s IP during prefetching to achieve this goal, while at the same time maintaining the security properties of the web and giving both users and websites control over the use of the proxy. 14 | 15 | ## Non-goals 16 | This proposal is only relevant for the cross-origin cases. For same-origin prefetching/prerendering, there is no need to hide the user’s IP address or other state. Indeed, the party triggering prefetch/prerender is the same party that is being prefetched/prerendered, and naturally has access to this information. While same-origin prefetch/prerendering are out-of-scope of this explainer, we are nevertheless interested in improving prefetching/prerendering for both cross-origin and same-origin scenarios through other efforts. 
17 | 18 | ## Challenges 19 | There are two primary concerns we see with prefetch proxies. The first is that they can amplify the impact of compromised TLS certificates. Today, an attacker must have a compromised TLS certificate and also be on the network path between the user and that origin to MITM the connection. If attackers can designate and run prefetch proxies, they can trivially put themselves on the network path. A related concern is collusion, where a prefetch proxy works with some destinations to selectively unblind requests. 20 | 21 | Today, there are no technical means by which a browser can verify that a CONNECT proxy is not MITMing the client->origin TLS connection. While confidential computing technology may allow the browser to verify exactly what code is being run by the proxy, the technology is in its infancy. 22 | 23 | The second concern is that the proxy may become an aggregation point for user data. The set of links shown to a user on a referring site is both business data of the referrer and potentially PII for the user (e.g., if they are logged into the referring site). Of course, browsers see links on referrer pages today, but most browsers do not make that data available to backend infrastructure. Prefetch proxies should not build profiles of users or referring sites. 24 | 25 | This leads to the core challenge of making this feature available to as much of the web as possible while: 26 | 1. Giving users and referrers control over who they trust with their data. 27 | 1. Giving publishers the ability to opt-out of the feature. 28 | 1. Giving browsers the ability to ensure that prefetch proxies do not put their users at risk of TLS attacks or tracking. 29 | 30 | ## Exploration 31 | In the interest of starting a discussion with the community, we are sharing a tentative plan for making privacy-preserving prefetching more accessible and appealing to a broad set of parties. 
We would [love to hear](https://github.com/buettner/private-prefetch-proxy/issues) your questions, concerns and feedback to refine our thinking! 32 | 33 | ### Browser 34 | A key responsibility of a browser is to ensure the safety of its users. As noted in the [Challenges](https://github.com/buettner/private-prefetch-proxy#challenges) section, the primary concern with private prefetch proxies is that they introduce safety risks which the browser can not sufficiently address with currently available technology. Until this gap is resolved, it follows that the browser should only use a private prefetch proxy that it implicitly trusts, either because it is operated by the browser vendor itself, or by a third party under a contract (i.e. similar to the approach taken with DoH and VPN features by some browser vendors). 35 | 36 | ### Referrers and users 37 | Outgoing links on a referring page may be the result of information the referrer knows about the user. Consequently, outgoing links should not be prefetched, even with privacy preserving guarantees, without at least one of those two parties’ consent (and the other party should be able to opt-out). 38 | 39 | ### Publishers 40 | Some publishers may not be comfortable with proxied prefetches even if the referrer, user and browser all are. A publisher’s level of comfort might depend on: the browser (and which proxy by association), user related characteristics (e.g. geo restrictions), or perhaps the ability of a given referrer to balance performance and data usage. We would love input on other potential concerns to help us prioritize refinements. Please [**share concrete details as to why**](https://github.com/buettner/private-prefetch-proxy/issues) a particular aspect matters to you. 
41 | 42 | ### Participation model 43 | We believe that a referrer opt-in combined with opt-outs for users and publishers provides the best value and flexibility to all parties involved: 44 | * Referrers can choose to opt-in if they believe the benefits are worth it. 45 | * Both users and publishers can choose to opt-out if they are uncomfortable with the feature (e.g. concerned about extra data usage). 46 | 47 | #### Referrer opt-in 48 | Referrers opt-in to the feature by using [Speculation Rules](https://github.com/jeremyroman/alternate-loading-modes/blob/main/triggers.md) to indicate which links should be privately prefetched. E.g.,: 49 | 50 | ```html 51 | 60 | ``` 61 | 62 | Where: 63 | - `urls` would contain a list of URLs the referrer believes to be good candidates for prefetching. The browser would consider this list for prefetching, in addition to other constraints (e.g. bandwidth, prioritizing the main user experience, user preferences, etc). 64 | - `requires": ["anonymous-client-ip-when-cross-origin"]` indicates that the referrer wants the cross origin prefetches to be done in a privacy preserving manner. 65 | 66 | 67 | #### User opt-out 68 | Users can opt-out of the feature at any time. Furthermore, users can temporarily opt-out of the feature by using their browser’s private browsing mode. 69 | 70 | #### Publisher opt-out 71 | Publishers can opt out by disallowing connections in their [traffic advice](traffic-advice.md). This advice would be fetched and cached by the proxy, and can be used by publishers by adding a single resource to their origin at a well-known path. 72 | 73 | In addition, publishers can opt-out for individual requests, for example, when dealing with temporary traffic spikes or other issues. 
Publishers should look for the `Purpose: prefetch` or the new ['Sec-Purpose: prefetch; anonymous-client-ip'](https://wicg.github.io/nav-speculation/prefetch.html#sec-purpose-header) request header and respond with an HTTP 403 (Forbidden) (see [location](https://github.com/buettner/private-prefetch-proxy#geolocation) for an example use case). 74 | 75 | ### Future opportunities 76 | We’re continuing to explore ways to safely prefetch via proxies not operated by the browser. In that case, referrers may wish to specify which (if any) proxies they trust with their user data. The *speculation rules* approach offers a flexible pattern which would allow for this extension. 77 | 78 | If you have ideas on future opportunities or want to suggest a different approach, please [start a topic](https://github.com/buettner/private-prefetch-proxy/issues). Thanks! 79 | 80 | 81 | ## Prefetching Details 82 | ### Using an isolated network context 83 | Prefetches should not reveal any local state that can be used to identify the user. The CONNECT proxy masks the IP address, but the browser is responsible for not revealing other information that can be used to identify the user. 84 | 85 | Specifically: 86 | * Cookies must not be sent on prefetches. 87 | * Prefetches must use an isolated network context that does not reveal state from the HTTP cache, previous TLS sessions, etc. 88 | * Static fingerprinting surfaces such as User-Agent must be bucketed, e.g., by reducing the [User-Agent](https://www.chromium.org/updates/ua-reduction). 89 | 90 | In addition, prefetches should not persist any state (cookies, HTTP caching) unless the user navigates to the prefetched link. 
91 | 92 | The following headers will be sent on prefetch requests: 93 | 94 | purpose: prefetch 95 | sec-purpose: prefetch; anonymous-client-ip 96 | user-agent: 97 | accept: 98 | accept-encoding: gzip, deflate, br 99 | accept-language: 100 | sec-ch-ua: 101 | sec-ch-ua-mobile: ?<0 or 1> 102 | 103 | ### What to prefetch 104 | Our experiment found that fetching the mainframe HTML, along with statically linked CSS and synchronous Javascript, provided a 40% LCP improvement at the median. Fetching other resources, for example images, may further improve user experience at the cost of more wasted bytes on mispredictions. 105 | 106 | # FAQ 107 | ## TLS key leaks and private prefetch proxies 108 | 109 | **Concern:** “TLS key leaks (e.g. heartbleed) pose a greater user threat with a private prefetch proxy because it allows an attacker to direct prefetches through a colluding proxy, thereby manipulating the network path and making MITM attacks easier.” 110 | 111 | This is a risk if we allow websites to specify any “private prefetch proxy” of their choosing. For instance, a malicious website could specify their own proxy, loaded with compromised TLS keys, and trick the user into clicking a prefetched link for a legitimate website. For this reason, “private prefetch proxies” need to be trusted by the browser before they will be used for prefetching. 112 | 113 | ## Risk of collusion 114 | **Concern:** “Private prefetch proxies enable a new vector for cross-site user tracking, as the proxy can directly terminate TLS connections to origins the proxy owns or colludes with, and then it can directly add tracking identifiers to requests. ” 115 | 116 | This is another reason why the proxy must be trusted by the browser. Not only must private prefetch proxies not introduce new identifiers, they must in fact IP-blind all destination origins. 
117 | 118 | ## Learning about the user’s interests 119 | **Concern:** “Prefetch proxies will learn about users based on their prefetch requests, which they could monetize or leverage themselves.” 120 | 121 | Similar to other concerns about the trustworthiness of the proxy, we believe the only current way to address this concern is to require that the browser trusts the proxy. However, both the referrer website and the user have control over which proxies (if any) they are willing to use, with prefetching being disabled if there is no agreement. 122 | 123 | ## Content blockers and extensions 124 | 125 | **Concern:** “What about content blockers? How does this impact extensions?” 126 | 127 | Browsers should continue to ensure that network requests and responses are subject to a user’s installed extensions even when the requests are handled by a private prefetch proxy. 128 | 129 | For DNS based content blockers, there is a range of options to explore including allowing users to disable the feature altogether, or to enable an additional blocking DNS lookup for every domain at navigation time (along with the associated performance penalty). 130 | 131 | ## Impact for services provided by ISPs 132 | **Concern:** “How does this interact with content filtering?” 133 | 134 | We acknowledge that network administrators may need to filter content. We propose the following approach to avoid interfering in these scenarios. 135 | 136 | At startup and on a change of network, the browser would attempt to resolve a purpose-specific domain name, and examine the result: 137 | 138 | - If a response code other than NOERROR is returned (e.g. NXDOMAIN or SERVFAIL), or if a NOERROR response code is returned, but contains neither A nor AAAA records, then the browser would change its behavior in the following manner: 139 | 140 | - Upon navigation to a prefetched link, the browser would issue a blocking DNS lookup for the domain. 
This DNS lookup will happen at the same time and in the same manner as if the prefetch had not happened, providing the administrator with the same opportunity to filter content. 141 | 142 | ## Abuse 143 | **Concern:** “How will this interact with websites’ anti-abuse mechanisms?” 144 | 145 | To protect against attackers using the proxy to abuse websites, the prefetch proxy must block traffic that does not fit the pattern of legitimate link prefetching e.g., based on the number of requests, session duration, etc. 146 | 147 | We’re also considering schemes for authentication, and website operators can always opt-out of proxied prefetching (e.g., by rejecting requests with the “Purpose: prefetch” header or adding a [traffic-advice](https://github.com/buettner/private-prefetch-proxy/blob/main/traffic-advice.md) file). In addition, private prefetch proxies should allow for reverse DNS lookups of their IP addresses and publish an escalation path for help addressing potential abuse concerns. 148 | 149 | ## Geolocation 150 | **Concern:** "How will this work with geography-based use cases (e.g. Geo-filtering / Geo-access)?" 151 | 152 | The destination server will see the IP of the proxy egress IP, not the user's IP; this may interfere with IP-based geolocation. 153 | Servers that rely on geolocation to determine what content to serve have the following options: 154 | * Determine the location of the user at navigation time, e.g., by triggering a request via JS. 155 | * Reject requests with the "Purpose: prefetch" or the new ['Sec-Purpose: prefetch; anonymous-client-ip'](https://wicg.github.io/nav-speculation/prefetch.html#sec-purpose-header) header for resources that are georestricted. 156 | 157 | More speculative ideas worth exploring are: 158 | * Requiring proxies to only egress traffic from IPs in the same country/region as the user. The challenge here is having agreement on the granularity of "region", as proxies likely can't egress in every country. 
159 | * APIs/mechanism by which the proxy can tell the destination what general region the user is in (e.g., [Geohash HTTP Client Hints](https://tfpauly.github.io/privacy-proxy/draft-geohash-hint.html)). Similar to the above, there would need to be agreement about the required granularity. 160 | 161 | ## Traffic analysis 162 | **Question**: "Even though prefetches are end-to-end encrypted between the browser and the destination, can't the proxy perform traffic analysis attacks?" 163 | 164 | [By design](https://github.com/buettner/private-prefetch-proxy#using-an-isolated-network-context), prefetches should not reveal any local state to the destination that could be used to identify the user. This means that the responses cannot be personalized. The proxy could learn, for example, that the destination runs A/B experiments on non-logged in users. But we don't believe this information is particularly valuable, and the destination can always reject prefetch requests. 165 | 166 | ## Trusted Private Prefetch Proxies (TPPP) 167 | **Question**: “What are ‘trusted private prefetch proxies’?” 168 | 169 | We would like to firm this up with the help of the community. 170 | 171 | At a high level, here is a tentative and non-exhaustive list of aspects that we think would be needed: 172 | 173 | - Requirements to define expected behavior: “a TPPP must hide the IP address of the prefetch requester”, what data can be logged, retention policy for those logs, etc. 174 | - Usage rules (e.g. abuse prevention). 175 | - Potentially audits to assert that a TPPP is implementing the requirements and usage rules as specified. 176 | 177 | 178 | ## Other concerns or questions? 179 | 180 | Please [file an issue](https://github.com/buettner/private-prefetch-proxy/issues) if you have identified something that ought to be addressed in this section. 
181 | 182 | 183 | # Feedback, discussion 184 | 185 | 186 | If you are interested in this proposal, please consider participating in existing [discussions](https://github.com/buettner/private-prefetch-proxy/issues) or filing new issues to share feedback or ask questions. In addition, if you are interested in prefetching/prerendering in general, you might be interested in [discussions for alternate loading modes](https://github.com/jeremyroman/alternate-loading-modes/issues) as well. 187 | 188 | -------------------------------------------------------------------------------- /traffic-advice.bs: -------------------------------------------------------------------------------- 1 | 15 |
 16 | {
 17 |     "HTTP-CACHING": {
 18 |         "authors": [
 19 |             "R. Fielding",
 20 |             "M. Nottingham",
 21 |             "J. Reschke"
 22 |         ],
 23 |         "href": "https://httpwg.org/http-core/draft-ietf-httpbis-cache-latest.html",
 24 |         "title": "HTTP Caching",
 25 |         "status": "Internet-Draft",
 26 |         "publisher": "IETF"
 27 |     },
 28 |     "HTTP-SEMANTICS": {
 29 |         "aliasOf": "RFC7231"
 30 |     },
 31 |     "WELL-KNOWN": {
 32 |         "aliasOf": "RFC8615"
 33 |     }
 34 | }
 35 | 
36 | 37 |
38 | Introduction {#intro} 39 | ===================== 40 | 41 | *This section is non-normative.* 42 | 43 | Publishers might wish not to accept traffic from private prefetch proxies and other sources other than direct user traffic, for instance to reduce server load due to speculative prefetch activity. 44 | 45 | We propose a well-known "traffic advice" resource, analogous to `robots.txt` (for web crawlers), which allows an HTTP server to request that implementing agents stop sending traffic to it for some time. 46 |
47 | 48 | Implementations {#implementations} 49 | ================================== 50 | 51 | This specification may be implemented by traffic advice respecting agents, such as proxy servers or other applications which direct HTTP traffic on behalf of clients such as a web browser. 52 | 53 | While [[FETCH]] is used to describe the algorithm to request this resource, such agents might not implement [[HTML]]. 54 | 55 | Definitions {#dfns} 56 | =================== 57 | 58 | A traffic advice entry is a [=struct=] with the following [=struct/items=]: 59 | * disallowed flag, a [=boolean=] which is initially false 60 |
If the [=traffic advice entry/disallowed flag=] is true, the advice requests that traffic, including establishing connections and sending requests, be avoided.
61 | * fraction, a number which is at least 0 and at most 1, initially 1 62 |
The advice requests that only that fraction of traffic be permitted. A server might use this to facilitate an incremental rollout, or to partially reduce server load during peak times.
63 | 64 | A traffic advice result is null, a [=traffic advice entry=], or `"unreachable"`. 65 | 66 | An agent identity is a [=list=] of [=strings=]. It must contain at least two elements, and the last must be `"*"`. 67 | 68 | Identity {#identity} 69 | ==================== 70 | 71 | Each agent should have an brand name that specifically identifies it (such as `PollyPrefetchProxy`). 72 | 73 | Its [=agent identity=] is all of the following that apply, in order: 74 | 75 | 1. The brand name 76 | 1. `"prefetch-proxy"`, if the agent is a proxy server which exclusively serves prefetch traffic (for example, a [private prefetch proxy](https://github.com/buettner/private-prefetch-proxy)) 77 | 1. `"*"` 78 | 79 | Fetching {#fetching} 80 | ==================== 81 | 82 |
83 | 84 | To generate a traffic advice URL for [=origin=] |origin|, run the following steps: 85 | 86 | 1. If |origin| is not a [=tuple origin=], return failure. 87 | 88 | 1. If |origin|'s [=origin/scheme=] is not an [=HTTP(S) scheme=], return failure. 89 | 90 | 1. If |origin| is not a [=potentially trustworthy origin=], return failure. 91 | 92 | 1. Return a new [=URL=] as follows: 93 | 94 | : [=url/scheme=] 95 | :: |origin|'s [=origin/scheme=] 96 | : [=url/host=] 97 | :: |origin|'s [=origin/host=] 98 | : [=url/port=] 99 | :: |origin|'s [=origin/port=] 100 | : [=url/path=] 101 | :: « `".well-known"`, `"traffic-advice"` » 102 | 103 |
104 | 105 |
106 | 107 | To fetch traffic advice for [=origin=] |origin|, [=agent identity=] |identity| and algorithm |whenComplete| accepting a [=traffic advice result=]: 108 | 109 | 1. Let |url| be the result of [=generating a traffic advice URL=] for |origin|. 110 | If it results in failure, then return failure. 111 | 112 | 1. Let |request| be a [=request=] as follows: 113 | 114 | : [=request/method=] 115 | :: `` `GET` `` 116 | : [=request/URL=] 117 | :: |url| 118 | : [=request/client=] 119 | :: null 120 | : [=request/credentials mode=] 121 | :: `"omit"` 122 | : [=request/redirect mode=] 123 | :: `"manual"` 124 |
This means that a [=redirect status=] will not lead to another origin being contacted.
125 | 126 | 1. Let |fetchController| be null. 127 | 128 | 1. Let |processResponse| be the following steps, given [=response=] |response|: 129 | 130 | 1. If |response|'s [=response/type=] is `"error"`, then [=fetch controller/terminate=] |fetchController|, run |whenComplete| with `"unreachable"`, and return. 131 | 132 | 1. If |response|'s [=response/type=] is `"opaqueredirect"`, then [=fetch controller/terminate=] |fetchController|, run |whenComplete| with null, and return. 133 | 134 | 1. [=Assert=]: |response|'s [=response/type=] is `"basic"`. 135 | 136 | 1. If |response|'s [=response/status=] is 429 (Too Many Requests; see [[RFC6585]]) or 503 (Service Unavailable; see [[HTTP-SEMANTICS]]), then [=fetch controller/terminate=] |fetchController|, run |whenComplete| with `"unreachable"`, and return. 137 |
If present, the [[HTTP-SEMANTICS]] `Retry-After` response header could be used as a hint about when to next retry.
138 | 139 | 1. If |response|'s [=response/status=] is not an [=ok status=], then [=fetch controller/terminate=] |fetchController|, run |whenComplete| with null and return. 140 | 141 | 1. If |response|'s [=response/status=] is a [=null body status=], then [=fetch controller/terminate=] |fetchController|, run |whenComplete| with null and return. 142 | 143 | 1. Let |mimeType| be the result of [=header list/extracting a MIME type=] from |response|'s [=response/header list=]. 144 | 145 | 1. If |mimeType| is failure or its [=MIME type/essence=] is not `"application/trafficadvice+json"`, then [=fetch controller/terminate=] |fetchController|, run |whenComplete| with null and return. 146 | 147 | 1. Let |processResponseEndOfBody| be the following steps, given [=response=] |response| and null, failure or [=byte sequence=] |body|: 148 | 149 | 1. If |body| is not a [=byte sequence=], then run |whenComplete| with null and return. 150 | 151 | 1. Let |string| be the result of [=UTF-8 decoding=] |body|. 152 | 153 | 1. Let |parseResult| be the result of [=parsing traffic advice=] from |string| given |identity|. 154 | 155 | 1. Run |whenComplete| with |parseResult|. 156 | 157 | 1. [=Fetch=] |request| with [=fetch/processResponse=] set to |processResponse| and [=fetch/processResponseEndOfBody=] set to |processResponseEndOfBody|, and set |fetchController| to the result. 158 | 159 |
160 | Notwithstanding the usual behavior of [[HTTP-CACHING]], agents (especially ones shared amongst multiple users) should consider applying a minimum freshness lifetime (10 minutes is suggested) and maximum freshness lifetime (48 hours is suggested) in order to balance the [security considerations](#security) discussed below. If these suggested values are used, a default freshness lifetime (if none is specified) of 30 minutes may be appropriate. 161 |
162 | 163 |
164 | 165 | Parsing {#parsing} 166 | ================== 167 | 168 |
169 | 170 | To parse traffic advice from a [=string=] |string| given [=agent identity=] |identity|: 171 | 172 | 1. Let |parsed| be the result of [=parsing JSON into Infra values=] given |string|. If this throws an exception, then return null. 173 | 174 | 1. If |parsed| is not a [=list=], then return null. 175 | 176 | 1. Let |bestMatch| be null. 177 | 178 | 1. [=list/For each=] |entry| of |parsed|: 179 | 180 | 1. If |entry| is not a [=map=], then [=iteration/continue=]. 181 | 182 | 1. If |entry|[`"user_agent"`] does not [=map/exist=] or is not a [=string=], then [=iteration/continue=]. 183 | 184 | 1. Let |agentSelector| be |entry|[`"user_agent"`]. 185 | 186 | 1. If |identity| does not contain |agentSelector|, then [=iteration/continue=]. 187 | 188 | 1. If |bestMatch| is null or |agentSelector| appears at an earlier index in |identity| than |bestMatch|[`"user_agent"`] does, then set |bestMatch| to |entry|. 189 | 190 | 1. If |bestMatch| is null, then return null. 191 | 192 | 1. Let |entry| be a [=traffic advice entry=]. 193 | 194 | 1. If |bestMatch|[`"disallow"`] [=map/exists=] and is true, then set |entry|'s [=traffic advice entry/disallowed flag=] to true. 195 | 196 | 1. If |bestMatch|[`"fraction"`] [=map/exists=] and is a number, then: 197 | 198 | 1. Let |fraction| be |bestMatch|[`"fraction"`]. 199 | 200 | 1. If |fraction| is greater than or equal to 0 and less than or equal to 1, then set |entry|'s [=traffic advice entry/fraction=] to |fraction|. 201 | 202 | 1. Return |entry|. 203 | 204 |
205 | 206 | Interpretation {#interpretation} 207 | ================================ 208 | 209 | When they would be able to respect advice to disallow traffic to an origin (for example, when requested to proxy prefetch traffic to the origin), [=traffic advice respecting agents=] should [=fetch traffic advice=] (respecting [[HTTP-CACHING]] semantics). 210 | 211 | If the result is null, then no advice was received. Agents should adopt their default behavior. 212 | 213 | If the result is `"unreachable"`, then the HTTP server was not able to service the request for traffic advice. Since this could indicate that the server cannot accept additional requests at this time, agents may stop traffic to the server for some interval. 214 | 215 | If the result's [=traffic advice entry/disallowed flag=] is true, then the HTTP server advises that traffic is discouraged at this time. Agents should respect this by not establishing new connections or sending new requests. 216 | 217 | Otherwise, if the result's [=traffic advice entry/fraction=] is less than 1, then the HTTP server advises that it would like to receive only a fraction of the possible traffic. Agents may implement this as they see fit, but the following algorithm is suggested on establishment of an HTTP connection on behalf of a client. 218 | 219 | 1. Choose a uniform random number |r| between 0 and 1. 220 | 1. If |r| is less than or equal to the result's [=traffic advice entry/fraction=], then the traffic is permitted by the fraction. 221 | 1. Otherwise, a connection is not established. 222 | 223 | This process should not be repeated as part of automatic retry logic, since this would defeat the server's ability to shed load in this manner. Broadly, agents should aim for a fraction of 0.1 to result in approximately 10% of the traffic to the HTTP server. 224 | 225 | This approach allows servers to scale their traffic proportionally as part of an incremental rollout. 
Agents should avoid approaches which might bias the permitted connections or requests in ways that might make this scaling non-linear (e.g., by preferring certain kinds of connection or user). 226 | 227 | Security considerations {#security} 228 | =================================== 229 | 230 | Type confusion {#type-confusion} 231 | -------------------------------- 232 | 233 | Like other resources, it is possible that the `/.well-known/traffic-advice` path could be used for a request with some other destination (e.g., as a script). If interpreted as JavaScript, the JSON data would either be syntactically invalid or an empty block. More generally, this specification requires the use of a MIME type that is not used for any other purpose, and standard countermeasures (e.g., `X-Content-Type-Options: nosniff`) can be used to prevent type confusion in some cases which are permissive of mismatched MIME types. 234 | 235 | Caching issues {#security-caching} 236 | ---------------------------------- 237 | 238 | Because the traffic advice resource is expected to be cached by [=traffic advice respecting agents=] such as private prefetch proxies, it is possible that a temporary compromise of an origin server or its private key could be extended to a longer outage of some traffic due to an agent caching a policy that prevents or throttles traffic, leading to a denial of service for such traffic. This is similar to attacks against HTTP Public Key Pinning [[RFC7469]]. 239 | 240 | This is less of an issue if the traffic is non-essential (e.g., prefetch) traffic. 241 | 242 | To mitigate this, well-behaved agents implement a maximum freshness lifetime when they [=fetch traffic advice=]. 243 | 244 | Request amplification {#request-amplification} 245 | ---------------------------------------------- 246 | 247 | Agents which are proxy services accessible to untrusted users (esp. 
the general public) may be susceptible to being used to amplify a denial of service attack conducted, for example, by a botnet. For example, if a small request from a client (e.g. `CONNECT target.example:443` with small headers) can cause a larger request (e.g., `GET /.well-known/traffic-advice` with large headers) to the origin server, this could be used to increase the effective bandwidth available to the distributed denial of service attack against an origin server. 248 | 249 | To mitigate this, well-behaved agents implement, in addition to other anti-abuse measures, a minimum freshness lifetime when they [=fetch traffic advice=]. 250 | 251 | Privacy considerations {#privacy} 252 | ================================= 253 | 254 | This specification provides general mechanisms for agents to limit the traffic they are sending. Most privacy considerations are expected to be particular to the agents in question (for example, proxies inspecting traffic they carry). 255 | 256 | If privacy considerations related to the traffic advice mechanism itself are identified, they should be added here. 257 | 258 | IANA considerations {#iana} 259 | =========================== 260 | 261 | Well-known `traffic-advice` URI {#iana-well-known} 262 | -------------------------------------------------- 263 | 264 | This document defines well-known URI suffix `traffic-advice` as described by [[WELL-KNOWN]]. It should be submitted for registration as follows: 265 | 266 | : URI suffix 267 | :: traffic-advice 268 | : Change controller 269 | :: The editor(s) of this document, pending a standards venue 270 | : Specification(s) 271 | :: This document 272 | : Status 273 | :: provisional 274 | : Related information 275 | :: None 276 | 277 | The `application/trafficadvice+json` MIME type {#iana-mime-type} 278 | ---------------------------------------------------------------- 279 | 280 | This document defines the [=MIME type=] `application/trafficadvice+json` as described by [[RFC6838]]. 
It should be submitted for registration as follows: 281 | 282 | : Type name 283 | :: `application` 284 | : Subtype name 285 | :: `trafficadvice+json` 286 | : Required parameters 287 | :: N/A 288 | : Optional parameters 289 | :: N/A 290 | : Encoding considerations 291 | :: Always UTF-8 292 | : Security considerations 293 | :: See [Security considerations](#security). 294 | : Interoperability considerations 295 | :: This MIME type is not known to be in previous use. Applications which can process `application/json` should be able to process all valid data with this MIME type. 296 | : Published specification 297 | :: This document 298 | : Applications that use this media type 299 | :: [=traffic advice respecting agents=] 300 | : Fragment identifier considerations 301 | :: N/A 302 | : Additional information 303 | :: 304 | : Deprecated alias names for this type 305 | :: N/A 306 | : Magic number(s) 307 | :: N/A 308 | : File extension(s) 309 | :: None. This resource will be named `traffic-advice` when fetched over HTTP. 310 | : Macintosh file type code 311 | :: Same as for `application/json` [[RFC8259]] 312 | : Person & email address to contact for further information 313 | :: The editor(s) of this document 314 | : Intended usage 315 | :: Common 316 | : Restrictions on usage 317 | :: N/A 318 | : Change controller 319 | :: The editor(s) of this document, pending a standards venue 320 | -------------------------------------------------------------------------------- /traffic-advice.md: -------------------------------------------------------------------------------- 1 | # Traffic Advice 2 | 3 | Publishers may wish not to accept traffic from [private prefetch proxies](README.md) and from sources other than direct user traffic, for instance to reduce server load due to speculative prefetch activity.
4 | 5 | We propose a well-known "traffic advice" resource, analogous to `/robots.txt` (for web crawlers), which allows an HTTP server to declare that implementing agents should stop sending traffic to it for some time. The formal traffic-advice specification can be found [here](https://buettner.github.io/private-prefetch-proxy/traffic-advice.html). 6 | 7 | ## Proposal 8 | 9 | HTTP request activity can broadly be divided into: 10 | * activity on behalf of a user interaction (e.g., a web browser loading a web page requested by the user), or which for another reason cannot easily be discarded 11 | * activity for which there is an existing specialized mechanism for throttling traffic (e.g., web crawlers respecting `robots.txt`) 12 | * activity which can easily be discarded (e.g., because it corresponds to a prefetch which improves loading performance but not correctness) at the server's request (e.g., because it is under load or the operator otherwise does not wish to serve non-essential traffic) 13 | 14 | Applications in the third category should consider acting as *agents which respect traffic advice*, so as to respect the server operator's wishes with minimal resource impact. 15 | 16 | Agents which respect traffic advice should fetch the well-known path `/.well-known/traffic-advice`. If it returns a response with an [ok status](https://fetch.spec.whatwg.org/#ok-status) and an `application/trafficadvice+json` MIME type, the response body should contain valid UTF-8 encoded JSON like the following: 17 | 18 | ```json 19 | [ 20 | {"user_agent": "prefetch-proxy", "disallow": true} 21 | ] 22 | ``` 23 | 24 | Each agent has a series of identifiers it recognizes, in order of specificity: 25 | * its own agent name (e.g.
`"ExamplePrivatePrefetchProxy"`) 26 | * decreasingly specific generic categories that describe it, like `"prefetch-proxy"` 27 | * `"*"` (which applies to every implementing agent) 28 | 29 | It finds the most specific element of the response, and applies the corresponding advice (currently only a boolean which advises disallowing all traffic) to its behavior. The agent should respect the cache-related response headers to minimize the frequency of such requests and to revalidate the resource when it is stale. 30 | 31 | 32 | If the response has a `404 Not Found` status (or a similar status), on the other hand, the agent should apply its default behavior. 33 | 34 | ## Why not robots.txt? 35 | 36 | `robots.txt` is designed for crawlers, especially search engine crawlers, and so site owners have likely already established robots rules because they wish to limit traffic from crawlers -- even though they have no such concern about prefetch proxy traffic. The `robots.txt` format is also designed to limit traffic by path, which isn't appropriate for agents which do not know the path of the requests they are responsible for throttling (as with a CONNECT proxy carrying TLS traffic). 37 | 38 | A more similar textual format would be possible, but the format for parsing `robots.txt` is not consistently specified and implemented. By contrast, JSON implementations are widely available on a wide variety of platforms used by site owners and authors. 39 | 40 | ## Application to private prefetch proxies 41 | 42 | For example, suppose a private prefetch proxy, `ExamplePrivatePrefetchProxy`, would like to respect traffic advice in order to allow site owners to limit inbound traffic from the proxy. 43 | 44 | When a client of the proxy service (e.g., a web browser) requests a connection to `https://www.example.com`, the proxy server issues an HTTP request for `https://www.example.com/.well-known/traffic-advice`. It receives the sample response body from above. 
It recognizes `"prefetch-proxy"` as the most specific advice to apply to itself. 45 | 46 | It caches this result (traffic is presently disallowed) at the proxy server (or even across multiple proxy server instances run by the same operator), and refuses client connections to `https://www.example.com` until an updated `/.well-known/traffic-advice` resource no longer disallows traffic. Even if a large number of proxy clients request connections to `https://www.example.com`, the site operator and its CDN do not receive traffic from the proxy except for infrequent requests to revalidate the traffic advice (which may be, for example, once per hour). 47 | --------------------------------------------------------------------------------
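As a rough illustration of the walkthrough above, the following Python sketch shows how a proxy might pick the most specific advice entry from the sample response body. The `select_advice` function and the `SAMPLE` and `IDENTIFIERS` names are hypothetical and exist only for this example. It assumes that the first entry matching a given identifier wins; the formal specification's parsing algorithm is authoritative and may differ in detail.

```python
import json
from typing import Optional, Sequence


def select_advice(body: str, identifiers: Sequence[str]) -> Optional[dict]:
    """Return the entry matching the most specific identifier, or None."""
    entries = json.loads(body)
    if not isinstance(entries, list):
        return None
    # `identifiers` is ordered from most specific to least specific.
    for identifier in identifiers:
        for entry in entries:
            if isinstance(entry, dict) and entry.get("user_agent") == identifier:
                return entry
    return None


# The sample response body from the proposal above.
SAMPLE = '[{"user_agent": "prefetch-proxy", "disallow": true}]'

# Identifiers recognized by the example proxy, most specific first.
IDENTIFIERS = ["ExamplePrivatePrefetchProxy", "prefetch-proxy", "*"]

advice = select_advice(SAMPLE, IDENTIFIERS)
# Refuse client connections while the cached advice disallows traffic.
refuse = advice is not None and advice.get("disallow") is True
```

For the sample body, no entry names `ExamplePrivatePrefetchProxy`, so the `"prefetch-proxy"` entry is selected and connections are refused until a revalidated response no longer disallows traffic.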