101 | sec-ch-ua-mobile: ?<0 or 1>
102 |
103 | ### What to prefetch
104 | Our experiment found that fetching the main frame HTML, along with statically linked CSS and synchronous JavaScript, provided a 40% LCP improvement at the median. Fetching other resources, such as images, may further improve the user experience at the cost of more wasted bytes on mispredictions.
105 |
106 | # FAQ
107 | ## TLS key leaks and private prefetch proxies
108 |
109 | **Concern:** “TLS key leaks (e.g. Heartbleed) pose a greater user threat with a private prefetch proxy because it allows an attacker to direct prefetches through a colluding proxy, thereby manipulating the network path and making MITM attacks easier.”
110 |
111 | This is a risk if we allow websites to specify any “private prefetch proxy” of their choosing. For instance, a malicious website could specify their own proxy, loaded with compromised TLS keys, and trick the user into clicking a prefetched link for a legitimate website. For this reason, “private prefetch proxies” need to be trusted by the browser before they will be used for prefetching.
112 |
113 | ## Risk of collusion
114 | **Concern:** “Private prefetch proxies enable a new vector for cross-site user tracking, as the proxy can directly terminate TLS connections to origins the proxy owns or colludes with, and then it can directly add tracking identifiers to requests.”
115 |
116 | This is another reason why the proxy must be trusted by the browser. Not only must private prefetch proxies not introduce new identifiers, they must in fact IP-blind all destination origins.
117 |
118 | ## Learning about the user’s interests
119 | **Concern:** “Prefetch proxies will learn about users based on their prefetch requests, which they could monetize or leverage themselves.”
120 |
121 | Similar to other concerns about the trustworthiness of the proxy, we believe the only current way to address this concern is to require that the browser trusts the proxy. However, both the referrer website and the user have control over which proxies (if any) they are willing to use, with prefetching being disabled if there is no agreement.
122 |
123 | ## Content blockers and extensions
124 |
125 | **Concern:** “What about content blockers? How does this impact extensions?”
126 |
127 | Browsers should continue to ensure that network requests and responses are subject to a user’s installed extensions even when the requests are handled by a private prefetch proxy.
128 |
129 | For DNS-based content blockers, there is a range of options to explore, including allowing users to disable the feature altogether or enabling an additional blocking DNS lookup for every domain at navigation time (along with the associated performance penalty).
130 |
131 | ## Impact for services provided by ISPs
132 | **Concern:** “How does this interact with content filtering?”
133 |
134 | We acknowledge that network administrators may need to filter content. We propose the following approach to avoid interfering in these scenarios.
135 |
136 | At startup and on a change of network, the browser would attempt to resolve a purpose-specific domain name, and examine the result:
137 |
138 | - If a response code other than NOERROR is returned (e.g. NXDOMAIN or SERVFAIL), or if a NOERROR response code is returned, but contains neither A nor AAAA records, then the browser would change its behavior in the following manner:
139 |
140 | - Upon navigation to a prefetched link, the browser would issue a blocking DNS lookup for the domain. This DNS lookup will happen at the same time and in the same manner as if the prefetch had not happened, providing the administrator with the same opportunity to filter content.
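
The probe decision described above can be sketched as follows. This is a minimal illustration rather than part of the proposal: the `ProbeResult` type and the `needs_blocking_lookup` function are hypothetical names, and the DNS query itself (which needs a resolver that exposes response codes) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeResult:
    """Outcome of resolving the purpose-specific domain (hypothetical type)."""
    rcode: str                                   # e.g. "NOERROR", "NXDOMAIN", "SERVFAIL"
    a_records: list = field(default_factory=list)
    aaaa_records: list = field(default_factory=list)


def needs_blocking_lookup(probe: ProbeResult) -> bool:
    """Return True if the browser should issue a blocking DNS lookup
    when the user navigates to a prefetched link."""
    if probe.rcode != "NOERROR":
        return True
    # NOERROR with neither A nor AAAA records is treated the same way.
    return not probe.a_records and not probe.aaaa_records
```

The browser would evaluate this at startup and on each network change, and remember the answer until the next change.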
141 |
142 | ## Abuse
143 | **Concern:** “How will this interact with websites’ anti-abuse mechanisms?”
144 |
145 | To protect against attackers using the proxy to abuse websites, the prefetch proxy must block traffic that does not fit the pattern of legitimate link prefetching e.g., based on the number of requests, session duration, etc.
146 |
147 | We’re also considering schemes for authentication, and website operators can always opt out of proxied prefetching (e.g., by rejecting requests with the “Purpose: prefetch” header or adding a [traffic-advice](https://github.com/buettner/private-prefetch-proxy/blob/main/traffic-advice.md) file). In addition, private prefetch proxies should allow for reverse DNS lookups of their IP addresses and publish an escalation path for help addressing potential abuse concerns.
148 |
149 | ## Geolocation
150 | **Concern:** "How will this work with geography-based use cases (e.g. Geo-filtering / Geo-access)?"
151 |
152 | The destination server will see the proxy's egress IP, not the user's IP; this may interfere with IP-based geolocation.
153 | Servers that rely on geolocation to determine what content to serve have the following options:
154 | * Determine the location of the user at navigation time, e.g., by triggering a request via JS.
155 | * Reject requests with the "Purpose: prefetch" or the new ['Sec-Purpose: prefetch; anonymous-client-ip'](https://wicg.github.io/nav-speculation/prefetch.html#sec-purpose-header) header for resources that are georestricted.
156 |
157 | More speculative ideas worth exploring are:
158 | * Requiring proxies to only egress traffic from IPs in the same country/region as the user. The challenge here is having agreement on the granularity of "region", as proxies likely can't egress in every country.
159 | * APIs/mechanism by which the proxy can tell the destination what general region the user is in (e.g., [Geohash HTTP Client Hints](https://tfpauly.github.io/privacy-proxy/draft-geohash-hint.html)). Similar to the above, there would need to be agreement about the required granularity.
160 |
161 | ## Traffic analysis
162 | **Question**: "Even though prefetches are end-to-end encrypted between the browser and the destination, can't the proxy perform traffic analysis attacks?"
163 |
164 | [By design](https://github.com/buettner/private-prefetch-proxy#using-an-isolated-network-context), prefetches should not reveal any local state to the destination that could be used to identify the user. This means that the responses cannot be personalized. The proxy could learn, for example, that the destination runs A/B experiments on non-logged-in users. But we don't believe this information is particularly valuable, and the destination can always reject prefetch requests.
165 |
166 | ## Trusted Private Prefetch Proxies (TPPP)
167 | **Question**: “What are ‘trusted private prefetch proxies’?”
168 |
169 | We would like to firm this up with the help of the community.
170 |
171 | At a high level, here is a tentative and non-exhaustive list of aspects that we think would be needed:
172 |
173 | - Requirements to define expected behavior: “a TPPP must hide the IP address of the prefetch requester”, what data can be logged, retention policy for those logs, etc.
174 | - Usage rules (e.g. abuse prevention).
175 | - Potentially audits to assert that a TPPP is implementing the requirements and usage rules as specified.
176 |
177 |
178 | ## Other concerns or questions?
179 |
180 | Please [file an issue](https://github.com/buettner/private-prefetch-proxy/issues) if you have identified something that ought to be addressed in this section.
181 |
182 |
183 | # Feedback, discussion
184 |
185 |
186 | If you are interested in this proposal, please consider participating in existing [discussions](https://github.com/buettner/private-prefetch-proxy/issues) or filing new issues to share feedback or ask questions. In addition, if you are interested in prefetching/prerendering in general, you might be interested in [discussions for alternate loading modes](https://github.com/jeremyroman/alternate-loading-modes/issues) as well.
187 |
188 |
--------------------------------------------------------------------------------
/traffic-advice.bs:
--------------------------------------------------------------------------------
1 |
2 | Title: Traffic Advice
3 | Shortname: traffic-advice
4 | Status: DREAM
5 | Repository: buettner/private-prefetch-proxy
6 | Editor: Jeremy Roman, Google https://www.google.com/, jbroman@chromium.org
7 | Abstract: A proposal to allow site owners to advise prefetch proxies and other agents to disallow traffic.
8 | Markup Shorthands: css no, markdown yes
9 | Assume Explicit For: yes
10 | Complain About: accidental-2119 yes, missing-example-ids yes
11 | Indent: 2
12 | Boilerplate: omit conformance
13 | Default Biblio Status: current
14 |
15 |
16 | {
17 | "HTTP-CACHING": {
18 | "authors": [
19 | "R. Fielding",
20 | "M. Nottingham",
21 | "J. Reschke"
22 | ],
23 | "href": "https://httpwg.org/http-core/draft-ietf-httpbis-cache-latest.html",
24 | "title": "HTTP Caching",
25 | "status": "Internet-Draft",
26 | "publisher": "IETF"
27 | },
28 | "HTTP-SEMANTICS": {
29 | "aliasOf": "RFC7231"
30 | },
31 | "WELL-KNOWN": {
32 | "aliasOf": "RFC8615"
33 | }
34 | }
35 |
36 |
37 |
38 | Introduction {#intro}
39 | =====================
40 |
41 | *This section is non-normative.*
42 |
43 | Publishers might wish not to accept traffic from private prefetch proxies and sources other than direct user traffic, for instance to reduce server load due to speculative prefetch activity.
44 |
45 | We propose a well-known "traffic advice" resource, analogous to `robots.txt` (for web crawlers), which allows an HTTP server to request that implementing agents stop sending traffic to it for some time.
46 |
47 |
48 | Implementations {#implementations}
49 | ==================================
50 |
51 | This specification may be implemented by traffic advice respecting agents, such as proxy servers or other applications which direct HTTP traffic on behalf of clients such as a web browser.
52 |
53 | While [[FETCH]] is used to describe the algorithm to request this resource, such agents might not implement [[HTML]].
54 |
55 | Definitions {#dfns}
56 | ===================
57 |
58 | A traffic advice entry is a [=struct=] with the following [=struct/items=]:
59 | * disallowed flag, a [=boolean=] which is initially false
60 | If the [=traffic advice entry/disallowed flag=] is true, the advice requests that traffic, including establishing connections and sending requests, be avoided.
61 | * fraction, a number which is at least 0 and at most 1, initially 1
62 | The advice requests that only that fraction of traffic be permitted. A server might use this to facilitate an incremental rollout, or to partially reduce server load during peak times.
63 |
64 | A traffic advice result is null, a [=traffic advice entry=], or `"unreachable"`.
65 |
66 | An agent identity is a [=list=] of [=strings=]. It must contain at least two elements, and the last must be `"*"`.
67 |
68 | Identity {#identity}
69 | ====================
70 |
71 | Each agent should have a brand name that specifically identifies it (such as `PollyPrefetchProxy`).
72 |
73 | Its [=agent identity=] is all of the following that apply, in order:
74 |
75 | 1. The brand name
76 | 1. `"prefetch-proxy"`, if the agent is a proxy server which exclusively serves prefetch traffic (for example, a [private prefetch proxy](https://github.com/buettner/private-prefetch-proxy))
77 | 1. `"*"`
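
As a non-normative illustration, the ordered list above could be assembled like this. The function name and its boolean parameter are hypothetical.

```python
def agent_identity(brand_name: str, is_prefetch_proxy: bool) -> list[str]:
    """Build an agent identity, most specific entry first."""
    identity = [brand_name]
    if is_prefetch_proxy:
        # Generic category for proxies that exclusively serve prefetch traffic.
        identity.append("prefetch-proxy")
    identity.append("*")  # The wildcard is always the last element.
    return identity
```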
78 |
79 | Fetching {#fetching}
80 | ====================
81 |
82 |
83 |
84 | To generate a traffic advice URL for [=origin=] |origin|, run the following steps:
85 |
86 | 1. If |origin| is not a [=tuple origin=], return failure.
87 |
88 | 1. If |origin|'s [=origin/scheme=] is not an [=HTTP(S) scheme=], return failure.
89 |
90 | 1. If |origin| is not a [=potentially trustworthy origin=], return failure.
91 |
92 | 1. Return a new [=URL=] as follows:
93 |
94 | : [=url/scheme=]
95 | :: |origin|'s [=origin/scheme=]
96 | : [=url/host=]
97 | :: |origin|'s [=origin/host=]
98 | : [=url/port=]
99 | :: |origin|'s [=origin/port=]
100 | : [=url/path=]
101 | :: « `".well-known"`, `"traffic-advice"` »
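
As a non-normative sketch, the steps above might look like the following in Python. The function names are hypothetical, the origin is passed as its tuple components (so opaque origins are out of scope), and the trustworthiness check is a deliberate approximation (HTTPS, or a loopback host) rather than the full definition of a potentially trustworthy origin.

```python
import ipaddress
from typing import Optional


def _is_potentially_trustworthy(scheme: str, host: str) -> bool:
    # Approximation (assumption): https, or a localhost/loopback host.
    if scheme == "https":
        return True
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        return ipaddress.ip_address(host.strip("[]")).is_loopback
    except ValueError:
        return False  # Not an IP literal, and not HTTPS.


def traffic_advice_url(scheme: str, host: str, port: Optional[int]) -> Optional[str]:
    """Return the traffic advice URL for an origin, or None on failure."""
    if scheme not in ("http", "https"):
        return None
    if not _is_potentially_trustworthy(scheme, host):
        return None
    authority = host if port is None else f"{host}:{port}"
    return f"{scheme}://{authority}/.well-known/traffic-advice"
```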
102 |
103 |
104 |
105 |
106 |
107 | To fetch traffic advice for [=origin=] |origin|, [=agent identity=] |identity| and algorithm |whenComplete| accepting a [=traffic advice result=]:
108 |
109 | 1. Let |url| be the result of [=generating a traffic advice URL=] for |origin|.
110 | If it results in failure, then return failure.
111 |
112 | 1. Let |request| be a [=request=] as follows:
113 |
114 | : [=request/method=]
115 | :: `` `GET` ``
116 | : [=request/URL=]
117 | :: |url|
118 | : [=request/client=]
119 | :: null
120 | : [=request/credentials mode=]
121 | :: `"omit"`
122 | : [=request/redirect mode=]
123 | :: `"manual"`
124 | This means that a [=redirect status=] will not lead to another origin being contacted.
125 |
126 | 1. Let |fetchController| be null.
127 |
128 | 1. Let |processResponse| be the following steps, given [=response=] |response|:
129 |
130 | 1. If |response|'s [=response/type=] is `"error"`, then [=fetch controller/terminate=] |fetchController|, run |whenComplete| with `"unreachable"`, and return.
131 |
132 | 1. If |response|'s [=response/type=] is `"opaqueredirect"`, then [=fetch controller/terminate=] |fetchController|, run |whenComplete| with null, and return.
133 |
134 | 1. [=Assert=]: |response|'s [=response/type=] is `"basic"`.
135 |
136 | 1. If |response|'s [=response/status=] is 429 (Too Many Requests; see [[RFC6585]]) or 503 (Service Unavailable; see [[HTTP-SEMANTICS]]), then [=fetch controller/terminate=] |fetchController|, run |whenComplete| with `"unreachable"`, and return.
137 | If present, the [[HTTP-SEMANTICS]] `Retry-After` response header could be used as a hint about when to next retry.
138 |
139 | 1. If |response|'s [=response/status=] is not an [=ok status=], then [=fetch controller/terminate=] |fetchController|, run |whenComplete| with null and return.
140 |
141 | 1. If |response|'s [=response/status=] is a [=null body status=], then [=fetch controller/terminate=] |fetchController|, run |whenComplete| with null and return.
142 |
143 | 1. Let |mimeType| be the result of [=header list/extracting a MIME type=] from |response|'s [=response/header list=].
144 |
145 | 1. If |mimeType| is failure or its [=MIME type/essence=] is not `"application/trafficadvice+json"`, then [=fetch controller/terminate=] |fetchController|, run |whenComplete| with null and return.
146 |
147 | 1. Let |processResponseEndOfBody| be the following steps, given [=response=] |response| and null, failure or [=byte sequence=] |body|:
148 |
149 | 1. If |body| is not a [=byte sequence=], then run |whenComplete| with null and return.
150 |
151 | 1. Let |string| be the result of [=UTF-8 decoding=] |body|.
152 |
153 | 1. Let |parseResult| be the result of [=parsing traffic advice=] from |string| given |identity|.
154 |
155 | 1. Run |whenComplete| with |parseResult|.
156 |
157 | 1. [=Fetch=] |request| with [=fetch/processResponse=] set to |processResponse| and [=fetch/processResponseEndOfBody=] set to |processResponseEndOfBody|, and set |fetchController| to the result.
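
The response handling above can be summarized, non-normatively, as a small classification function. This sketch assumes a hypothetical caller that already knows the response type, status, and `Content-Type` value; the MIME type handling is simplified to dropping parameters and lowercasing, which is not the full extraction algorithm.

```python
from typing import Optional

NULL_BODY_STATUSES = {101, 103, 204, 205, 304}
ADVICE_MIME_ESSENCE = "application/trafficadvice+json"


def classify_response(response_type: str, status: int,
                      content_type: Optional[str]) -> Optional[str]:
    """Return "unreachable", None (no advice), or "parse" (go on to parse the body)."""
    if response_type == "error":
        return "unreachable"
    if response_type == "opaqueredirect":
        return None  # Redirects are not followed.
    if status in (429, 503):
        return "unreachable"
    if not 200 <= status <= 299:
        return None  # Not an ok status, e.g. 404.
    if status in NULL_BODY_STATUSES:
        return None
    # Simplified essence extraction (assumption): strip parameters, lowercase.
    essence = (content_type or "").split(";", 1)[0].strip().lower()
    if essence != ADVICE_MIME_ESSENCE:
        return None
    return "parse"
```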
158 |
159 |
160 | Notwithstanding the usual behavior of [[HTTP-CACHING]], agents (especially ones shared amongst multiple users) should consider applying a minimum freshness lifetime (10 minutes is suggested) and maximum freshness lifetime (48 hours is suggested) in order to balance the [security considerations](#security) discussed below. If these suggested values are used, a default freshness lifetime (if none is specified) of 30 minutes may be appropriate.
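
For illustration only, the suggested bounds could be applied as below. The constant and function names are hypothetical, and the input is assumed to be the freshness lifetime (in seconds) that the agent's HTTP cache computed from the response, or `None` if the response specified none.

```python
from typing import Optional

MIN_FRESHNESS_S = 10 * 60          # 10 minutes
MAX_FRESHNESS_S = 48 * 60 * 60     # 48 hours
DEFAULT_FRESHNESS_S = 30 * 60      # 30 minutes, used when none is specified


def effective_freshness(server_lifetime_s: Optional[float]) -> float:
    """Clamp the server-provided freshness lifetime to the suggested range."""
    if server_lifetime_s is None:
        return DEFAULT_FRESHNESS_S
    return max(MIN_FRESHNESS_S, min(MAX_FRESHNESS_S, server_lifetime_s))
```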
161 |
162 |
163 |
164 |
165 | Parsing {#parsing}
166 | ==================
167 |
168 |
169 |
170 | To parse traffic advice from a [=string=] |string| given [=agent identity=] |identity|:
171 |
172 | 1. Let |parsed| be the result of [=parsing JSON into Infra values=] given |string|. If this throws an exception, then return null.
173 |
174 | 1. If |parsed| is not a [=list=], then return null.
175 |
176 | 1. Let |bestMatch| be null.
177 |
178 | 1. [=list/For each=] |entry| of |parsed|:
179 |
180 | 1. If |entry| is not a [=map=], then [=iteration/continue=].
181 |
182 | 1. If |entry|[`"user_agent"`] does not [=map/exist=] or is not a [=string=], then [=iteration/continue=].
183 |
184 | 1. Let |agentSelector| be |entry|[`"user_agent"`].
185 |
186 | 1. If |identity| does not contain |agentSelector|, then [=iteration/continue=].
187 |
188 | 1. If |bestMatch| is null or |agentSelector| appears at an earlier index in |identity| than |bestMatch|[`"user_agent"`] does, then set |bestMatch| to |entry|.
189 |
190 | 1. If |bestMatch| is null, then return null.
191 |
192 | 1. Let |entry| be a [=traffic advice entry=].
193 |
194 | 1. If |bestMatch|[`"disallow"`] [=map/exists=] and is true, then set |entry|'s [=traffic advice entry/disallowed flag=] to true.
195 |
196 | 1. If |bestMatch|[`"fraction"`] [=map/exists=] and is a number, then:
197 |
198 | 1. Let |fraction| be |bestMatch|[`"fraction"`].
199 |
200 | 1. If |fraction| is greater than or equal to 0 and less than or equal to 1, then set |entry|'s [=traffic advice entry/fraction=] to |fraction|.
201 |
202 | 1. Return |entry|.
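
A non-normative Python sketch of the parsing steps follows. It represents a traffic advice entry as a plain dictionary with hypothetical keys `disallowed` and `fraction`, and uses the standard `json` module in place of parsing JSON into Infra values.

```python
import json
from typing import Optional


def parse_traffic_advice(text: str, identity: list[str]) -> Optional[dict]:
    """Return the advice that best matches the identity, or None."""
    try:
        parsed = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError.
        return None
    if not isinstance(parsed, list):
        return None

    best_match = None
    for entry in parsed:
        if not isinstance(entry, dict):
            continue
        selector = entry.get("user_agent")
        if not isinstance(selector, str) or selector not in identity:
            continue
        # A lower index in the identity means a more specific selector.
        if (best_match is None
                or identity.index(selector) < identity.index(best_match["user_agent"])):
            best_match = entry
    if best_match is None:
        return None

    advice = {"disallowed": False, "fraction": 1}
    if best_match.get("disallow") is True:
        advice["disallowed"] = True
    fraction = best_match.get("fraction")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(fraction, (int, float)) and not isinstance(fraction, bool):
        if 0 <= fraction <= 1:
            advice["fraction"] = fraction
    return advice
```

Out-of-range or non-numeric `fraction` values are ignored rather than rejected, so the entry falls back to the initial value of 1.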
203 |
204 |
205 |
206 | Interpretation {#interpretation}
207 | ================================
208 |
209 | When they would be able to respect advice to disallow traffic to an origin (for example, when requested to proxy prefetch traffic to the origin), [=traffic advice respecting agents=] should [=fetch traffic advice=] (respecting [[HTTP-CACHING]] semantics).
210 |
211 | If the result is null, then no advice was received. Agents should adopt their default behavior.
212 |
213 | If the result is `"unreachable"`, then the HTTP server was not able to service the request for traffic advice. Since this could indicate that the server cannot accept additional requests at this time, agents may stop traffic to the server for some interval.
214 |
215 | If the result's [=traffic advice entry/disallowed flag=] is true, then the HTTP server advises that traffic is discouraged at this time. Agents should respect this by not establishing new connections or sending new requests.
216 |
217 | Otherwise, if the result's [=traffic advice entry/fraction=] is less than 1, then the HTTP server advises that it would like to receive only a fraction of the possible traffic. Agents may implement this as they see fit, but the following algorithm is suggested on establishment of an HTTP connection on behalf of a client.
218 |
219 | 1. Choose a uniform random number |r| between 0 and 1.
220 | 1. If |r| is less than or equal to the result's [=traffic advice entry/fraction=], then the traffic is permitted by the fraction.
221 | 1. Otherwise, a connection is not established.
222 |
223 | This process should not be repeated as part of automatic retry logic, since this would defeat the server's ability to shed load in this manner. Broadly, agents should aim for a fraction of 0.1 to result in approximately 10% of the traffic to the HTTP server.
224 |
225 | This approach allows servers to scale their traffic proportionally as part of an incremental rollout. Agents should avoid approaches which might bias the permitted connections or requests in ways that might make this scaling non-linear (e.g., by preferring certain kinds of connection or user).
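
The suggested fraction algorithm is small enough to sketch directly. This is a non-normative illustration with hypothetical names; the random source is injectable only so that the behavior can be tested deterministically.

```python
import random
from typing import Callable


def permitted_by_fraction(fraction: float,
                          rng: Callable[[], float] = random.random) -> bool:
    """Decide once, per connection attempt, whether the fraction permits traffic."""
    r = rng()  # random.random() is uniform over [0.0, 1.0).
    return r <= fraction
```

An agent would call this once when a client asks for a connection and would not call it again for automatic retries of the same attempt.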
226 |
227 | Security considerations {#security}
228 | ===================================
229 |
230 | Type confusion {#type-confusion}
231 | --------------------------------
232 |
233 | Like other resources, it is possible that the `/.well-known/traffic-advice` path could be used for a request with some other destination (e.g., as a script). If interpreted as JavaScript, the JSON data would either be syntactically invalid or an empty block. More generally, this specification requires the use of a MIME type that is not used for any other purpose, and standard countermeasures (e.g., `X-Content-Type-Options: nosniff`) can be used to prevent type confusion in some cases which are permissive of mismatched MIME types.
234 |
235 | Caching issues {#security-caching}
236 | ----------------------------------
237 |
238 | Because the traffic advice resource is expected to be cached by [=traffic advice respecting agents=] such as private prefetch proxies, it is possible that a temporary compromise of an origin server or its private key could be extended to a longer outage of some traffic due to an agent caching a policy that prevents or throttles traffic, leading to a denial of service for such traffic. This is similar to attacks against HTTP Public Key Pinning [[RFC7469]].
239 |
240 | This is less of an issue if the traffic is non-essential (e.g., prefetch) traffic.
241 |
242 | To mitigate this, well-behaved agents implement a maximum freshness lifetime when they [=fetch traffic advice=].
243 |
244 | Request amplification {#request-amplification}
245 | ----------------------------------------------
246 |
247 | Agents which are proxy services accessible to untrusted users (esp. the general public) may be susceptible to being used to amplify a denial of service attack conducted, for example, by a botnet. For example, if a small request from a client (e.g. `CONNECT target.example:443` with small headers) can cause a larger request (e.g., `GET /.well-known/traffic-advice` with large headers) to the origin server, this could be used to increase the effective bandwidth available to the distributed denial of service attack against an origin server.
248 |
249 | To mitigate this, well-behaved agents implement, in addition to other anti-abuse measures, a minimum freshness lifetime when they [=fetch traffic advice=].
250 |
251 | Privacy considerations {#privacy}
252 | =================================
253 |
254 | This specification provides general mechanisms for agents to limit the traffic they are sending. Most privacy considerations are expected to be particular to the agents in question (for example, proxies inspecting traffic they carry).
255 |
256 | If privacy considerations related to the traffic advice mechanism itself are identified, they should be added here.
257 |
258 | IANA considerations {#iana}
259 | ===========================
260 |
261 | Well-known `traffic-advice` URI {#iana-well-known}
262 | --------------------------------------------------
263 |
264 | This document defines well-known URI suffix `traffic-advice` as described by [[WELL-KNOWN]]. It should be submitted for registration as follows:
265 |
266 | : URI suffix
267 | :: traffic-advice
268 | : Change controller
269 | :: The editor(s) of this document, pending a standards venue
270 | : Specification(s)
271 | :: This document
272 | : Status
273 | :: provisional
274 | : Related information
275 | :: None
276 |
277 | The `application/trafficadvice+json` MIME type {#iana-mime-type}
278 | ----------------------------------------------------------------
279 |
280 | This document defines the [=MIME type=] `application/trafficadvice+json` as described by [[RFC6838]]. It should be submitted for registration as follows:
281 |
282 | : Type name
283 | :: `application`
284 | : Subtype name
285 | :: `trafficadvice+json`
286 | : Required parameters
287 | :: N/A
288 | : Optional parameters
289 | :: N/A
290 | : Encoding considerations
291 | :: Always UTF-8
292 | : Security considerations
293 | :: See [Security considerations](#security).
294 | : Interoperability considerations
295 | :: This MIME type is not known to be in previous use. Applications which can process `application/json` should be able to process all valid data with this MIME type.
296 | : Published specification
297 | :: This document
298 | : Applications that use this media type
299 | :: [=traffic advice respecting agents=]
300 | : Fragment identifier considerations
301 | :: N/A
302 | : Additional information
303 | ::
304 | : Deprecated alias names for this type
305 | :: N/A
306 | : Magic number(s)
307 | :: N/A
308 | : File extension(s)
309 | :: None. This resource will be named `traffic-advice` when fetched over HTTP.
310 | : Macintosh file type code
311 | :: Same as for `application/json` [[RFC8259]]
312 | : Person & email address to contact for further information
313 | :: The editor(s) of this document
314 | : Intended usage
315 | :: Common
316 | : Restrictions on usage
317 | :: N/A
318 | : Change controller
319 | :: The editor(s) of this document, pending a standards venue
320 |
--------------------------------------------------------------------------------
/traffic-advice.md:
--------------------------------------------------------------------------------
1 | # Traffic Advice
2 |
3 | Publishers may wish not to accept traffic from [private prefetch proxies](README.md) and sources other than direct user traffic, for instance to reduce server load due to speculative prefetch activity.
4 |
5 | We propose a well-known "traffic advice" resource, analogous to `/robots.txt` (for web crawlers), which allows an HTTP server to declare that implementing agents should stop sending traffic to it for some time. The formal traffic-advice specification can be found [here](https://buettner.github.io/private-prefetch-proxy/traffic-advice.html).
6 |
7 | ## Proposal
8 |
9 | HTTP request activity can broadly be divided into:
10 | * activity on behalf of a user interaction (e.g., a web browser fetching a web page requested by the user), or which for another reason cannot easily be discarded
11 | * activity for which there is an existing specialized mechanism for throttling traffic (e.g. web crawlers respecting `robots.txt`)
12 | * activity which can easily be discarded (e.g., because it corresponds to a prefetch which improves loading performance but not correctness) at the server's request (e.g., because it is under load or the operator otherwise does not wish to serve non-essential traffic)
13 |
14 | Applications in the third category should consider acting as *agents which respect traffic advice*, so as to respect the server operator's wishes with a minimum resource impact.
15 |
16 | Agents which respect traffic advice should fetch the well-known path `/.well-known/traffic-advice`. If it returns a response with an [ok status](https://fetch.spec.whatwg.org/#ok-status) and an `application/trafficadvice+json` MIME type, the response body should contain valid UTF-8 encoded JSON like the following:
17 |
18 | ```json
19 | [
20 | {"user_agent": "prefetch-proxy", "disallow": true}
21 | ]
22 | ```
23 |
24 | Each agent has a series of identifiers it recognizes, in order of specificity:
25 | * its own agent name (e.g. `"ExamplePrivatePrefetchProxy"`)
26 | * decreasingly specific generic categories that describe it, like `"prefetch-proxy"`
27 | * `"*"` (which applies to every implementing agent)
28 |
29 | The agent finds the most specific matching entry in the response and applies the corresponding advice (currently only a boolean which advises disallowing all traffic) to its behavior. The agent should respect the cache-related response headers to minimize the frequency of such requests and to revalidate the resource when it is stale.
30 |
31 |
32 | If the response has a `404 Not Found` status (or a similar status), on the other hand, the agent should apply its default behavior.
33 |
34 | ## Why not robots.txt?
35 |
36 | `robots.txt` is designed for crawlers, especially search engine crawlers, and so site owners have likely already established robots rules because they wish to limit traffic from crawlers -- even though they have no such concern about prefetch proxy traffic. The `robots.txt` format is also designed to limit traffic by path, which isn't appropriate for agents which do not know the path of the requests they are responsible for throttling (as with a CONNECT proxy carrying TLS traffic).
37 |
38 | A more similar textual format would be possible, but the format for parsing `robots.txt` is not consistently specified and implemented. By contrast, JSON implementations are available on a wide variety of platforms used by site owners and authors.
39 |
40 | ## Application to private prefetch proxies
41 |
42 | For example, suppose a private prefetch proxy, `ExamplePrivatePrefetchProxy`, would like to respect traffic advice in order to allow site owners to limit inbound traffic from the proxy.
43 |
44 | When a client of the proxy service (e.g., a web browser) requests a connection to `https://www.example.com`, the proxy server issues an HTTP request for `https://www.example.com/.well-known/traffic-advice`. It receives the sample response body from above. It recognizes `"prefetch-proxy"` as the most specific advice to apply to itself.
45 |
46 | It caches this result (traffic is presently disallowed) at the proxy server (or even across multiple proxy server instances run by the same operator), and refuses client connections to `https://www.example.com` until an updated `/.well-known/traffic-advice` resource no longer disallows traffic. Even if a large number of proxy clients request connections to `https://www.example.com`, the site operator and its CDN do not receive traffic from the proxy except for infrequent requests to revalidate the traffic advice (which may be, for example, once per hour).
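
The caching behavior in this example could be sketched as follows. This is an illustration under stated assumptions, not a description of any real proxy: `AdviceCache`, its `fetch_advice` callback (assumed to return a dictionary with a `disallowed` key, or `None` when there is no advice), and the fixed one-hour lifetime are all hypothetical.

```python
import time
from typing import Callable, Optional


class AdviceCache:
    """Caches per-origin advice so clients do not each trigger a fetch."""

    def __init__(self, fetch_advice: Callable[[str], Optional[dict]],
                 ttl_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_advice = fetch_advice
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}  # origin -> (expires_at, disallowed)

    def allows(self, origin: str) -> bool:
        """Return True if client connections to the origin may proceed."""
        now = self._clock()
        cached = self._entries.get(origin)
        if cached is None or cached[0] <= now:
            # Missing or stale: fetch the advice once and cache the result.
            advice = self._fetch_advice(origin)
            disallowed = bool(advice and advice.get("disallowed"))
            cached = (now + self._ttl_s, disallowed)
            self._entries[origin] = cached
        return not cached[1]
```

However many clients ask for `https://www.example.com`, the origin sees at most one advice request per lifetime from this cache.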
47 |
--------------------------------------------------------------------------------