├── .gitignore
├── README.md
├── old-versions
    ├── captcha-bypass-formal-spec.txt
    └── captcha-bypass-internet-draft.md
└── captcha-bypass-internet-draft.txt


/.gitignore:
--------------------------------------------------------------------------------
1 | .refcache/
2 | *.xml
3 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # challenge-bypass-specification
 2 | 
 3 | This repository contains the formal specification for a protocol that has been developed to allow bypassing challenge pages using signed tokens that guarantee anonymity to the user. The following files are of use:
 4 | 
 5 | - captcha-bypass-formal-spec.txt: Original specification of the protocol along with the algorithms that are required to instantiate it. The draft is heavily based on preceding work by George Tankersley and Filippo Valsorda, with alterations where the design has changed since its original conception.
 6 | - captcha-bypass-internet-draft.txt: The most current draft of a document intended to be submitted as an IETF internet draft proposal. Focuses on defining the different phases of the protocol and how extensions can be built in. All changes should be made to the markdown document captcha-bypass-internet-draft.md and then compiled using mmark and xml2rfc into the text document.
 7 | 
 8 | We encourage comments/feedback on the designs and the drafts themselves.
 9 | 
10 | 


--------------------------------------------------------------------------------
/old-versions/captcha-bypass-formal-spec.txt:
--------------------------------------------------------------------------------
  1 | Title: Cloudflare CAPTCHA bypass
  2 | Authors: George Tankersley, Filippo Valsorda, Alex Davidson, Nick Sullivan
  3 | Created: 19-Sep-2016 
  4 | 
  5 | Latest:
  6 |   - 2016-09-28
  7 |   - 2016-09-19
  8 | 
  9 | History: 
 10 |   (see repository commits)
 11 | 
 12 | Preface:
 13 |   Large parts of this specification are taken from an original draft written by 
 14 |   George Tankersley and Filippo Valsorda 
 15 |   (https://github.com/gtank/captcha-draft/blob/master/captcha-plugin-draft.txt)  
 16 | 
 17 | Overview:
 18 | 
 19 |   In many IP reputation systems, Tor exits quickly become associated with
 20 |   abuse.  Because of this, Tor Browser users quickly become familiar with
 21 |   various CAPTCHA challenges. Cloudflare uses an IP-reputation-based system to 
 22 |   display pages to users requiring the completion of a successful CAPTCHA before
 23 |   the requested page can be accessed. While CAPTCHAs in themselves are supposed 
 24 |   to be easily solvable for humans, Tor users are dealt a disproportionate 
 25 |   number of these challenges because Tor exit nodes are regularly assigned 
 26 |   poor IP reputations. This problem has been likened to an act of 
 27 |   censorship against Tor users as these users are the most targeted by this 
 28 |   protection mechanism. This problem also affects users of certain VPN providers
 29 |   and of I2P services.
 30 | 
 31 |   This document describes a solution to this problem. We propose a Tor Browser
 32 |   plugin that will store a finite set of unlinkable CAPTCHA bypass tokens and a
 33 |   server-side redemption system that will consume them, allowing users to
 34 |   forego solving a CAPTCHA so long as they supply a valid token.
 35 | 
 36 | Introduction:
 37 | 
 38 |   Cloudflare uses an IP reputation database to serve challenge pages to users 
 39 |   who are deemed to be malicious. Specifically in the case of Tor, each user is 
 40 |   given the IP address of the exit node that their connection traverses through. 
 41 |   Tor exit nodes are typically assigned poor IP reputation scores due to the 
 42 |   amount of abuse that is detected through these IPs. The effect of this is that 
 43 |   regular users are presented with a higher than typical number of CAPTCHA pages 
 44 |   when attempting to visit websites protected by Cloudflare (this also affects 
 45 |   users of I2P and VPN services). 
 46 | 
 47 |   While they are intended as a minor inconvenience, they are impenetrable for 
 48 |   users with high privacy or security requirements. Google's ReCAPTCHA service 
 49 |   de facto demands JavaScript execution - the challenges it produces without JS 
 50 |   are even more cumbersome and require extra participation on the user's part. 
 51 |   Worse, the challenge page sets a unique cookie to indicate that the user has 
 52 |   been verified. Since Cloudflare controls the domains for all of the protected 
 53 |   origins, it can potentially link CAPTCHA users across all >2 million 
 54 |   Cloudflare sites without violating same-origin policy.
 55 | 
 56 |   This protocol solves both of these problems. First, it moves JavaScript
 57 |   execution into a consistent browser plugin (for use in TBB etc.) that can be 
 58 |   more effectively audited than a piece of ephemerally injected JavaScript. We 
 59 |   still require an initial CAPTCHA solution but future solutions can be avoided
 60 |   and are controlled only by the plugin. Second, it separates CAPTCHA solving from 
 61 |   the request endpoint and eliminates linkability across domains with blind 
 62 |   signatures. Furthermore the solution intends to make the lives of such users 
 63 |   much easier by drastically lowering the number of CAPTCHAs that are seen by 
 64 |   honest users. Crucially this is achieved with a similar level of abuse 
 65 |   protection provided by Cloudflare to current customers of their service.
 66 | 
 67 |   In essence, the protocol allows a user to solve a single CAPTCHA and in return
 68 |   learn a specified number of tokens that are blindly signed that can be used 
 69 |   for redemption instead of witnessing CAPTCHA challenges in the future. For
 70 |   each request a client makes to a Cloudflare host that would otherwise demand
 71 |   a CAPTCHA solution, a browser plugin will automatically supply a bypass token. 
 72 |   By issuing a number of tokens per CAPTCHA solution that is suitable for 
 73 |   ordinary browsing but too low for attacks, we maintain similar protective 
 74 |   guarantees to those of Cloudflare's current system. We also leave the door 
 75 |   open to an elevated threat response that does not offer to accept bypass 
 76 |   tokens. 
 77 | 
 78 |   The solution we provide is general in the sense that it is agnostic of the 
 79 |   edge server (in this case Cloudflare) - the protocol may be of use to similar 
 80 |   edge providers who face similar issues. Furthermore while we explicitly talk 
 81 |   of CAPTCHAs as a challenge mechanism, our design will be modular enough to 
 82 |   incorporate the use of any particular challenge.
 83 | 
 84 | 1 Participants:
 85 |   
 86 |   The "edge" proxies connections for a protected website and presents a
 87 |   challenge page if the request is deemed malicious. If the challenge is 
 88 |   completed successfully then the edge issues tokens to the given user. The edge 
 89 |   also consumes these tokens in order to allow bypassing of the challenge page 
 90 |   and prevents double spending.
 91 | 
 92 |   The "plugin" runs in a user's browser and keeps track of their store of bypass 
 93 |   tokens. It detects challenge pages and presents bypass tokens when needed. It 
 94 |   also pins keys from challenge pages and validates <meta> tag signatures.
 95 | 
 96 |   We will also refer to the "user"/"client" who is responsible for making 
 97 |   connection requests to protected websites.
 98 | 
 99 |   The original draft states the independence of the "edge" and the "challenge" 
100 |   service that issues tokens upon completion of challenges. We will assume that 
101 |   both these roles are carried out by the "edge" alone.
102 | 
103 | 2 Cryptographic preliminaries:
104 | 
105 |   2.1 Schemes
106 | 
107 |     This specification uses the following cryptographic building blocks:
108 | 
109 |       * A public key encryption system PK_KEYGEN()->seckey, pubkey;
110 |         PK_ENCRYPT(pubkey, msg)->ciphertext; and PK_DECRYPT(seckey,
111 |         ciphertext)->msg; where secret keys are of length PK_SECKEY_LEN bytes,
112 |         and public keys are of length PK_PUBKEY_LEN bytes.
113 | 
114 |       * A public key signature system SIGN_KEYGEN()->seckey, pubkey;
115 |         SIGN_SIGN(seckey,msg)->sig; and SIGN_VERIFY(pubkey, sig, msg) -> { "OK",
116 |         "BAD" }; where secret keys are of length SIGN_SECKEY_LEN bytes, public
117 |         keys are of length SIGN_PUBKEY_LEN bytes, and signatures are of length
118 |         SIGN_SIG_LEN bytes.
119 | 
120 |         This signature system must also support blind signing (see below).
121 | 
122 |       * A cryptographic hash function H(d), which should be pre-image and
123 |         collision resistant. It produces hashes of length HASH_LEN bytes.
124 | 
125 |       * A cryptographic message authentication code MAC(key,msg) that produces
126 |         outputs of length MAC_LEN bytes.
127 | 
128 |     We provisionally instantiate these with
129 | 
130 |       For both PK and SIGN: 2048-bit RSA using OAEP and PSS (respectively) for
131 |       normal operations.
132 | 
133 |       For H: SHA256
134 | 
135 |       For MAC: HMAC-SHA256
136 | 
137 |   2.2 Blind signing
138 | 
139 |     This specification uses the following cryptographic tools for blinding and 
140 |     unblinding tokens and corresponding signatures:
141 | 
142 |       * An algorithm BLIND_GEN(i) -> b_i where b_i is a secret blinding factor 
143 |       for blinding a single token t_i
144 | 
145 |       * An algorithm BLIND(t_i, b_i) -> t'_i that takes the original token t_i 
146 |       and a blinding factor b_i and outputs a blinded token t'_i
147 | 
148 |       * (Signing is carried out as above using the standard signing algorithm)
149 | 
150 |       * An algorithm UNBLIND(t'_i, b_i) -> t_i that takes the blinded token 
151 |       and the blinding factor used and outputs the original unblinded token t_i
152 | 
153 |       * An algorithm UNBLIND_SIG(s'_i, b_i) -> s_i where s'_i is a signature on 
154 |       the blinded token t'_i, b_i is the blinding factor used to derive t'_i 
155 |       from t_i and s_i is a valid signature on t_i.
156 | 
157 |     We use the RSA modification that is described at 
158 |     https://en.wikipedia.org/wiki/Blind_signature#Blind_signature_schemes to 
159 |     instantiate the algorithms that we define above. 
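
The blinding algorithms above can be sketched with textbook RSA as follows. This
is a minimal illustration only: the toy key (N, E, D), the function names and
the representation of a token as an integer smaller than N are all assumptions
made for this example. A real deployment would use 2048-bit keys and a proper
encoding of the token before blinding.

```python
import math
import secrets

# Toy RSA key built from p = 61, q = 53 (hypothetical values, far too small
# for real use): N = p * q, and E * D = 1 mod (p - 1)(q - 1).
N, E, D = 3233, 17, 2753


def blind_gen(n=N):
    """BLIND_GEN: sample a secret blinding factor b that is invertible mod n."""
    while True:
        b = secrets.randbelow(n - 2) + 2
        if math.gcd(b, n) == 1:
            return b


def blind(t, b, e=E, n=N):
    """BLIND: t' = t * b^e mod n."""
    return (t * pow(b, e, n)) % n


def unblind(t_blinded, b, e=E, n=N):
    """UNBLIND: recover t from t' by dividing out b^e mod n."""
    return (t_blinded * pow(pow(b, e, n), -1, n)) % n


def sign(m, d=D, n=N):
    """Standard (textbook) RSA signing, run by the edge: s = m^d mod n."""
    return pow(m, d, n)


def unblind_sig(s_blinded, b, n=N):
    """UNBLIND_SIG: s = s' * b^-1 mod n, a valid signature on t."""
    return (s_blinded * pow(b, -1, n)) % n


def verify(s, t, e=E, n=N):
    """Check that s is a textbook RSA signature on t."""
    return pow(s, e, n) == t % n
```

The edge only ever sees blind(t, b), yet the unblinded result equals the
signature it would have produced on t directly. The modular inverse via
pow(x, -1, n) requires Python 3.8 or later.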
160 | 
161 |     NOTE: In the future we believe it may be preferable to switch to a 
162 |     pairing-based elliptic curve variant that allows for blind signing. 
163 |     For example we may be able to use the variant of BLS signatures described 
164 |     here: 
165 |     http://crypto.stackexchange.com/questions/2424/which-blind-signature-schemes-exist-and-how-do-they-compare
166 |     this would allow for much shorter signatures on tokens. 
167 |     Our construction maintains generality anyway and so switching the specific 
168 |     scheme should not affect the operation of the protocol.
169 |     
170 |   2.3 Keys
171 | 
172 |     This protocol involves several keys. For clarity, they are:
173 | 
174 |       * One secret/public keypair for ENCRYPT, used by the plugin to encrypt the
175 |         token contents such that they can only be read by the edge.
176 | 
177 |       * One secret/public keypair for SIGN, used by the challenge service to sign
178 |         bypass tokens. This key pair *must* be different to the key pair for 
179 |         ENCRYPT to avoid attacks 
180 |         (https://en.wikipedia.org/wiki/Blind_signature#Dangers_of_blind_signing).
181 | 
182 |       * One secret/public keypair for SIGN, used by the edge to sign <meta> tags
183 |         on challenge pages. If the same operator controls both the edge and the
184 |         challenge service, this MUST be different from the keypair used to sign
185 |         bypass tokens because the client has access to a signing oracle under 
186 |         the token key (see "Obtaining signatures"). The plugin should be 
187 |         distributed with a pinned copy of this public key for each supported 
188 |         edge.
189 | 
190 |       * A symmetric MAC key derived from each nonce, used by the plugin to bind 
191 |         a redeemed nonce to its particular request.
192 | 
193 | 3 Notation 
194 | 
195 |   - We use the symbol || to denote concatenation
196 |   - All tokens are assumed to be communicated in the hex representation of their 
197 |   byte arrays.
198 | 
199 | 4 Token representation
200 | 
201 |   The edge issues blindly signed tokens to the user who completes a given 
202 |   challenge that was specified by the edge. The tokens are stored and controlled 
203 |   by a plugin that is present in the client's browser.
204 | 
205 |   For our purposes tokens will be a JSON object comprising a single "nonce" 
206 |   field. The "nonce" field will be made up of 30 cryptographically random bytes.
207 | 
208 |   Token JSON:
209 | 
210 |   {
211 |       nonce: <random 30 bytes>
212 |   }
213 | 
214 | 5 Client-Edge messages 
215 | 
216 |   The client communicates with the edge using "JSON requests" that can currently
217 |   be used for designating tokens for signing or a single token for redemption 
218 |   and bypassing challenge pages. 
219 | 
220 |   A JSON signing request takes the form:
221 | 
222 |   {
223 |       type: "Signing",
224 |       contents: [t'_1, t'_2, ... t'_M]
225 |   }
226 | 
227 |   where each t'_i is a blinded version of a token taking the form above. 
228 |   A JSON redeeming request takes the form: 
229 | 
230 |   {
231 |       type: "Redeem",
232 |       contents: [<encrypted token>, <signature>, <HMAC>]
233 |   }
234 | 
235 |   where the encrypted token is an unblinded token as above, the signature is 
236 |   unblinded and valid over this token and the HMAC is computed over the Host 
237 |   header and HTTP path of the request containing the token. The HMAC is also 
238 |   keyed with the nonce from the "nonce" field in the token JSON.
239 | 
240 |   The contents field on both of these requests is base-64 encoded before it is 
241 |   sent.
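
As a rough sketch of how a plugin might assemble a redeeming request: the
helper names below are hypothetical, and the exact serialisation (hex strings
inside a JSON array that is then base-64 encoded) is an assumption, since the
text above does not fix one. Encryption and signing are left out and
represented by opaque hex strings.

```python
import base64
import hashlib
import hmac
import json
import secrets


def new_token():
    """Token JSON with a 30-byte random nonce, hex encoded (see section 3)."""
    return {"nonce": secrets.token_bytes(30).hex()}


def request_mac(nonce_hex, host, path):
    """HMAC-SHA256 over Host || path, keyed with the raw nonce bytes."""
    key = bytes.fromhex(nonce_hex)
    return hmac.new(key, (host + path).encode(), hashlib.sha256).hexdigest()


def redeem_request(encrypted_token_hex, signature_hex, nonce_hex, host, path):
    """Build a JSON redeeming request with base-64 encoded contents."""
    contents = [
        encrypted_token_hex,
        signature_hex,
        request_mac(nonce_hex, host, path),
    ]
    encoded = base64.b64encode(json.dumps(contents).encode()).decode()
    return json.dumps({"type": "Redeem", "contents": encoded})
```

Because the MAC covers the host and path, a request captured for one URL does
not carry a valid MAC for another.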
242 | 
243 | 6 Plugin operation
244 | 
245 |   The browser that is operated by the user will contain an extension known as 
246 |   the "plugin". The plugin will contain a CA cert that it can use to validate 
247 |   the key pairs used by the edge for signing/encryption. The plugin will also 
248 |   have access to a certificate transparency (CT) log to check that the certs it 
249 |   has been issued are not unique to this user (see Section 8.1). These keys and 
250 |   certificates will be updated periodically by the edge depending on internal 
251 |   security considerations (default: 6 months).
252 | 
253 |   When the edge issues signed tokens to the user, these will be stored against 
254 |   the public key that correctly validates the signatures on the tokens. When 
255 |   posed with a token redemption request the plugin will choose a token that has 
256 |   not been used yet and send this to the edge along with the paired signature. 
257 | 
258 | 7 Design
259 | 
260 |   7.1 Generation/signing of tokens
261 | 
262 |     When a user issues an HTTP request for a website that is protected the edge 
263 |     will respond with a challenge page, for instance containing a CAPTCHA. 
264 |     CAPTCHA responses are typically submitted to a <form> tag on this page. 
265 | 
266 |     If the edge accepts tokens for bypassing CAPTCHAs then it will add an extra 
267 |     <meta> tag to the <head> of the HTML page with id="captcha-bypass". It will
268 |     also add certificates for the public keys that are currently in use for 
269 |     signing and encryption.
270 |     A compatible plugin (see 6) should then follow these steps:
271 | 
272 |       * check the validity of the certificate;
273 |       * verify the certs in the meta tags;
274 |       * check the CT log to make sure these certs are publicly available;
275 |       * generate M tokens and save them (e.g. 50 < M < 500?);
276 |       * blind them as appropriate;
277 |       * when the challenge solution is sent to the edge, the blinded tokens are 
278 |       sent as part of the body of the HTTP request in a JSON signing request.
279 | 
280 |     The edge, after performing validation on the rest of the request (i.e.
281 |     checking the CAPTCHA solution), should sign the blinded values and send the 
282 |     signatures comma-separated in the response body as part of a concatenated 
283 |     string.
284 | 
285 |     The client then:
286 | 
287 |       * parses each of the signatures and pairs them with the blinded tokens;
288 |       * verifies the signatures;
289 |       * unblinds the signature/token pairings;
290 |       * verifies the unblinded pairs;
291 |       * stores the signed tokens against the edge public verification key.
292 | 
293 |     The client validates the signatures with the pinned key to prevent
294 |     tracking via per-user signing keys.
295 | 
296 |   7.2 Token redemption
297 |   
298 |     An edge server communicates to a client that it will accept tokens signed by
299 |     a certain challenge service key using the same <meta> tags described above.
300 |     When encountering a page displaying these tags a client does the following:
301 | 
302 |       * check the validity of the certificate;
303 |       * verify the certs in the meta tags;
304 |       * check the CT log to make sure these certs are publicly available;
305 |       * check if it has available tokens signed by the key on the cert;
306 |       * mark the current origin as accepting tokens signed with the given server 
307 |       key;
308 |       * reload the page.
309 | 
310 |     A client should not reload the page if it tried to redeem a token while
311 |     loading it the first time, to avoid infinite loops.  
312 | 
313 |   7.3 Edge validation process
314 | 
315 |     Every time a client sends an HTTP request to an origin marked as accepting 
316 |     tokens for a certain server key, it can send a  "challenge-bypass-token" 
317 |     HTTP header, composed of a JSON redeeming request containing a base-64 
318 |     encoded contents field with the following properties:
319 | 
320 |       * a yet unused token, encrypted with the corresponding edge identity key;
321 |       * the unblinded signature for that token;
322 |       * a HMAC - keyed with the nonce from the "nonce" field of the token - of 
323 |       the concatenation of the Host header and the HTTP path of the request.
324 | 
325 |     Used nonces must be deleted immediately by the plugin/client. In practice, 
326 |     the signatures that we use are JWS objects and we store the encrypted token
327 |     directly on the JWS object rather than separately.
328 | 
329 |     The edge, when receiving such a header performs the following validation on 
330 |     the request:
331 | 
332 |       * set a "success" bool to true;
333 |       * decrypt the token, if the token is malformed set "success" to false;
334 |       * check the signature on the token, if invalid set "success" to false;
335 |       * check if the token was already used, if so set "success" to false;
336 |       * validate the HMAC (keyed with the "nonce" field of the token) using the 
337 |       "host" and "http" information from the verification request, if invalid 
338 |       set "success" to false.
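
The validation steps above could be sketched as below. This is an illustration
under stated assumptions, not the edge's actual code: decrypt and verify_sig
are hypothetical callables standing in for PK_DECRYPT and SIGN_VERIFY, spent is
any set-like store of used nonces, and the contents are assumed to be already
base-64 decoded into three fields.

```python
import hashlib
import hmac
import json


def validate_redemption(contents, host, path, decrypt, verify_sig, spent):
    """Return True only if every check in the list above passes."""
    enc_token, signature, mac_hex = contents
    success = True
    plaintext, nonce = b"", b""

    # Decrypt the token and reject anything that is not exactly
    # {"nonce": <30 bytes in hex>}.
    try:
        plaintext = decrypt(enc_token)
        token = json.loads(plaintext)
        nonce = bytes.fromhex(token["nonce"])
        if set(token) != {"nonce"} or len(nonce) != 30:
            success = False
    except (ValueError, KeyError, TypeError):
        success = False

    # Check the unblinded signature on the token.
    if not verify_sig(signature, plaintext):
        success = False

    # Check whether the nonce was already spent.
    if nonce in spent:
        success = False

    # Validate the HMAC over Host || path, keyed with the nonce.
    expected = hmac.new(nonce, (host + path).encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, mac_hex):
        success = False

    # Persist the nonce only when the redemption is accepted.
    if success:
        spent.add(nonce)
    return success
```

All checks run regardless of earlier failures, mirroring the single "success"
flag in the list, so the outcome does not depend on which check failed first.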
339 | 
340 |     A server must reject malformed nonces if a client tries to redeem them, even 
341 |     if they come with a valid signature. This is to prevent attacks that 
342 |     exploit the homomorphic property of RSA signatures. 
343 | 
344 |     If at the end of this process "success" is true, the edge allows the request 
345 |     through to the requested origin. The nonce that was used is persisted to 
346 |     enable double-spend detection in the future but all other information
347 |     is forgotten.
348 | 
349 |     Finally, the edge also gives the client a single-domain clearance 
350 |     cookie. This allows the client to make future visits to the domain in 
351 |     question without having to spend more tokens. This means that we may be able 
352 |     to make considerable reductions in the number of tokens we hand out which 
353 |     may lead to longer key rotation periods and more efficient run-times for 
354 |     generation and verification of tokens in both the edge and browsers.
355 | 
356 |   7.4 Double-spend detection
357 | 
358 |     The scheme requires the server to detect nonce reuse with reasonable 
359 |     reliability. However, there might be no need for a zero false positive rate, 
360 |     because if an attacker needs to make 10,000 requests to have one succeed, 
361 |     that's possibly an acceptable trade-off.
362 | 
363 |     Therefore, the server could use data structures such as Bloom filters or 
364 |     cuckoo filters to store tokens that it has witnessed. The parameters of 
365 |     these structures can be chosen to ensure a false-positive probability of any 
366 |     given amount. Cuckoo filters may be more efficient but Bloom filters may be 
367 |     easier to construct.
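
A minimal Bloom filter for recording spent nonces might look like the sketch
below. The class name, the default sizes and the use of SHA-256 with a counter
prefix to derive bit positions are illustrative choices, not part of this
specification. Note that a Bloom filter never forgets an item it has stored;
its errors are fresh nonces that are occasionally reported as already seen.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over byte strings (illustrative sketch)."""

    def __init__(self, num_bits=1 << 20, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing a counter with the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        """Record a spent nonce."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        """True if the nonce was recorded, or on a rare false positive."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

Larger num_bits lowers the error rate at the cost of memory; the filter would
typically be reset when the signing key is rotated, since tokens signed under
an old key are no longer accepted.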
368 | 
369 | 8 Security considerations
370 | 
371 |   (Largely taken from previous draft)
372 |   
373 |   8.1 Deanonymization potential
374 | 
375 |     The current Cloudflare CAPTCHA simply places a cookie allowing you to
376 |     access the website. Since Cloudflare controls the origins, it could
377 |     currently correlate user sessions across multiple circuits using these
378 |     cookies. This is a gap in the Tor Browser threat model - the design
379 |     explicitly ignores linking within a session by malicious first parties, but
380 |     Cloudflare has effectively first-party control over a large proportion of 
381 |     the web.
382 | 
383 |     Our design is an improvement over this state of affairs. Since the CAPTCHA
384 |     service only sees blinded nonces, Cloudflare cannot link a CAPTCHA solution
385 |     session to a given redemption request. Since each token is used only once,
386 |     in contrast to a cookie, the tokens themselves cannot be used to link
387 |     requests.
388 | 
389 |     The largest vector for deanonymization is a "key tagging attack" whereby 
390 |     Cloudflare could advertise unique sub-certificates for each connection 
391 |     looking to have tokens signed. The edge could then link redemption requests
392 |     with the original signed tokens and thus compromise the anonymity model we 
393 |     are looking to preserve. Our design addresses this by pinning a Cloudflare 
394 |     CA cert in the Browser plugin itself. The plugin then checks that all certs 
395 |     that are provided can be found in public CT logs to prevent usage of these 
396 |     certificates in a unique way for each user. At the very least this prevents
397 |     the edge from being able to link requests in the method mentioned above.
398 | 
399 |   8.2 Interception of redemption tokens by malicious exits
400 | 
401 |     If the target site is accessed over HTTP, there is an opportunity for a
402 |     malicious exit to extract the tokens from the user's requests and replay
403 |     them for its own use. To alleviate this, we include a MAC over the HOST and
404 |     PATH headers for a particular request using the redeemed nonce as a key.
405 |     Since the unblinded nonce is known only to the client and the CAPTCHA
406 |     service endpoint (due to the copy encrypted in the redemption token) it
407 |     constitutes a shared key that allows the service to verify this binding and
408 |     disallow out-of-scope replays. This provides very similar behavior to
409 |     third-party caveats in Google's macaroon design.
410 | 
411 |   8.3 Token stockpiling
412 | 
413 |     An attacker who wishes to bypass many CAPTCHAs in the future could
414 |     intentionally trigger CAPTCHAs (e.g. by first running attacks through a
415 |     particular IP) and save the resulting tokens for later.
416 | 
417 |   8.4 Token exhaustion attacks by malicious exits or sites
418 | 
419 |     An entity with the ability to inject content (such as a malicious exit or
420 |     website) can spoof requests for bypass tokens. In the absence of logic to
421 |     prevent this, an attacker can induce the TBB plugin to spend all of its
422 |     bypass tokens.
423 | 
424 | 
425 | 


--------------------------------------------------------------------------------
/old-versions/captcha-bypass-internet-draft.md:
--------------------------------------------------------------------------------
  1 | % Title = "Protocol for bypassing challenge pages using RSA blind signed tokens"
  2 | % abbrev = "Protocol for bypassing challenge pages"
  3 | % category = "info"
  4 | % docName = "draft-protocol-challenge-bypass-00"
  5 | % ipr= "trust200902"
  6 | % area = "Internet"
  7 | % workgroup = "Network Working Group"
  8 | % keyword = [""]
  9 | %
 10 | % date = 2016-09-13
 11 | %
 12 | % [pi]
 13 | % toc = "yes"
 14 | %
 15 | % #Independent Submission
 16 | % [[author]]
 17 | % initials="A."
 18 | % surname="Davidson"
 19 | % fullname="Alex Davidson"
 20 | % organization = "Royal Holloway, University of London"
 21 | %   [author.address]
 22 | %   email = "alex.davidson.2014@live.rhul.ac.uk"
 23 | %   [author.address.postal]
 24 | %   street = "Egham Hill"
 25 | %   city = "Egham"
 26 | %   code = "TW20 0EX"
 27 | %
 28 | % [[author]]
 29 | % initials="N."
 30 | % surname="Sullivan"
 31 | % fullname="Nick Sullivan"
 32 | % organization = "CloudFlare"
 33 | %   [author.address]
 34 | %   email = "nick@cloudflare.com"
 35 | %   [author.address.postal]
 36 | %   street = "101 Townsend St"
 37 | %   city = "San Francisco"
 38 | %   code = "CA 94107"
 39 | %
 40 | % [[author]]
 41 | % initials="G."
 42 | % surname="Tankersley"
 43 | % fullname="George Tankersley"
 44 | % organization = "coreOS"
 45 | %   [author.address]
 46 | %   email = "george.tankersley@gmail.com"
 47 | %
 48 | % [[author]]
 49 | % initials="F."
 50 | % surname="Valsorda"
 51 | % fullname="Filippo Valsorda"
 52 | % organization = "CloudFlare"
 53 | %   [author.address]
 54 | %   email = "filippo@cloudflare.com"
 55 | %   [author.address.postal]
 56 | %   street = "25 Lavington Street"
 57 | %   city = "London"
 58 | %   code = "SE1 0NZ"
 59 | 
 60 | 
 61 | .# Abstract
 62 | 
 63 | This document proposes a protocol for bypassing challenge pages (such 
 64 | as forms requiring CAPTCHA submissions) that are served by edge 
 65 | services in order to protect origin websites. A client is required to 
 66 | complete an initial challenge and is then granted signed tokens which
 67 | can be redeemed in the future to bypass challenges, meaning that 
 68 | honest users undergo less manual computation. The signed tokens 
 69 | are cryptographically unlinkable to prevent future requests being 
 70 | linked to the original signed set of tokens.
 71 | 
 72 | {mainmatter}
 73 | 
 74 | # Introduction
 75 | 
 76 | Various challenge pages are used to distinguish human access to a 
 77 | website from automated access, with the intention of preventing 
 78 | malicious behaviour that could compromise the website that is being 
 79 | hosted. CAPTCHAs ("Completely Automated Public Turing test to tell 
 80 | Computers and Humans Apart") [@!ABHL03] are one of the most widely 
 81 | used methods for distinguishing human access to a resource from 
 82 | automated access. CAPTCHAs are regularly deployed as "interstitial"
 83 | pages forcing a user to answer the CAPTCHA before access is given to 
 84 | a website that was requested by the user. This is used to prevent 
 85 | malicious access by automated processes that can adversely affect the 
 86 | performance of the website itself. While these 'challenges' succeed in 
 87 | their mission, they add a noticeable amount of extra work for the
 88 | honest users who have to complete them.
 89 | 
 90 | These challenge pages are commonly enforced by CDNs who offer security
 91 | services to customers. Companies like CloudFlare offer the ability for 
 92 | customers to serve CAPTCHA pages (via Google's ReCAPTCHA service) to 
 93 | any IP addresses requesting a protected resource where the IP is 
 94 | deemed to have a "bad reputation". IP reputation scoring is done 
 95 | externally and is based on whether any malign activity (such as 
 96 | spamming and/or abuse) is detected as originating from this address. 
 97 | 
 98 | Services such as Tor suffer dramatically under such reputation-based 
 99 | systems. Users are assigned to one of a small number of exit nodes when 
100 | accessing webpages through Tor and thus they are assigned the IP of 
101 | the node itself. The IP addresses of these nodes are frequently 
102 | associated with malicious and abusive behaviour and are thus assigned
103 | poor reputation scores. This can also affect other services such as 
104 | VPN providers and I2P traffic.
105 | 
106 | The end result is that honest users of services like Tor are forced to 
107 | complete many challenge pages in order to access content provided by 
108 | edge service providers such as CloudFlare. This problem is exacerbated
109 | by the fact that these companies typically offer services to a 
110 | wide range of websites including many of the most visited web pages on 
111 | the internet today. This results in a huge increase in workload for 
112 | these users for a very common and unobtrusive task. 
113 | 
114 | Further problems arise for users who choose not to enable JavaScript 
115 | in their browsers since they are served with challenges that are 
116 | rapidly deteriorating to the point where a large proportion of 
117 | challenges are too hard to be solved.
118 | 
119 | Currently some edge providers (e.g. CloudFlare) attempt to solve this 
120 | problem by providing cookies that enable access to protected resources 
121 | once a CAPTCHA has been solved. There are two problems with this 
122 | method:
123 | - when a new Tor circuit is constructed the cookie is rendered 
124 |   useless;
125 | - due to a CDN's sprawling presence, giving users cookies can lead to 
126 |   future deanonymisation attacks that bypass the current Tor threat 
127 |   model.
128 | As such a new solution to this problem is needed.
129 | 
130 | In this document we detail a protocol that enables a user to complete
131 | a single edge-served challenge page in return for a finite number of 
132 | signed tokens. These tokens can then be used to bypass future 
133 | challenge pages that are served by participating edge-providers. The 
134 | tokens are generated in such a way that signed tokens cannot be linked
135 | to future redeemed tokens for bypassing. We achieve this using the RSA
136 | blind signature scheme first presented by David Chaum [@?Cha83]. 
137 | 
138 | ## Terminology
139 | 
140 | The keywords **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**,
141 | **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL**, when they appear in this
142 | document, are to be interpreted as described in [@?RFC2119].
143 | 
144 | The following terms are used:
145 | 
146 | edge: A serving endpoint that provides access to a protected origin.
147 | 
148 | client: The endpoint attempting to access an edge-protected service.
149 | 
150 | origin: The endpoint where web content is stored.
151 | 
152 | edge-protected: Term for origins that pay the edge to provide 
153 | protection services for their domain.
154 | 
155 | endpoint: Points where requests and responses are dealt with.
156 | 
157 | browser: A program run by the client that provides access to webpages.
158 | 
159 | plugin: An installed service that runs in the client's browser.
160 | 
161 | tokens: JSON structures that are generated by the plugin in the 
162 | client's browser for future redemption.
163 | 
164 | blinding: An operation that "hides" the contents of the token while 
165 | still allowing the underlying token to be cryptographically signed.
166 | 
167 | unblinding: The reverse procedure of blinding. Recovers a token and
168 | (if signed) a valid signature on the token.
169 | 
170 | challenge answer: Generated when submitting a response to a given 
171 | challenge.
172 | 
173 | challenge page: A page generated by the edge for the client. The 
174 | client must answer a challenge on the page correctly and return it to
175 | gain access to a particular resource.
176 | 
177 | nonce: A randomly sampled value that is used for generating unique
178 | tokens.
179 | 
180 | 
181 | # Protocol Overview
182 | 
183 | Our protocols are initiated when a client is presented with a 
184 | challenge page that contains additional information indicating that 
185 | the edge service accepts tokens for bypassing the challenge. This can 
186 | be indicated in the HTML of the page as a meta tag along with a 
187 | certificate advising which public key the edge is currently using.
188 | Two separate protocols exist for when the client has no signed tokens
189 | available to them and secondly when the client already has tokens.
190 | 
191 | Both protocols require essentially four rounds of communication. We 
192 | take into account the initial request and response when the client 
193 | attempts to visit an edge-protected origin and is served a challenge 
194 | page instead.
195 | 
196 | ## Acquiring Signed Tokens
197 | 
198 | {#fig-sign-toks-full}
199 | ~~~
200 | 
201 |        Client                                            Edge
202 |     
203 |        [OriginRequest]         ------->                      
204 |                                                 ChallengePage  ^                                                                      
205 |                                                  + bypass_tag  |
206 |                                                + sig_key_cert  |
207 |                                <-------            [Response]  v
208 |     ^  VerifyCertificate
209 |     |  GenerateTokens
210 |     v  BlindTokens
211 |     ^  SolveChallenge
212 |     v  [SignRequest]           ------->        
213 |                                               VerifyChallenge  ^
214 |                                                    SignTokens  |
215 |                                <-------            [Response]  v
216 |     ^  VerifySigs
217 |     |  UnblindTokens
218 |     |  StoreTokens
219 |     v  [Finished]
220 | 
221 | ~~~
222 | Figure: Full message flow for acquiring signed tokens
223 | 
224 | 
225 | When a client attempts to visit an edge-protected origin the edge can 
226 | indicate that it accepts tokens for bypassing a challenge page in 
227 | exchange as well as presenting a certificate corresponding to their 
228 | current signing key. In this event the client does the following:
229 | 
230 | - checks that the certificate is valid and that the signature 
231 |   verifies correctly;
232 | - checks that it is aware of the public key provided (e.g. that the 
233 |   key is pinned in the plugin);
234 | - generates N tokens and blinds them;
235 | - sends the tokens to the edge along with an answer to the 
236 |   challenge.
237 | 
238 | In practice N <= 100 so as not to put too much work on the browser; 
239 | limiting N to this number also mitigates DDoS potential. After receiving 
240 | the tokens and the answer to the challenge the edge does the 
241 | following:
242 | 
243 | - checks that the answer is correct;
244 | - if this is the case then it signs the tokens and returns the 
245 |   signatures to the client.
246 | 
247 | The client does not immediately get access to the origin, though this
248 | can be achieved if the client immediately reloads the page and redeems
249 | a token using the process below. The client participates in some final
250 | post-processing:
251 | 
252 | - they check that the signatures verify correctly with respect to the 
253 | pinned public key and the blinded token;
254 | - they unblind the token and signature pair to get a new pair of the 
255 | original token and a valid signature;
256 | - they finally store the pair in their browser plugin for future use.
257 | 
258 | 
259 | ## Redeeming Tokens
260 | 
261 | {#fig-redeem-toks-full}
262 | ~~~
263 | 
264 |        Client                                            Edge
265 |     
266 |        [OriginRequest]          ------>                      
267 |                                                 ChallengePage  ^                                                                      
268 |                                                  + bypass_tag  |                                                 
269 |                                                + sig_key_cert  |
270 |                                 <------            [Response]  v
271 |     ^  VerifyCertificate
272 |     |  ConstructTokenMessage
273 |     |  ProofOfWork*
274 |     v  [SendToken]              ------>   
275 |                                                    VerifyPoW*  ^   
276 |                                            VerifyTokenMessage  |
277 |                                                     GetOrigin  |
278 |                                 <------            [Response]  v
279 |        [Finished]
280 | 
281 | ~~~
282 | Figure: Full message flow for redeeming tokens
283 | 
284 | * - optional extensions to the protocol
285 | 
286 | As before the client attempts to visit an edge-protected website and 
287 | is faced with a challenge page. If the edge accepts tokens and 
288 | provides a certificate that corresponds to a key that the client has 
289 | pinned in their plugin and they have tokens signed by the counterpart 
290 | private key then the client can attempt to bypass the prospective 
291 | challenge page. This process is as follows:
292 | 
293 | - client constructs and sends a message containing:
294 |   - an encrypted, unused token;
295 |   - a valid signature for the token;
296 |   - an HMAC, keyed by the token and computed over message identifying 
297 |   information;
298 | - edge receives the message and performs the following checks:
299 |   - decrypts the token and checks that it resembles an agreed 
300 |   structure, else the protocol is aborted;
301 |   - checks if the token has already been used, if so the protocol is 
302 |   aborted; 
303 |   - verifies the signature on the unencrypted token;
304 |   - validates the HMAC using the token as a key and unique message 
305 |   information as input;
306 | - If all checks pass the edge allows the client access to the 
307 | originally requested origin.
308 | 
309 | # Preliminaries 
310 | 
311 | ## Protocol Communication
312 | 
313 | We assume that our protocol is carried out over HTTP. This is a 
314 | natural choice for the medium of communication given that the protocol
315 | is initiated by a client who is accessing a URI over the internet. 
316 | Due to this assumption we may refer to messages between the client and
317 | the edge as HTTP requests and responses respectively. This also helps 
318 | us to elaborate on particulars of the protocol that are intrinsically
319 | linked to this method of communication. 
320 | 
321 | However, while the original intention of this protocol is for 
322 | bypassing challenge pages over HTTP we encourage usage of the idea to 
323 | any scenario where the receiving of unlinkable "currency" is an 
324 | appropriate reward for completing some pre-defined challenge. The 
325 | message format of the protocol is not strictly required to be HTTP as 
326 | long as no structural changes are made to the messages that are sent.
327 | 
328 | ## Design Formation
329 | 
330 | To explain the concepts in our design we will use a variety of 
331 | structures that are most easily exposed using an easily readable code 
332 | syntax. 
333 | 
334 | ### Variables and Functions
335 | 
336 | Variables and functions follow the syntax of C-like languages, 
337 | such as:
338 | 
339 |     int apple = 7;
340 | 
341 | where `int` is the type of the variable 'apple' and 7 is the value 
342 | assigned to it. All types are self-explanatory and follow convention 
343 | apart from:
344 | 
345 | - `int_b`: Used for large integers when undergoing cryptographic 
346 | operations in large groups.
347 | - All arrays will be denoted by a set of square brackets followed by 
348 | the type of data that is contained. For example an array of strings 
349 | will be described as: 
350 | 
351 |     `[]string`
352 | 
353 | - We call key-value stores 'maps' and define them as:
354 | 
355 |     `map[type_1](type_2)`
356 | 
357 | where type_1 is the type of the keys for the map (e.g. string) and 
358 | type_2 is the type of the stored values (e.g. int).
359 | 
360 | We avoid type declaration when defining functions in favour of a 
361 | textual explanation.
362 | 
363 | ### Structs
364 | 
365 | We use structs to define a closed ecosystem (similar to an object) 
366 | with a list of variables and functions that define the struct. We 
367 | describe structs using the syntax:
368 | 
369 |     struct Person {
370 |       var (
371 |         string name;
372 |         int age;
373 |         map[string](string) emailAddresses;
374 |       );
375 | 
376 |       func (
377 |         setAge(n int);
378 |         changeName(name string);
379 |         addEmail(email string);
380 |       )
381 |     }
382 | 
383 | this gives us an interface with which we can interact with the struct,
384 | allowing us to store and access data with respect to this definition.
385 | Here, `var` defines a list of variables stored on the struct, while 
386 | `func` defines a list of functions that require implementation.
387 | 
388 | ### JSON Objects
389 | 
390 | We use JSON objects for representing tokens and for constructing 
391 | messages for sending from the client to the edge. We define our JSON 
392 | structures as:
393 | 
394 |     {
395 |       "key_1":"[value_1]"
396 |       "key_2":"[value_2]"
397 |             .
398 |             .
399 |             .
400 |       "key_n":"[value_n]"
401 |     }
402 | 
403 | where each `"key_i"` is a key; key_i is marshaled as a string but 
404 | can be any built-in type. Likewise `"[value_i]"` represents the 
405 | corresponding value for `"key_i"`; `value_i` is typically encoded as a 
406 | string in either hexadecimal or base-64 encoding. We assume that all 
407 | JSON is accessible using a map-interface where, if data is a JSON 
408 | object, then `data[key_1]` returns `value_1`.
409 | 
410 | ## Data Formatting
411 | 
412 | This section deals with the formatting of the different data types 
413 | that are required in our protocol. This will cover, for instance, how 
414 | tokens should be formatted and how messages between the client and the
415 | edge should be structured.
416 | 
417 | ### Tokens
418 | 
419 | Tokens are JSON-like structures containing a single nonce field, i.e.
420 | 
421 |     {
422 |       "nonce":"[nonce_value]"
423 |     }
424 | 
425 | where [nonce_value] is a hex encoded 30-byte sequence of 
426 | cryptographically random bytes.
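
As an illustration of the token format above, the following Python sketch generates tokens with a hex-encoded 30-byte nonce. The function name `generate` mirrors the plugin function described later in this document; the use of Python's `secrets` module as the randomness source is an assumption of this sketch, not a requirement of the protocol.

```python
import json
import secrets

NONCE_BYTES = 30  # nonce length stated in the text


def generate(n):
    """Return n tokens, each a JSON string with a fresh hex-encoded nonce."""
    # secrets.token_hex(k) returns 2*k hex characters drawn from a CSPRNG.
    return [json.dumps({"nonce": secrets.token_hex(NONCE_BYTES)}) for _ in range(n)]


if __name__ == "__main__":
    for token in generate(3):
        print(token)
```

Each token is kept as a serialised JSON string here so that it can be blinded and signed as a single value.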
427 | 
428 | ### Client-Edge Message Format
429 | 
430 | The messages that the client sends to the edge after being served a 
431 | challenge page (i.e. in the third round of communication) are written
432 | as JSON structures. These messages designate the type of operation 
433 | that is required of the edge. All messages below are base-64 encoded 
434 | before they are sent. 
435 | 
436 | The messages are heavily defined by the HTTP protocol that the 
437 | client-edge interaction takes place over. For example, the signing 
438 | messages detailed below are included in the body of an HTTP request due 
439 | to artificial limits placed on HTTP header field sizes by web servers. 
440 | Likewise the redemption messages are significantly smaller and are 
441 | thus included in a header. This difference in transport architecture
442 | leads to differences in the message formats shown below. 
443 | 
444 | #### Signing
445 | 
446 | In the first protocol, when the client sends an answer to the given 
447 | challenge page they can also append a JSON object of the form:
448 | 
449 |     {
450 |       "type":"Signing",
451 |       "contents":"[t'_1],[t'_2], ...,[t'_N]"
452 |     }
453 | 
454 | where [t'_i] is a generated token that has been subsequently blinded. 
455 | We call such a JSON object a 'JSON signing request' (JSR).
456 | 
457 | After base-64 encoding is done the final message is sent to the edge 
458 | as:
459 | 
460 |     blinded-tokens=[base-64 encoded JSR]
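
The construction of a JSR and its base-64 wrapping can be sketched in Python as follows. The helper names `encode_jsr` and `decode_jsr` are hypothetical, and the sketch assumes that each blinded token is already encoded as a string without commas (for example in hexadecimal). Any further escaping required by the HTTP body encoding is not shown.

```python
import base64
import json


def encode_jsr(blinded_tokens):
    """Client side: build a JSR and return the 'blinded-tokens=...' parameter."""
    jsr = {"type": "Signing", "contents": ",".join(blinded_tokens)}
    encoded = base64.b64encode(json.dumps(jsr).encode("utf-8")).decode("ascii")
    return "blinded-tokens=" + encoded


def decode_jsr(body):
    """Edge side: recover the list of blinded tokens from the parameter."""
    # Split on the first '=' only, since base-64 padding also uses '='.
    name, _, value = body.partition("=")
    if name != "blinded-tokens":
        raise ValueError("not a signing request")
    jsr = json.loads(base64.b64decode(value))
    if jsr.get("type") != "Signing":
        raise ValueError("unexpected request type")
    return jsr["contents"].split(",")


if __name__ == "__main__":
    body = encode_jsr(["0a1b2c", "3d4e5f"])
    print(body)
    print(decode_jsr(body))
```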
461 | 
462 | #### Redeeming
463 | 
464 | In the second protocol when the client attempts to bypass a challenge 
465 | they send a message containing a JSON object of the form:
466 | 
467 |     {
468 |       "type":"Redeem",
469 |       "contents":"[<encrypted_token>,<signature>,<HMAC>]"
470 |     }
471 | 
472 | where the token that is encrypted has since been unblinded. Such an 
473 | object is known as a 'JSON redemption request' (JRR).
474 | 
475 | ### Edge-Client Message Format
476 | 
477 | The messages returned by the edge to the client are much more heavily 
478 | defined by the messaging protocol being used to communicate. For 
479 | example, in the redemption protocol the server merely serves content 
480 | from the origin in the event that the token that is redeemed verifies
481 | correctly.
482 | 
483 | In the first protocol however the edge also returns a comma-separated
484 | list:
485 |     
486 |     signatures=[s'_1],[s'_2],...,[s'_N]
487 | 
488 | where [s'_i] is a signature computed by the edge over the blinded 
489 | token [t'_i] that it received along with a response to the challenge
490 | page that was sent.
491 | 
492 | ### Signature Transport Format
493 | 
494 | Signatures are sent between the client and the edge using the JWS 
495 | format defined in [@?RFC7515]. The token that is signed is stored as 
496 | the payload on the JWS object - thus when carrying out unblinding on 
497 | the signature the payload must also be updated.
498 | 
499 | ### Certificate Transport Format
500 | 
501 | Certificates can be transported via any standardised method for 
502 | encoding a certificate (e.g. X.509v3 [@?RFC5280]). 
503 | 
504 | # Cryptographic Tools
505 | 
506 | To instantiate the protocols above we require a set of tools that
507 | allows either participant to perform cryptographic operations over 
508 | data. In this section we detail the materials and the algorithms that
509 | are required in order to compute these operations.
510 | 
511 | ## Keys
512 | 
513 | Our protocol requires two key pairs and one symmetric key:
514 | 
515 | - edge identity keys (id-pub-key, id-priv-key): Used for performing 
516 | the encryption and decryption required on the token that is sent for
517 | redemption;
518 | - edge signing keys (sign-pub-key, sign-priv-key): Used for performing
519 | the signing and verification of signatures;
520 | - A symmetric MAC key derived from the 'nonce' field on a token.
521 | 
522 | The edge holds both key pairs and the plugin in the client's browser 
523 | has the public keys. The MAC key is derived at the time of messaging 
524 | and is learnt by both parties.
525 | 
526 | ## Signing/Verifying Algorithms
527 | 
528 | - SIGN(sign-priv-key, data) --> sig : takes a private signing key and 
529 | some 'data' and returns a valid signature 'sig' on 'data'.
530 | - VERIFY(sign-pub-key, data, sig) --> 'good'/'bad' : takes a 
531 | public verification key, some 'data' and a signature 'sig' and outputs
532 | 'good' if 'sig' is a valid signature on 'data'. Otherwise it outputs 
533 | 'bad'.
534 | 
535 | ## Blinding/Unblinding Algorithms
536 | 
537 | - BLIND(blinding-factor, data) --> blind-data : takes a randomly 
538 | sampled 'blinding-factor' and some 'data' and outputs 'blind-data' that 
539 | is computationally unlinkable from 'data'.
540 | - UNBLIND(blinding-factor, blind-data, blind-sig) --> (data, sig) : 
541 | takes 'blind-data' and the randomly sampled 'blinding-factor' used to 
542 | generate it, along with an optional parameter for a valid signature
543 | 'blind-sig' computed over 'blind-data' as input. Outputs 'data' and 
544 | 'sig' where 'data' is the unblinded counterpart to 'blind-data' and 
545 | 'sig' is a valid signature on 'data'.
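
To make the BLIND, SIGN, UNBLIND and VERIFY interfaces concrete, here is a minimal Python sketch using unpadded ("textbook") RSA over integers. It is only an illustration: the modulus is built from two small Mersenne primes and is far too short for real use, data is represented as an integer smaller than the modulus, and the helper `sample_blinding_factor` is a name introduced for this sketch.

```python
import math
import secrets

# Toy key material (assumption of this sketch, NOT a secure key size).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def sample_blinding_factor():
    """Sample r in [2, N) that is invertible modulo N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return r


def blind(r, data):
    """BLIND: blind-data = data * r^E mod N."""
    return (data * pow(r, E, N)) % N


def sign(d, data):
    """SIGN: sig = data^d mod N."""
    return pow(data, d, N)


def unblind(r, blind_data, blind_sig):
    """UNBLIND: strip r from both the blinded data and its signature."""
    r_inv = pow(r, -1, N)
    data = (blind_data * pow(r_inv, E, N)) % N
    sig = (blind_sig * r_inv) % N
    return data, sig


def verify(data, sig):
    """VERIFY: accept if sig^E mod N equals data."""
    return pow(sig, E, N) == data % N


if __name__ == "__main__":
    m = int.from_bytes(b"token-nonce", "big")
    r = sample_blinding_factor()
    blind_sig = sign(D, blind(r, m))          # the signer never sees m
    data, sig = unblind(r, blind(r, m), blind_sig)
    print(data == m, verify(data, sig))
```

The signer only ever sees `data * r^E mod N`, while the unblinded signature equals `data^D mod N`, which is the signature the signer would have produced on the original data.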
546 | 
547 | ## Encryption/Decryption Algorithms
548 | 
549 | - ENCRYPT(id-pub-key, plaintext) --> ciphertext : takes a public 
550 | encryption key and a 'plaintext' as input and outputs an encrypted 
551 | 'ciphertext'.
552 | - DECRYPT(id-priv-key, ciphertext) --> plaintext : takes a private
553 | decryption key and an encrypted 'ciphertext' as input and outputs a
554 | 'plaintext'.
555 | 
556 | 
557 | ## MAC Algorithm
558 | 
559 | Our MAC algorithm has the following specification: 
560 | 
561 | - MAC(mac-key, data) --> mac : takes a symmetric mac-key and 'data' 
562 | as input and outputs 'mac' as a valid authentication code on 'data'.
563 | 
564 | ## Instantiation of Cryptographic Tools
565 | 
566 | In theory any digital signature scheme that allows for blind signing 
567 | and unblinding operations can be used to instantiate our requirements.
568 | However, due to the simplicity of its design we have chosen to only 
569 | support the RSA blind signing modification (RSA-blind) shown in 
570 | [@!Cha83]. We may benefit by adding support for elliptic curve based 
571 | designs in the future to decrease the size of messages in our 
572 | protocol.
573 | 
574 | By choosing RSA-blind we make the following parameter choices:
575 | 
576 | - both encryption and signing keys are 2048-bit RSA keys;
577 | - the SIGN/VERIFY algorithms are RSA without modifications;
578 | - BLIND/UNBLIND follow naturally from the referenced work;
579 | - ENCRYPT/DECRYPT are instantiated with RSA-OAEP;
580 | - MAC is instantiated with HMAC.
581 | 
582 | To mitigate the issues caused by using RSA without modifications we 
583 | make extra structural checks on the tokens that are sent -- this is to 
584 | prevent manipulation of tokens using the homomorphic properties of 
585 | this scheme. Also all tokens must be representable in less than 2048 
586 | bits to prevent problems with wrapping-around the RSA modulus.
587 | 
588 | ## Randomness Sampling
589 | 
590 | Finally we require an ability for the browser plugin to sample 
591 | random values for blinding tokens. Our algorithm can be thought of as:
592 | 
593 | - SAMPLE(seed) --> rand : takes a random seed as input and generates a
594 | value 'rand'.
595 | 
596 | we can instantiate this algorithm using any standard library for 
597 | generating cryptographic randomness. In future notation we may omit  
598 | the seed for ease of exposition.
599 | 
600 | # Browser plugin
601 | 
602 | To participate in the protocol, the client must be using a browser 
603 | with an installed and validated browser plugin. This plugin controls 
604 | the generation, blinding, unblinding, storage and redemption of tokens 
605 | for bypassing challenge pages. The browser plugin can be thought of as
606 | a struct with the following attributes:
607 | 
608 |     struct Plugin {
609 |       var (
610 |         map[string]([]string) tokens;
611 |         map[string](string) signatures;
612 |         map[string](int_b) blindingFactors;
613 |         []string publicKeys;
614 |       )
615 | 
616 |       func (
617 |         parse(string s, string p);
618 |         verifyCert([]byte pk, []byte cert);
619 |         verifySig([]byte pk, []byte s, []byte t);
620 |         generate(int N);
621 |         blind(string t);
622 |         unblind(string t', string s', int_b r);
623 |         store(string pubKey, []string tokens, []string sigs);
624 |         encode(string type, []string data);
625 |         mac([]byte nonce, string s);
626 |         send(string msg);
627 |         pow([]byte randNonce);
628 |       )
629 |     }
630 | 
631 | We implement the struct functions in the following way.
632 | 
633 | - parse(s, p) --> b
634 | 
635 | This function takes strings s, p as input and returns a boolean 'b' 
636 | where "b == true" if p is a valid substring of s. Otherwise "b == 
637 | false".
638 | 
639 | - verifyCert(pubKey, cert) --> b
640 | 
641 | This function takes the bytes of a public verification key 'pubKey' 
642 | and a certificate 'cert' as input and outputs a boolean value b, where 
643 | "b == true" if the signature on 'cert' can be verified correctly and 
644 | the public key on 'cert' is pinned, i.e. contained in 
645 | `Plugin.publicKeys`. Otherwise "b == false". The VERIFY() algorithm is
646 | used to ascertain whether the signature is valid over the inputs to 
647 | this function. 
648 | 
649 | Other details on the certificate are also verified in this step (for
650 | example that the expiry date has not elapsed and that the provider is
651 | consistent with the protecting edge).
652 | 
653 | - verifySig(pubKey, s, t) --> b
654 | 
655 | This function takes the bytes of a public verification key 'pubKey', a
656 | token 't' and a signature 's'. It outputs "b == true" if 's' is a 
657 | valid signature on t and "b == false" otherwise. The plugin runs
658 | VERIFY() using all three inputs to get the output b and returns this 
659 | as the output of the function.
660 | 
661 | - generate(N) --> tokens
662 | 
663 | This function takes an integer N as input and outputs an array 
664 | 'tokens' of length N containing tokens. The array is generated by 
665 | sampling N 30-byte nonces via SAMPLE() and constructing N 
666 | tokens by creating N JSON objects with the "nonce" field set to the 
667 | value of the sampled nonce.
668 | 
669 | - blind(t) --> t'
670 | 
671 | This function takes a token t as input and outputs a blinded token t'.
672 | The function uses SAMPLE() to generate a 256-byte random `int_b` r and 
673 | then runs BLIND(r, t) --> t' and outputs t'. It also sets 
674 |   
675 |     Plugin.blindingFactors[t'] = r 
676 | 
677 | - unblind(t', s', r) --> (t, s)
678 | 
679 | Takes a blinded token t', a valid signature s' for t' and the blinding 
680 | factor r as input and outputs a pair (t, s) where t is the unblinded 
681 | token and s is a valid signature on t. The function uses the algorithm 
682 | UNBLIND(r, t', s') to retrieve (t, s).
683 | 
684 | - store(pubKey, tokens, sigs) 
685 | 
686 | This function does not return anything. It simply sets 
687 | 
688 |     Plugin.tokens[pubKey] = tokens
689 | 
690 | and
691 | 
692 |     Plugin.signatures[tokens[i]] = sigs[i]
693 | 
694 | - encode(type, data)
695 | 
696 | Takes a 'type' string and a base-64 encoded string 'data' as input. 
697 | The 'type' string corresponds to a JSON request (either "JSR" or 
698 | "JRR"). The function creates a JSON object with the "type" field set 
699 | appropriately and the "contents" field set to be equal to the 'data' 
700 | input.
701 | 
702 | - mac(nonce, s) 
703 | 
704 | Takes 'nonce' in byte form and a string 's' as input. The 'nonce' 
705 | value is used as the key and 's' is the contents to be computed over.
706 | This function runs the algorithm MAC(nonce, s) on the two inputs and
707 | outputs whatever this algorithm outputs.
708 | 
709 | - send(msg)
710 | 
711 | Provides no output. Takes a string representation 'msg' as input, where 
712 | 'msg' is either a JSR or JRR. This function reloads the 
713 | current page in the browser and appends 'msg' in the HTTP request that
714 | is created (either in a header or in the body).
715 | 
716 | - pow(randNonce)
717 | 
718 | Optional method for the plugin. Takes the bytes of a random nonce 
719 | 'randNonce' as input and computes some proof-of-work computation that
720 | is specified by the edge. The output is given as 'out' and is used by 
721 | the client in the following bypass request that is made.
722 | 
723 | ## Pinned public keys
724 | 
725 | The plugin has a list of public keys pinned into it in the string 
726 | array `publicKeys` (the keys are stored as hex strings). This prevents 
727 | a service from handing out unique public keys for each client and thus 
728 | gaining request linkability and a deanonymisation vector on users. 
729 | When an edge provides a certificate for a given public key the plugin 
730 | checks if the key that is contained in the certificate is one of the
731 | pinned keys that it already has stored before continuing.
732 | 
733 | # Token Acquisition Protocol
734 | 
735 | The token acquisition protocol allows a client to acquire signatures
736 | on client-generated tokens that can be redeemed in the future to 
737 | bypass challenge pages. We analyse the protocol with respect to the 
738 | stages that we defined in Figure 1.
739 | 
740 | ## [OriginRequest]
741 | 
742 | The initiation of the protocol is triggered by the OriginRequest 
743 | where the client attempts to access a webpage (for example over HTTP).
744 | For the purposes of our protocol this webpage is edge-protected.
745 | 
746 | ## ChallengePage
747 | 
748 | The edge deems the origin request to come from a client requiring the
749 | showing of a challenge in order to grant access to the protected 
750 | website. The challenge page displays some HTML conveying the explicit
751 | challenge to the client.
752 | 
753 | To participate in accepting tokens for bypassing challenge pages, an 
754 | edge must also append specific `<meta>` tags to the HTML of the page. 
755 | The tags that indicate participation are:
756 | 
757 | ~~~~
758 | <meta name="captcha-bypass" id="captcha-bypass" />
759 | <meta name="chl-cert" id="chl-cert" content="%s" />
760 | ~~~~
761 | 
762 | where '%s' is replaced with a valid certificate on some public key.
763 | 
764 | ## VerifyCertificate
765 | 
766 | When a client is delivered such a page, the installed plugin will run 
767 | the `parse()` function on the HTML and the meta tags above. If this 
768 | function returns true then the plugin inputs the certificate from '%s'
769 | into `verifyCert()` and checks that this also returns true. 
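
The detection of the two meta tags can be sketched with Python's standard `html.parser` module. This goes slightly beyond the substring check performed by `parse()`, since it also extracts the certificate string. The class and function names are hypothetical, and certificate verification itself is out of scope for this sketch.

```python
from html.parser import HTMLParser


class BypassTagParser(HTMLParser):
    """Collects the 'captcha-bypass' and 'chl-cert' meta tags from a page."""

    def __init__(self):
        super().__init__()
        self.accepts_tokens = False
        self.certificate = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "captcha-bypass":
            self.accepts_tokens = True
        elif attributes.get("name") == "chl-cert":
            self.certificate = attributes.get("content")


def find_bypass_certificate(html):
    """Return the certificate string if both tags are present, else None."""
    parser = BypassTagParser()
    parser.feed(html)
    parser.close()
    if parser.accepts_tokens and parser.certificate:
        return parser.certificate
    return None


if __name__ == "__main__":
    page = (
        '<html><head>'
        '<meta name="captcha-bypass" id="captcha-bypass" />'
        '<meta name="chl-cert" id="chl-cert" content="EXAMPLE-CERT" />'
        '</head><body>challenge</body></html>'
    )
    print(find_bypass_certificate(page))
```

The returned string would then be passed to `verifyCert()` together with the pinned public keys.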
770 | 
771 | ## GenerateTokens + BlindTokens
772 | 
773 | The plugin retrieves the public key 'pubKey' from the verified 
774 | certificate and then checks if `Plugin.tokens[pubKey]` is empty or 
775 | not. 
776 | 
777 | If there are no tokens stored for 'pubKey' the plugin runs 
778 | `generate(N)` to get an array of N 'toks'. It then runs `blind(t)` on 
779 | each token and constructs an array of blinded tokens, 'blindedTokens'. 
780 | The array 'toks' is stored in the 'tokens' map as 
781 |     
782 |     tokens[pubKey] = toks
783 | 
784 | where 'pubKey' is the public key from the certificate.
785 | 
786 | ## SolveChallenge
787 | 
788 | This step involves the client solving the presented challenge. This 
789 | step requires human intervention, for instance as in the way that 
790 | CAPTCHAs are solved.
791 | 
792 | ## [SignRequest]
793 | 
794 | The plugin encodes the array 'blindedTokens' as a string 'content' and 
795 | runs `encode("JSR", content)` to get a JSR request containing this 
796 | data. When the challenge solution is sent to the edge by the client, 
797 | the plugin base-64 encodes the JSR and appends it to the HTTP request 
798 | body using the syntax:
799 | 
800 |     blinded-tokens=<base-64 encoded JSR>
801 | 
802 | ## VerifyChallenge
803 | 
804 | When the edge receives the request with a challenge solution and a JSR
805 | it first checks that the solution provided is correct with respect to 
806 | the initial challenge that was sent.
807 | 
808 | ## SignTokens
809 | 
810 | The edge receives the blinded tokens, checks that the challenge 
811 | solution is valid and then runs SIGN() on each blinded token t'_i from 
812 | the JSR using the private signing key `sign-priv-key` that it owns. The
813 | edge constructs an array 'sigs' from the signatures that are produced 
814 | by the SIGN() algorithm.
815 | 
816 | ## [Response]
817 | 
818 | The edge responds to the client with an array containing the pairs of 
819 | blinded tokens with their respective signatures from the array 'sigs'
820 | using the syntax:
821 | 
822 |     signatures=[<s'_1>,...,<s'_N>]
823 | 
824 | where each `<s'_i>` is a base-64 encoded JWS object containing the 
825 | blinded token that is signed as the payload.
826 | 
827 | ## VerifyingSignatures
828 | 
829 | The client receives the comma-separated signatures from the edge. 
830 | 
831 | Firstly, the plugin runs `verifySig(pubKey, s'_i, t'_i)` for the ith 
832 | received signature `s'_i` where `t'_i` is the blinded token stored in 
833 | the payload and pubKey stored on the original certificate. If each 
834 | invocation of `verifySig()` is successful then the plugin proceeds.
835 | 
836 | ## UnblindTokens
837 | 
838 | Secondly the plugin runs `unblind(t'_i, s'_i, r_i)` where r_i is the 
839 | ith blinding factor stored in `Plugin.blindingFactors`. This function 
840 | outputs the pair `(t_i, s_i)`.
841 | 
842 | ## StoreTokens
843 | 
844 | Finally the plugin checks that:
845 | 
846 |     Plugin.tokens[pubKey][i] == t_i
847 | 
848 | If so, then the plugin runs `store(pubKey, t_i, s_i)` to store the 
849 | token and signature for future use. 
850 | 
851 | # Challenge Bypass Protocol
852 | 
853 | The challenge bypass protocol starts in the same way as the token
854 | acquisition protocol with the client attempting to visit an 
855 | edge-protected origin. The edge returns a challenge page as before
856 | and the client's browser verifies that the HTML `meta` tags sent by the 
857 | edge indicate that bypassing a challenge page can happen. The protocol 
858 | deviates after the VerifyCertificate stage if the map 
859 | `Plugin.tokens[pubKey]` is populated by one or more tokens (where 
860 | 'pubKey' is the certified public key as before). 
861 | 
862 | We detail the steps that follow this stage, describing how a client 
863 | can bypass the challenge.
864 | 
865 | ## ConstructTokenMessage
866 | 
867 | When the client has tokens for being able to bypass challenges the 
868 | browser plugin does the following:
869 | 
870 | - picks the next available token and signature pair (t,sig) for 
871 | 'pubKey' where:
872 | 
873 | ~~~~    
874 | pubKey = sign-pub-key;
875 | ~~~~
876 | 
877 | - encrypts t by computing 
878 | 
879 | ~~~~    
880 | ENCRYPT(id-pub-key, t) --> t-enc;
881 | ~~~~
882 | 
883 | - computes 
884 | 
885 | ~~~~
886 | MAC(t["nonce"], unique-request-data) --> hm
887 | ~~~~
888 | 
889 | where the MAC algorithm is keyed by the "nonce" field on the token and
890 | 'unique-request-data' is some data that is unique to a request 
891 | containing this token;
892 | 
893 | - creates a concatenated string:
894 | 
895 | ~~~~
896 | t-enc || sig || hm
897 | ~~~~
898 | 
899 | and base-64 encodes it to form a base-64 string 'data';
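
The MAC and concatenation steps above can be sketched in Python as follows. The sketch treats the encrypted token `t_enc` and the signature `sig` as opaque byte strings produced elsewhere, uses HMAC-SHA256 (the hash function is an assumption, as the text only specifies HMAC), and the function name `build_redemption_data` is hypothetical.

```python
import base64
import hashlib
import hmac
import json


def build_redemption_data(token_json, t_enc, sig, request_data):
    """Return the base-64 string 'data' encoding t-enc || sig || hm."""
    # The MAC key is the raw nonce taken from the token's "nonce" field.
    nonce = bytes.fromhex(json.loads(token_json)["nonce"])
    hm = hmac.new(nonce, request_data, hashlib.sha256).digest()
    return base64.b64encode(t_enc + sig + hm).decode("ascii")


if __name__ == "__main__":
    token = json.dumps({"nonce": "ab" * 30})
    # Placeholder bytes stand in for ENCRYPT() output and the signature.
    data = build_redemption_data(token, b"\x01" * 256, b"\x02" * 256,
                                 b"example.com/index.html")
    print(data[:32] + "...")
```

Because plain concatenation carries no delimiters, the receiver needs to know the length of each component in advance; with 2048-bit RSA keys, the ciphertext and a raw signature would each be 256 bytes.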
900 | 
901 | ## ProofOfWork
902 | 
903 | This is an optional extension to the protocol that enables the edge to 
904 | specify some proof-of-work (PoW) computation to the client. This is to 
905 | prevent any client from being able to construct many viable-looking,
906 | but invalid, tokens that force the edge into computing a number of 
907 | public-key operations before throwing away the invalid token. If done 
908 | often enough this could lead to a potential DDoS vector on the edge.
909 | By establishing a PoW step this limits the client to only being able 
910 | to redeem tokens when they can answer the PoW.
911 | 
912 | If this step is to be used, the edge specifies an extra header in the 
913 | initial response to the client with the attribute 
914 | "bypass-proof-of-work" and a value "randNonce" that contains a random
915 | nonce that the client uses in answering the PoW. The plugin then 
916 | computes `pow(randNonce)` --> 'out' where 'out' represents the output 
917 | of the computation.
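
The text leaves the proof-of-work computation to the edge. Purely as an illustration, the following Python sketch shows one possible hashcash-style instantiation: the client searches for a counter such that SHA-256 of `randNonce` followed by the counter has a given number of leading zero bits. The hash function, the counter encoding, the `difficulty_bits` parameter and the function names are all assumptions of this sketch.

```python
import hashlib
import itertools


def pow_solve(rand_nonce, difficulty_bits):
    """Client side: find 'out' such that the digest falls below the target."""
    target = 1 << (256 - difficulty_bits)
    for counter in itertools.count():
        out = counter.to_bytes(8, "big")
        digest = hashlib.sha256(rand_nonce + out).digest()
        if int.from_bytes(digest, "big") < target:
            return out


def pow_verify(rand_nonce, out, difficulty_bits):
    """Edge side: a single hash is enough to check the client's answer."""
    digest = hashlib.sha256(rand_nonce + out).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


if __name__ == "__main__":
    nonce = b"edge-chosen-nonce"
    out = pow_solve(nonce, 12)
    print(out.hex(), pow_verify(nonce, out, 12))
```

The asymmetry is the point of the extension: solving takes about 2^difficulty_bits hash evaluations on average, while verification costs the edge one hash before it performs any public-key operation.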
918 | 
919 | ## SendToken
920 | 
921 | - The plugin runs encode("JRR", data) to get a JRR with the "contents" 
922 | field set equal to 'data'.
923 | - If ProofOfWork is done, then the plugin appends an extra field to 
924 | the JRR object named "pow" where the value is equal to 'out'.
925 | 
926 | The plugin then reloads the page and sends this JRR as the value of 
927 | the header "challenge-bypass-token".
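
A hedged Python sketch of this step is given below. It assumes that the JRR carries a "type" field set to "Redeem" next to "contents", that the serialised JRR is base64 encoded before being placed in the header, and that 'out' is carried as a string. The helper name `encode_jrr` is hypothetical.

```python
import base64
import json
from typing import Optional


def encode_jrr(data: str, pow_out: Optional[str] = None) -> str:
    """Wrap 'data' in a JRR and return it as a base64 header value."""
    # Assumption: the JRR has a "type" field alongside "contents".
    jrr = {"type": "Redeem", "contents": data}
    if pow_out is not None:
        # Optional extension: attach the proof-of-work answer.
        jrr["pow"] = pow_out
    serialised = json.dumps(jrr).encode("utf-8")
    # Assumption: the JSON object is base64 encoded before sending.
    return base64.b64encode(serialised).decode("ascii")


# Hypothetical values for 'data' and 'out'.
headers = {"challenge-bypass-token": encode_jrr("ZGF0YQ==", pow_out="1234")}
```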
928 | 
929 | ## VerifyPoW
930 | 
931 | When the edge receives the JRR message that was sent above, if a PoW 
932 | was stipulated then the edge first checks that the value stored in the 
933 | "pow" field is correct for the random nonce that was sent.
934 | 
935 | If not, then the protocol is aborted at this point.
936 | 
937 | ## VerifyTokenMessage
938 | 
939 | The edge decodes the "contents" field from the received JRR and 
940 | does the following:
941 | 
942 | - sets a "success" bool to true;
943 | - computes
944 | 
945 | ~~~~
946 | DECRYPT(id-priv-key, t-enc) --> t
947 | ~~~~
948 | 
949 | and checks that t is a JSON object with a "nonce" field; if either 
950 | check fails then it sets "success" to false;
951 | - checks that t has not been redeemed before, otherwise it sets 
952 | "success" to false;
953 | - computes
954 | 
955 |     VERIFY(sig-pub-key, t, sig) --> b
956 | 
957 | if 'b' is not true then it sets "success" to false;
958 | - retrieves 'edge-request-data' from the request it received and 
959 | computes
960 | 
961 |     MAC(t["nonce"], edge-request-data) --> hm-edge
962 | 
963 | and checks that hm == hm-edge; if not, it sets "success" to false.
964 | 
965 | If "success" is still true, then the edge marks the bypass request as 
966 | successful and continues. 
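
The checks above can be sketched in Python as follows. This is an illustrative sketch only: DECRYPT and VERIFY are passed in as callables because their instantiation is defined elsewhere, HMAC-SHA256 stands in for the MAC algorithm, the parts of "contents" are assumed to have been split already, and the set of redeemed nonces is an in-memory stand-in for whatever store the edge uses. The function name `verify_token_message` is hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Callable, Set


def verify_token_message(t_enc: bytes, sig: bytes, hm: bytes,
                         edge_request_data: bytes,
                         decrypt: Callable[[bytes], bytes],
                         verify: Callable[[dict, bytes], bool],
                         redeemed: Set[str]) -> bool:
    success = True
    # DECRYPT(id-priv-key, t-enc) --> t, followed by the structural check.
    try:
        t = json.loads(decrypt(t_enc))
        key = base64.b64decode(t["nonce"])
    except (ValueError, KeyError, TypeError):
        return False  # without a usable nonce no further check can run
    nonce = t["nonce"]
    # Double-spend check.
    if nonce in redeemed:
        success = False
    # VERIFY(sig-pub-key, t, sig) --> b
    if not verify(t, sig):
        success = False
    # MAC(t["nonce"], edge-request-data) --> hm-edge, compared in constant time.
    hm_edge = hmac.new(key, edge_request_data, hashlib.sha256).digest()
    if not hmac.compare_digest(hm, hm_edge):
        success = False
    if success:
        redeemed.add(nonce)  # mark the token as spent
    return success
```

Apart from the structural check, the sketch keeps a "success" flag and runs every check instead of returning at the first failure, mirroring the step list above.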
967 | 
968 | ## GetOrigin + [Response]
969 | 
970 | If the verification process was successful, the edge gets a response 
971 | from the origin that corresponds to the original request in 
972 | {OriginRequest} from the client. The edge then sends this response 
973 | directly back to the client.
974 | 
975 | This gives the client access to the resource they originally 
976 | requested.
977 | 
978 | 
979 | <reference anchor='Cha83' target='http://sceweb.sce.uhcl.edu/yang/teaching/csci5234WebSecurityFall2011/Chaum-blind-signatures.PDF'>
980 |   <front>
981 |    <title>Blind Signatures For Untraceable Payments</title>
982 |    <author initials='D.' surname='Chaum' fullname='David Chaum'></author>
983 |    <date year='1983' />
984 |   </front>
985 | </reference>
986 | 
987 | <reference anchor='ABHL03' target='https://www.cs.cmu.edu/~mblum/research/pdf/captcha.pdf'>
988 |   <front>
989 |    <title>CAPTCHA: Using Hard AI Problems For Security</title>
990 |    <author initials='L.' surname='von Ahn' fullname='Luis von Ahn'></author>
991 |    <author initials='M.' surname='Blum' fullname='Manuel Blum'></author>
992 |    <author initials='N. J.' surname='Hopper' fullname='Nicholas J. Hopper'></author>
993 |    <author initials='J.' surname='Langford' fullname='John Langford'></author>
994 |    <date year='2003' />
995 |   </front>
996 | </reference>
997 | 
998 | {backmatter}


--------------------------------------------------------------------------------
/captcha-bypass-internet-draft.txt:
--------------------------------------------------------------------------------
   1 | 
   2 | 
   3 | 
   4 | 
   5 | Network Working Group                                        A. Davidson
   6 | Internet-Draft                      Royal Holloway, University of London
   7 | Intended status: Informational                               N. Sullivan
   8 | Expires: March 17, 2017                                       Cloudflare
   9 |                                                            G. Tankersley
  10 |                                                               Cloudflare
  11 |                                                              F. Valsorda
  12 |                                                               Cloudflare
  13 |                                                       September 13, 2016
  14 | 
  15 | 
  16 |   Protocol for bypassing challenge pages using RSA blind signed tokens
  17 |                    draft-protocol-challenge-bypass-00
  18 | 
  19 | Abstract
  20 | 
  21 |    This document proposes a protocol for bypassing challenge pages (such
  22 |    as forms requiring CAPTCHA submissions) that are served by edge
  23 |    services in order to protect origin websites.  A client is required
  24 |    to complete an initial challenge and is then granted signed tokens
  25 |    which can be redeemed in the future to bypass challenges, meaning
  26 |    that honest users undergo less manual computation.  The
  27 |    signed tokens are cryptographically unlinkable to prevent future
  28 |    requests being linked to the original signed set of tokens.
  29 | 
  30 | Status of This Memo
  31 | 
  32 |    This Internet-Draft is submitted in full conformance with the
  33 |    provisions of BCP 78 and BCP 79.
  34 | 
  35 |    Internet-Drafts are working documents of the Internet Engineering
  36 |    Task Force (IETF).  Note that other groups may also distribute
  37 |    working documents as Internet-Drafts.  The list of current Internet-
  38 |    Drafts is at http://datatracker.ietf.org/drafts/current/.
  39 | 
  40 |    Internet-Drafts are draft documents valid for a maximum of six months
  41 |    and may be updated, replaced, or obsoleted by other documents at any
  42 |    time.  It is inappropriate to use Internet-Drafts as reference
  43 |    material or to cite them other than as "work in progress."
  44 | 
  45 |    This Internet-Draft will expire on March 17, 2017.
  46 | 
  47 | Copyright Notice
  48 | 
  49 |    Copyright (c) 2016 IETF Trust and the persons identified as the
  50 |    document authors.  All rights reserved.
  51 | 
  52 | 
  53 | 
  54 | 
  55 | 
  56 | Davidson, et al.         Expires March 17, 2017                 [Page 1]
  57 | 
  58 | Internet-Draft   Protocol for bypassing challenge pages   September 2016
  59 | 
  60 | 
  61 |    This document is subject to BCP 78 and the IETF Trust's Legal
  62 |    Provisions Relating to IETF Documents
  63 |    (http://trustee.ietf.org/license-info) in effect on the date of
  64 |    publication of this document.  Please review these documents
  65 |    carefully, as they describe your rights and restrictions with respect
  66 |    to this document.  Code Components extracted from this document must
  67 |    include Simplified BSD License text as described in Section 4.e of
  68 |    the Trust Legal Provisions and are provided without warranty as
  69 |    described in the Simplified BSD License.
  70 | 
  71 | Table of Contents
  72 | 
  73 |    1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
  74 |      1.1.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   4
  75 |    2.  Protocol Overview . . . . . . . . . . . . . . . . . . . . . .   5
  76 |      2.1.  Acquiring Signed Tokens . . . . . . . . . . . . . . . . .   5
  77 |      2.2.  Redeeming Tokens  . . . . . . . . . . . . . . . . . . . .   7
  78 |    3.  Preliminaries . . . . . . . . . . . . . . . . . . . . . . . .   8
  79 |      3.1.  Protocol Communication  . . . . . . . . . . . . . . . . .   8
  80 |      3.2.  Design Formation  . . . . . . . . . . . . . . . . . . . .   8
  81 |        3.2.1.  Variables and Functions . . . . . . . . . . . . . . .   9
  82 |        3.2.2.  Structs . . . . . . . . . . . . . . . . . . . . . . .   9
  83 |        3.2.3.  JSON Objects  . . . . . . . . . . . . . . . . . . . .  10
  84 |      3.3.  Data Formatting . . . . . . . . . . . . . . . . . . . . .  10
  85 |        3.3.1.  Tokens  . . . . . . . . . . . . . . . . . . . . . . .  10
  86 |        3.3.2.  Client-Edge Message Format  . . . . . . . . . . . . .  11
  87 |        3.3.3.  Edge-Client Message Format  . . . . . . . . . . . . .  12
  88 |        3.3.4.  Signature Transport Format  . . . . . . . . . . . . .  12
  89 |        3.3.5.  Certificate Transport Format  . . . . . . . . . . . .  12
  90 |    4.  Cryptographic Tools . . . . . . . . . . . . . . . . . . . . .  12
  91 |      4.1.  Keys  . . . . . . . . . . . . . . . . . . . . . . . . . .  12
  92 |      4.2.  Signing/Verifying Algorithms  . . . . . . . . . . . . . .  13
  93 |      4.3.  Blinding/Unblinding Algorithms  . . . . . . . . . . . . .  13
  94 |      4.4.  Encryption/Decryption Algorithms  . . . . . . . . . . . .  13
  95 |      4.5.  MAC Algorithm . . . . . . . . . . . . . . . . . . . . . .  13
  96 |      4.6.  Instantiation of Cryptographic Tools  . . . . . . . . . .  14
  97 |      4.7.  Randomness Sampling . . . . . . . . . . . . . . . . . . .  14
  98 |    5.  Browser plugin  . . . . . . . . . . . . . . . . . . . . . . .  14
  99 |      5.1.  Pinned public keys  . . . . . . . . . . . . . . . . . . .  17
 100 |    6.  Token Acquisition Protocol  . . . . . . . . . . . . . . . . .  17
 101 |      6.1.  [OriginRequest] . . . . . . . . . . . . . . . . . . . . .  17
 102 |      6.2.  ChallengePage . . . . . . . . . . . . . . . . . . . . . .  18
 103 |      6.3.  VerifyCertificate . . . . . . . . . . . . . . . . . . . .  18
 104 |      6.4.  GenerateTokens + BlindTokens  . . . . . . . . . . . . . .  18
 105 |      6.5.  SolveChallenge  . . . . . . . . . . . . . . . . . . . . .  18
 106 |      6.6.  [SignRequest] . . . . . . . . . . . . . . . . . . . . . .  19
 107 |      6.7.  SignTokens  . . . . . . . . . . . . . . . . . . . . . . .  19
 108 |      6.8.  [Response]  . . . . . . . . . . . . . . . . . . . . . . .  19
 109 | 
 110 | 
 111 | 
 112 | Davidson, et al.         Expires March 17, 2017                 [Page 2]
 113 | 
 114 | Internet-Draft   Protocol for bypassing challenge pages   September 2016
 115 | 
 116 | 
 117 |      6.9.  VerifyingSignatures . . . . . . . . . . . . . . . . . . .  19
 118 |      6.10. UnblindTokens . . . . . . . . . . . . . . . . . . . . . .  20
 119 |      6.11. StoreTokens . . . . . . . . . . . . . . . . . . . . . . .  20
 120 |    7.  Challenge Bypass Protocol . . . . . . . . . . . . . . . . . .  20
 121 |      7.1.  ConstructTokenMessage . . . . . . . . . . . . . . . . . .  20
 122 |      7.2.  ProofOfWork . . . . . . . . . . . . . . . . . . . . . . .  21
 123 |      7.3.  SendToken . . . . . . . . . . . . . . . . . . . . . . . .  21
 124 |      7.4.  VerifyPoW . . . . . . . . . . . . . . . . . . . . . . . .  21
 125 |      7.5.  VerifyTokenMessage  . . . . . . . . . . . . . . . . . . .  22
 126 |      7.6.  GetOrigin + [Response]  . . . . . . . . . . . . . . . . .  22
 127 |    8.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  22
 128 |      8.1.  Normative References  . . . . . . . . . . . . . . . . . .  22
 129 |      8.2.  Informative References  . . . . . . . . . . . . . . . . .  23
 130 |    Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  23
 131 | 
 132 | 1.  Introduction
 133 | 
 134 |    Various challenge pages are used to distinguish human access to a
 135 |    website from automated access, with the intention of preventing
 136 |    malicious behaviour that could compromise the website that is being
 137 |    hosted.  CAPTCHAs ("Completely Automated Public Turing test to tell
 138 |    Computers and Humans Apart") [ABHL03] are one of the most widely used
 139 |    methods for distinguishing human access to a resource from automated
 140 |    access.  CAPTCHAs are regularly deployed as "interstitial" pages
 141 |    forcing a user to answer the CAPTCHA before access is given to a
 142 |    website that was requested by the user.  This is used to prevent
 143 |    malicious access by automated processes that can adversely affect the
 144 |    performance of the website itself.  While these 'challenges' succeed
 145 |    in their mission, they create noticeably more work for honest users
 146 |    who have to complete them.
 147 | 
 148 |    These challenge pages are commonly served by CDNs who offer security
 149 |    services to customers.  Companies like Cloudflare offer customers the
 150 |    ability to serve CAPTCHA pages (often using Google's ReCAPTCHA
 151 |    service) to any IP addresses requesting a protected resource where
 152 |    the IP is deemed to have a "bad reputation".  IP reputation scoring
 153 |    comes from varied sources and is based on whether any malicious
 154 |    activity (such as spamming and/or abuse) is detected as originating
 155 |    from the IP in question.
 156 | 
 157 |    Services such as Tor suffer dramatically under such reputation-based
 158 |    systems.  Users are assigned to one of a small number of exit nodes
 159 |    when accessing webpages through Tor and appear to be browsing with
 160 |    the IP of that node.  The IP addresses of these nodes are frequently
 161 |    associated with malicious and abusive behaviour and are thus assigned
 162 |    poor reputation scores.  This problem is not specific to Tor; VPNs,
 163 |    I2P, and internet users behind large-scale NAT installations are
 164 |    affected similarly.
 165 | 
 166 | 
 167 | 
 168 | 
 169 | Davidson, et al.         Expires March 17, 2017                 [Page 3]
 170 | 
 171 | Internet-Draft   Protocol for bypassing challenge pages   September 2016
 172 | 
 173 | 
 174 |    The end result is that honest users of these services are forced to
 175 |    complete many challenge pages in order to access content protected by
 176 |    edge service providers such as Cloudflare, and the problem is
 177 |    exacerbated by the fact that these companies offer services to a wide
 178 |    range of popular websites.  This results in a huge increase in
 179 |    workload for average Tor users in spite of their non-malicious
 180 |    nature.
 181 | 
 182 |    Further problems arise for users who choose not to enable JavaScript
 183 |    in their browsers since they are served with challenges that are
 184 |    rapidly deteriorating to the point where a large proportion of
 185 |    challenges are too hard to be solved.
 186 | 
 187 |    Currently some edge providers (e.g.  Cloudflare) attempt to solve
 188 |    this problem by providing cookies that enable access to protected
 189 |    resources once a CAPTCHA has been solved.  There are two problems
 190 |    with this method: first, that when a new Tor circuit is constructed
 191 |    the cookie is rendered useless; and secondly, that setting cookies
 192 |    across many domains controlled by the same CDN could lead to
 193 |    deanonymisation attacks outside the current Tor Browser threat model.
 194 |    As such, a new solution to this problem is needed.
 195 | 
 196 |    In this document we detail a protocol that enables a user to complete
 197 |    a single edge-served challenge page in return for a finite number of
 198 |    signed tokens.  These tokens can then be used to bypass future
 199 |    challenge pages that are served by participating edge-providers.  The
 200 |    tokens are generated in such a way that signed tokens cannot be
 201 |    linked to future redeemed tokens for bypassing.  We achieve this
 202 |    using the RSA blind signature scheme first presented by David Chaum
 203 |    [Cha83].
 204 | 
 205 | 1.1.  Terminology
 206 | 
 207 |    The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
 208 |    SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
 209 |    document, are to be interpreted as described in [RFC2119].
 210 | 
 211 |    The following terms are used:
 212 | 
 213 |    edge: A serving endpoint that provides access to a protected origin.
 214 | 
 215 |    client: The endpoint attempting to access an edge-protected service.
 216 | 
 217 |    origin: The endpoint where web content is stored.
 218 | 
 219 |    edge-protected: Term for origins that pay the edge to provide
 220 |    protection services for their domain.
 221 | 
 222 | 
 223 | 
 224 | 
 225 | Davidson, et al.         Expires March 17, 2017                 [Page 4]
 226 | 
 227 | Internet-Draft   Protocol for bypassing challenge pages   September 2016
 228 | 
 229 | 
 230 |    endpoint: Points where requests and responses are dealt with.
 231 | 
 232 |    browser: A program run by the client that provides access to
 233 |    webpages.
 234 | 
 235 |    plugin: An installed service that runs in the client's browser.
 236 | 
 237 |    tokens: JSON structures that are generated by the plugin in the
 238 |    client's browser for future redemption.
 239 | 
 240 |    blinding: An operation that "hides" the contents of the token while
 241 |    still allowing the underlying token to be cryptographically signed.
 242 | 
 243 |    unblinding: The reverse procedure of blinding.  Recovers a token and
 244 |    (if signed) a valid signature on the token.
 245 | 
 246 |    challenge answer: Generated when submitting a response to a given
 247 |    challenge.
 248 | 
 249 |    challenge page: A page generated by the edge for the client.  The
 250 |    client must answer a challenge on the page correctly and return it to
 251 |    gain access to a particular resource.
 252 | 
 253 |    nonce: A randomly sampled value that is used for generating unique
 254 |    tokens.
 255 | 
 256 | 2.  Protocol Overview
 257 | 
 258 |    Our protocols are initiated when a client is presented with a
 259 |    challenge page that contains additional information indicating that
 260 |    the edge service accepts tokens for bypassing the challenge.  This
 261 |    can be indicated in the HTML of the page as a meta tag along with a
 262 |    certificate advising which public key the edge is currently using.
 263 |    Two separate protocols exist: one for when the client has no signed
 264 |    tokens available, and one for when the client already has tokens.
 265 | 
 266 |    Both protocols require essentially four rounds of communication.  We
 267 |    take into account the initial request and response when the client
 268 |    attempts to visit an edge-protected origin and is served a challenge
 269 |    page instead.
 270 | 
 271 | 2.1.  Acquiring Signed Tokens
 272 | 
 273 | 
 274 | 
 275 | 
 276 | 
 277 | 
 278 | 
 279 | 
 280 | 
 281 | Davidson, et al.         Expires March 17, 2017                 [Page 5]
 282 | 
 283 | Internet-Draft   Protocol for bypassing challenge pages   September 2016
 284 | 
 285 | 
 286 |             Client                                            Edge
 287 | 
 288 |             [OriginRequest]         ------->
 289 |                                                      ChallengePage  ^
 290 |                                                       + bypass_tag  |
 291 |                                                     + sig_key_cert  |
 292 |                                     <-------            [Response]  v
 293 |          ^  VerifyCertificate
 294 |          |  GenerateTokens
 295 |          v  BlindTokens
 296 |          ^  SolveChallenge
 297 |          v  [SignRequest]           ------->
 298 |                                                    VerifyChallenge  ^
 299 |                                                         SignTokens  |
 300 |                                     <-------            [Response]  v
 301 |          ^  VerifySigs
 302 |          |  UnblindTokens
 303 |          |  StoreTokens
 304 |          v  [Finished]
 305 | 
 306 | 
 307 |           Figure 1: Full message flow for acquiring signed tokens
 308 | 
 309 |    When a client attempts to visit an edge-protected origin the edge can
 310 |    indicate that it accepts tokens for bypassing a challenge page, and
 311 |    can present a certificate corresponding to its current signing
 312 |    key.  In this event the client does the following:
 313 | 
 314 |    o  checks that the certificate is valid and that the signature
 315 |       verifies correctly;
 316 | 
 317 |    o  checks that it is aware of the public key provided (e.g. that the
 318 |       key is pinned in the plugin);
 319 | 
 320 |    o  generates N tokens and blinds them;
 321 | 
 322 |    o  sends the tokens to the edge along with an answer to the
 323 |       challenge.
 324 | 
 325 |    In practice N <= 100 so as not to put too much work on the browser;
 326 |    limiting N in this way also mitigates DDoS potential.  After
 327 |    receiving the tokens and the answer to the challenge the edge does
 328 |    the following:
 329 | 
 330 |    o  checks that the answer is correct;
 331 | 
 332 |    o  if this is the case then it signs the tokens and returns the
 333 |       signatures to the client.
 334 | 
 335 | 
 336 | 
 337 | Davidson, et al.         Expires March 17, 2017                 [Page 6]
 338 | 
 339 | Internet-Draft   Protocol for bypassing challenge pages   September 2016
 340 | 
 341 | 
 342 |    The client does not immediately get access to the origin, though this
 343 |    can be achieved if the client immediately reloads the page and
 344 |    redeems a token using the process below.  The client participates in
 345 |    some final post-processing:
 346 | 
 347 |    o  they check that the signatures verify correctly with respect to
 348 |       the pinned public key and the blinded token;
 349 | 
 350 |    o  they unblind the token and signature pair to get a new pair of the
 351 |       original token and a valid signature;
 352 | 
 353 |    o  they finally store the pair in their browser plugin for future
 354 |       use.
 355 | 
 356 | 2.2.  Redeeming Tokens
 357 | 
 358 | 
 359 |             Client                                            Edge
 360 | 
 361 |             [OriginRequest]          ------>
 362 |                                                      ChallengePage  ^
 363 |                                                       + bypass_tag  |
 364 |                                                     + sig_key_cert  |
 365 |                                      <------            [Response]  v
 366 |          ^  VerifyCertificate
 367 |          |  ConstructTokenMessage
 368 |          |  ProofOfWork*
 369 |          v  [SendToken]              ------>
 370 |                                                         VerifyPoW*  ^
 371 |                                                 VerifyTokenMessage  |
 372 |                                                          GetOrigin  |
 373 |                                      <------            [Response]  v
 374 |             [Finished]
 375 | 
 376 | 
 377 |              Figure 2: Full message flow for redeeming tokens
 378 | 
 379 |    *  denotes an optional extension to the protocol
 380 | 
 381 |    As before the client attempts to visit an edge-protected website and
 382 |    is faced with a challenge page.  If the edge accepts tokens and
 383 |    provides a certificate that corresponds to a key that the client has
 384 |    pinned in their plugin and they have tokens signed by the counterpart
 385 |    private key then the client can attempt to bypass the prospective
 386 |    challenge page.  This process is as follows:
 387 | 
 388 |    o  client constructs and sends a message containing:
 389 | 
 390 | 
 391 | 
 392 | 
 393 | Davidson, et al.         Expires March 17, 2017                 [Page 7]
 394 | 
 395 | Internet-Draft   Protocol for bypassing challenge pages   September 2016
 396 | 
 397 | 
 398 |       *  an encrypted, unused token;
 399 | 
 400 |       *  a valid signature for the token;
 401 | 
 402 |       *  an HMAC, keyed by the token and computed over message
 403 |          identifying information;
 404 | 
 405 |    o  edge receives the message and performs the following checks:
 406 | 
 407 |       *  decrypts the token and checks that it resembles an agreed
 408 |          structure, else the protocol is aborted;
 409 | 
 410 |       *  checks if the token has already been used, if so the protocol
 411 |          is aborted;
 412 | 
 413 |       *  verifies the signature on the unencrypted token;
 414 | 
 415 |       *  validates the HMAC using the token as a key and unique message
 416 |          information as input;
 417 | 
 418 |    o  If all checks pass the edge allows the client access to the
 419 |       originally requested origin.
 420 | 
 421 | 3.  Preliminaries
 422 | 
 423 | 3.1.  Protocol Communication
 424 | 
 425 |    We assume that our protocol is carried out over HTTP.  This is a
 426 |    natural choice for the medium of communication given that the
 427 |    protocol is initiated by a client who is accessing a URI over the
 428 |    internet.  Due to this assumption we may refer to messages between
 429 |    the client and the edge as HTTP requests and responses respectively.
 430 |    This also helps us to elaborate on particulars of the protocol that
 431 |    are intrinsically linked to this method of communication.
 432 | 
 433 |    However, while the original intention of this protocol is for
 434 |    bypassing challenge pages over HTTP, we encourage use of the idea in
 435 |    any scenario where the receiving of unlinkable "currency" is an
 436 |    appropriate reward for completing some pre-defined challenge.  The
 437 |    message format of the protocol is not strictly required to be HTTP as
 438 |    long as no structural changes are made to the messages that are sent.
 439 | 
 440 | 3.2.  Design Formation
 441 | 
 442 |    To explain the concepts in our design we will use a variety of
 443 |    structures that are most easily exposed using an easily readable code
 444 |    syntax.
 445 | 
 446 | 
 447 | 
 448 | 
 449 | Davidson, et al.         Expires March 17, 2017                 [Page 8]
 450 | 
 451 | Internet-Draft   Protocol for bypassing challenge pages   September 2016
 452 | 
 453 | 
 454 | 3.2.1.  Variables and Functions
 455 | 
 456 |    Variables and functions follow the syntax of C-like languages, such
 457 |    as:
 458 | 
 459 |                               int apple = 7;
 460 | 
 461 |    where "int" is the type of the variable 'apple' and 7 is the value
 462 |    assigned to it.  All types are self-explanatory and follow convention
 463 |    apart from:
 464 | 
 465 |    o  "int_b": Used for large integers when undergoing cryptographic
 466 |       operations in large groups.
 467 | 
 468 |    o  All arrays will be denoted by a set of square brackets followed by
 469 |       the type of data that is contained.  For example an array of
 470 |       strings will be described as: "[]string"
 471 | 
 472 |    o  We call key-value stores 'maps' and define them as:
 473 |       "map[type_1](type_2)"
 474 | 
 475 |    where type_1 is the type of the keys for the map (e.g. string) and
 476 |    type_2 is the type of the stored values (e.g. int).
 477 | 
 478 |    We avoid type declaration when defining functions in favour of a
 479 |    textual explanation.
 480 | 
 481 | 3.2.2.  Structs
 482 | 
 483 |    We use structs to define a closed ecosystem (similar to an object)
 484 |    with a list of variables and functions that define the struct.  We
 485 |    describe structs using the syntax:
 486 | 
 487 |                   struct Person {
 488 |                     var (
 489 |                       string name;
 490 |                       int age;
 491 |                       map[string](string) emailAddresses;
 492 |                     )
 493 | 
 494 |                     func (
 495 |                       setAge(n int);
 496 |                       changeName(name string);
 497 |                       addEmail(email string);
 498 |                     )
 499 |                   }
 500 | 
 501 | 
 502 | 
 503 | 
 504 | 
 505 | Davidson, et al.         Expires March 17, 2017                 [Page 9]
 506 | 
 507 | Internet-Draft   Protocol for bypassing challenge pages   September 2016
 508 | 
 509 | 
 510 |    this gives us an interface with which we can interact with the
 511 |    struct, allowing us to store and access data with respect to this
 512 |    definition.  Here, "var" defines a list of variables stored on the
 513 |    struct, while "func" defines a list of functions that require
 514 |    implementation.
 515 | 
 516 | 3.2.3.  JSON Objects
 517 | 
 518 |    We use JSON objects for representing tokens and for constructing
 519 |    messages for sending from the client to the edge.  We define our JSON
 520 |    structures as:
 521 | 
 522 |                            {
 523 |                              "key_1":"[value_1]",
 524 |                              "key_2":"[value_2]",
 525 |                                    .
 526 |                                    .
 527 |                                    .
 528 |                              "key_n":"[value_n]"
 529 |                            }
 530 | 
 531 |    where each "key_i" is a key; key_i is marshaled as a string but can
 532 |    be any built-in type.  Likewise "[value_i]" represents the
 533 |    corresponding value for "key_i"; value_i is typically encoded as
 534 |    a string in either hexadecimal or base64 encoding.  We assume that
 535 |    all JSON is accessible using a map-interface where, if data is a JSON
 536 |    object, then "data[key_1]" returns "value_1".
 537 | 
 538 | 3.3.  Data Formatting
 539 | 
 540 |    This section deals with the formatting of the different data types
 541 |    that are required in our protocol.  This will cover how tokens should
 542 |    be formatted and how messages between the client and the edge should
 543 |    be structured.
 544 | 
 545 | 3.3.1.  Tokens
 546 | 
 547 |    Tokens are JSON-like structures containing a single nonce field, i.e.
 548 | 
 549 |                          {
 550 |                            "nonce":"[nonce_value]"
 551 |                          }
 552 | 
 553 |    where [nonce_value] is a base64 encoded 32-byte sequence of
 554 |    cryptographically random bytes.
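
A minimal Python sketch of token generation under this format is shown below. The function name `generate_token` is hypothetical, and the choice of the `secrets` module as the source of cryptographically random bytes is an assumption; the draft treats randomness sampling separately.

```python
import base64
import json
import secrets


def generate_token() -> dict:
    """Create a token holding a base64 encoded 32-byte random nonce."""
    nonce = secrets.token_bytes(32)  # cryptographically random bytes
    return {"nonce": base64.b64encode(nonce).decode("ascii")}


token = generate_token()
serialised = json.dumps(token)  # JSON form of the token, prior to blinding
```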
 555 | 
 556 | 
 557 | 
 558 | 
 559 | 
 560 | 
 561 | Davidson, et al.         Expires March 17, 2017                [Page 10]
 562 | 
 563 | Internet-Draft   Protocol for bypassing challenge pages   September 2016
 564 | 
 565 | 
 566 | 3.3.2.  Client-Edge Message Format
 567 | 
 568 |    The messages that the client sends to the edge after being served a
 569 |    challenge page (i.e. in the third round of communication) are written
 570 |    as JSON structures.  These messages designate the type of operation
 571 |    that is required of the edge.  All messages below are base64 encoded
 572 |    before they are sent.
 573 | 
 574 |    The messages are heavily defined by the HTTP protocol that the
 575 |    client-edge interaction takes place over.  For example, the signing
 576 |    messages detailed below are included in the body of an HTTP request
 577 |    due to artificial limits placed on HTTP header field sizes by web
 578 |    servers.  Likewise the redemption messages are significantly smaller
 579 |    and are thus included in a header.  This difference in transport
 580 |    architecture leads to differences in the message formats shown below.
 581 | 
 582 | 3.3.2.1.  Signing
 583 | 
 584 |    In the first protocol, when the client sends an answer to the given
 585 |    challenge page they can also append a JSON object of the form:
 586 | 
 587 |                  {
 588 |                    "type":"Signing",
 589 |                    "contents":"[t'_1],[t'_2], ...,[t'_N]"
 590 |                  }
 591 | 
 592 |    where [t'_i] is a generated token that has been subsequently blinded.
 593 |    We call such a JSON object a 'JSON signing request' (JSR).
 594 | 
 595 |    After base64 encoding is done the final message is sent to the edge
 596 |    as:
 597 | 
 598 |                    blinded-tokens=[base64 encoded JSR]
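
The construction of this message can be sketched in Python as follows. The sketch assumes that each blinded token is already available as a comma-free string (for example base64 text); the blinding itself is not shown and the function name `encode_jsr` is hypothetical.

```python
import base64
import json
from typing import List


def encode_jsr(blinded_tokens: List[str]) -> str:
    """Build the 'blinded-tokens=' body parameter from blinded tokens."""
    # "contents" is a comma-separated list of the blinded tokens.
    jsr = {"type": "Signing", "contents": ",".join(blinded_tokens)}
    encoded = base64.b64encode(json.dumps(jsr).encode("utf-8"))
    return "blinded-tokens=" + encoded.decode("ascii")


# Hypothetical placeholder strings standing in for blinded tokens.
body = encode_jsr(["dDEn", "dDIn", "dDMn"])
```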
 599 | 
 600 | 3.3.2.2.  Redeeming
 601 | 
 602 |    In the second protocol when the client attempts to bypass a challenge
 603 |    they send a message containing a JSON object of the form:
 604 | 
 605 |            {
 606 |              "type":"Redeem",
 607 |              "contents":"[<encrypted_token>,<signature>,<HMAC>]"
 608 |            }
 609 | 
 610 |    where the token that is encrypted has since been unblinded.  Such an
 611 |    object is known as a 'JSON redemption request' (JRR).
 612 | 
 622 | 3.3.3.  Edge-Client Message Format
 623 | 
 624 |    The messages returned by the edge to the client are much more heavily
 625 |    defined by the messaging protocol being used to communicate.  For
 626 |    example, in the redemption protocol the server merely serves content
 627 |    from the origin in the event that the token that is redeemed verifies
 628 |    correctly.
 629 | 
 630 |    In the first protocol however the edge also returns a comma-separated
 631 |    list:
 632 | 
 633 |                     signatures=[s'_1],[s'_2],...,[s'_N]
 634 | 
 635 |    where [s'_i] is a signature computed by the edge over the blinded
 636 |    token [t'_i] that it received along with a response to the challenge
 637 |    page that was sent.
 638 | 
 639 | 3.3.4.  Signature Transport Format
 640 | 
 641 |    Signatures are sent between the client and the edge using the JWS
 642 |    format defined in [RFC7515].  The token that is signed is stored as
 643 |    the payload on the JWS object - thus when carrying out unblinding on
 644 |    the signature the payload must also be updated.
 645 | 
 646 | 3.3.5.  Certificate Transport Format
 647 | 
 648 |    Certificates can be transported via any standardised method for
 649 |    encoding a certificate (e.g.  X.509v3 [RFC5280]).
 650 | 
 651 | 4.  Cryptographic Tools
 652 | 
 653 |    To instantiate the protocols above we require a set of tools that
 654 |    allows either participant to perform cryptographic operations over
 655 |    data.  In this section we detail the materials and the algorithms
 656 |    that are required in order to compute these operations.
 657 | 
 658 | 4.1.  Keys
 659 | 
 660 |    Our protocol requires two key pairs and one symmetric key:
 661 | 
 662 |    o  edge identity keys (id-pub-key, id-priv-key): Used for performing
 663 |       the encryption and decryption required on the token that is sent
 664 |       for redemption;
 665 | 
 666 |    o  edge signing keys (sign-pub-key, sign-priv-key): Used for
 667 |       performing the signing and verification of signatures;
 668 | 
 669 |    o  A symmetric MAC key derived from the 'nonce' field on a token.
 670 | 
 678 |    The edge holds both key pairs and the plugin in the client's browser
 679 |    has the public keys.  The MAC key is derived at the time of messaging
 680 |    and is shared by both parties.
 681 | 
 682 | 4.2.  Signing/Verifying Algorithms
 683 | 
 684 |    o  SIGN(sign-priv-key, data) --> sig : takes a private signing key
 685 |       and some 'data' and returns a valid signature 'sig' on 'data'.
 686 | 
 687 |    o  VERIFY(sign-pub-key, data, sig) --> 'good'/'bad' : takes a
 688 |       public verification key, some 'data' and a signature 'sig' and
 689 |       outputs 'good' if 'sig' is a valid signature on 'data'.  Otherwise
 690 |       it outputs 'bad'.
 691 | 
 692 | 4.3.  Blinding/Unblinding Algorithms
 693 | 
 694 |    o  BLIND(blinding-factor, data) --> blind-data : takes a randomly
 695 |       sampled 'blinding-factor' and some 'data' and outputs 'blind-data'
 696 |       that is computationally unlinkable from 'data'.
 697 | 
 698 |    o  UNBLIND(blinding-factor, blind-data, blind-sig) --> (data, sig) :
 699 |       takes 'blind-data' and the randomly sampled 'blinding-factor' used
 700 |       to generate it, along with an optional parameter for a valid
 701 |       signature 'blind-sig' computed over 'blind-data' as input.
 702 |       Outputs 'data' and 'sig' where 'data' is the unblinded counterpart
 703 |       to 'blind-data' and 'sig' is a valid signature on 'data'.
 704 | 
 705 | 4.4.  Encryption/Decryption Algorithms
 706 | 
 707 |    o  ENCRYPT(id-pub-key, plaintext) --> ciphertext : takes a public
 708 |       encryption key and a 'plaintext' as input and outputs an encrypted
 709 |       'ciphertext'.
 710 | 
 711 |    o  DECRYPT(id-priv-key, ciphertext) --> plaintext : takes a private
 712 |       decryption key and an encrypted 'ciphertext' as input and outputs
 713 |       a 'plaintext'.
 714 | 
 715 | 4.5.  MAC Algorithm
 716 | 
 717 |    Our MAC algorithm has the following specification:
 718 | 
 719 |    o  MAC(mac-key, data) --> mac : takes a symmetric mac-key and 'data'
 720 |       as input and outputs 'mac' as a valid authentication code on
 721 |       'data'.
 722 | 
 734 | 4.6.  Instantiation of Cryptographic Tools
 735 | 
 736 |    In theory any digital signature scheme that allows for blind signing
 737 |    and unblinding operations can be used to instantiate our
 738 |    requirements.  However, due to the simplicity of its design we have
 739 |    chosen to only support the RSA blind signing modification (RSA-blind)
 740 |    shown in [Cha83].  We may benefit by adding support for elliptic
 741 |    curve based designs in the future to decrease the size of messages in
 742 |    our protocol.
 743 | 
 744 |    By choosing RSA-blind we make the following parameter choices:
 745 | 
 746 |    o  both encryption and signing keys are 2048-bit RSA keys;
 747 | 
 748 |    o  the SIGN/VERIFY algorithm is FDH-RSA to support blinding;
 749 | 
 750 |    o  BLIND/UNBLIND follow naturally from the referenced work;
 751 | 
 752 |    o  ENCRYPT/DECRYPT are instantiated with RSA-OAEP;
 753 | 
 754 |    o  MAC is instantiated with HMAC.
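
   To make the RSA-blind choice more concrete, the following Python
   sketch walks one token through BLIND, SIGN, UNBLIND and VERIFY.  It
   is an illustration under loud assumptions: the modulus is built from
   two small Mersenne primes (far below the 2048 bits required above),
   the full-domain hash is approximated by SHA-256 reduced modulo N, and
   all function names are ours.  Unlike UNBLIND() in Section 4.3, the
   unblind() helper returns only the signature, since the caller already
   holds the original data.

```python
import hashlib

# Toy parameters for illustration only.  The draft requires 2048-bit keys.
P = 2**61 - 1                      # Mersenne prime
Q = 2**89 - 1                      # Mersenne prime
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def fdh(data: bytes) -> int:
    """Simplified stand-in for a full-domain hash: SHA-256 reduced mod N."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def blind(r: int, data: bytes) -> int:
    """BLIND: multiply the hashed data by r^E mod N."""
    return (fdh(data) * pow(r, E, N)) % N


def sign(blind_data: int) -> int:
    """SIGN, run by the edge on the blinded value: raise it to D mod N."""
    return pow(blind_data, D, N)


def unblind(r: int, blind_sig: int) -> int:
    """UNBLIND: divide out r, leaving fdh(data)^D mod N."""
    return (blind_sig * pow(r, -1, N)) % N


def verify(data: bytes, sig: int) -> bool:
    """VERIFY: check that sig^E mod N equals the hashed data."""
    return pow(sig, E, N) == fdh(data)
```

   The edge only ever sees blind(r, data), which is why the unblinded
   pair cannot be linked back to the signing request.  The blinding
   factor r has to be invertible modulo N for unblind() to work.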
 755 | 
 756 | 
 757 | 4.7.  Randomness Sampling
 758 | 
 759 |    Finally we require an ability for the browser plugin to sample random
 760 |    values for blinding tokens.  Our algorithm can be thought of as:
 761 | 
 762 |    o  SAMPLE(seed) --> rand : takes a random seed as input and generates
 763 |       a value 'rand'.
 764 | 
 765 |    We can instantiate this algorithm using any standard library for
 766 |    generating cryptographic randomness.  In future notation we may omit
 767 |    the seed for ease of exposition.
 768 | 
 769 |    Random numbers used as blinding factors must be sampled from the full
 770 |    domain allowed by the chosen RSA parameters [BNPS01].
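
   One possible way to sample a blinding factor from the full domain is
   rejection sampling over the integers below the modulus, keeping only
   values that are invertible.  The sketch below assumes Python's
   'secrets' module as the source of cryptographic randomness; the
   function name sample_blinding_factor is hypothetical.

```python
import math
import secrets


def sample_blinding_factor(modulus: int) -> int:
    """Sample r uniformly from the invertible residues modulo `modulus`."""
    while True:
        r = secrets.randbelow(modulus - 1) + 1  # uniform in [1, modulus - 1]
        if math.gcd(r, modulus) == 1:           # r must be invertible to unblind
            return r
```

   For an RSA modulus the loop almost always finishes on its first
   iteration, because only multiples of the two prime factors are
   rejected.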
 771 | 
 772 | 5.  Browser plugin
 773 | 
 774 |    To participate in the protocol, the client must be using a browser
 775 |    with an installed and validated browser plugin.  This plugin controls
 776 |    the generation, blinding, unblinding, storage and redemption of
 777 |    tokens for bypassing challenge pages.  The browser plugin can be
 778 |    thought of as a struct with the following attributes:
 779 | 
 788 |          struct Plugin {
 789 |            var (
 790 |              map[string]([]string) tokens;
 791 |              map[string](string) signatures;
 792 |              map[string](int_b) blindingFactors;
 793 |              []string publicKeys;
 794 |            )
 795 | 
 796 |            func (
 797 |              parse(string s, string p);
 798 |              verifyCert([]byte pk, []byte cert);
 799 |              verifySig([]byte pk, []byte s, []byte t);
 800 |              generate(int N);
 801 |              blind(string t);
 802 |              unblind(string t', string s', int_b r);
 803 |              store(string pubKey, []string tokens, []string sigs);
 804 |              encode(string type, []string data);
 805 |              mac([]byte nonce, string s);
 806 |              send(string msg);
 807 |              pow([]byte randNonce);
 808 |            )
 809 |          }
 810 | 
 811 |    We implement the struct functions in the following way.
 812 | 
 813 |    o  parse(s, p) --> b
 814 | 
 815 |    This function takes strings s, p as input and returns a boolean 'b'
 816 |    where "b == true" if p is a valid substring of s.  Otherwise "b ==
 817 |    false".
 818 | 
 819 |    o  verifyCert(pubKey, cert) --> b
 820 | 
 821 |    This function takes the bytes of a public verification key 'pubKey'
 822 |    and a certificate 'cert' as input and outputs a boolean value b,
 823 |    where "b == true" if the signature on 'cert' can be verified
 824 |    correctly and the public key on 'cert' is contained in
 825 |    "Plugin.publicKeys".  Otherwise "b == false".  The VERIFY() algorithm
 826 |    is used to ascertain whether the signature is valid over the inputs
 827 |    to this function.
 828 | 
 829 |    Other details on the certificate are also verified in this step (for
 830 |    example that the expiry date has not elapsed and that the provider is
 831 |    consistent with the protecting edge).
 832 | 
 833 |    o  verifySig(pubKey, s, t) --> b
 834 | 
 844 |    This function takes the bytes of a public verification key 'pubKey',
 845 |    a token 't' and a signature 's'.  It outputs "b == true" if 's' is a
 846 |    valid signature on t and "b == false" otherwise.  The plugin runs
 847 |    VERIFY() using all three inputs to get the output b and returns this
 848 |    as the output of the function.
 849 | 
 850 |    o  generate(N) --> tokens
 851 | 
 852 |    This function takes an integer N as input and outputs an array
 853 |    'tokens' of length N.  The array is generated by sampling
 854 |    N 32-byte nonces randomly via SAMPLE() and constructing N tokens
 855 |    by creating N JSON objects with the "nonce" field set to the value
 856 |    of the sampled nonce.
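
   A minimal Python sketch of generate(N) follows.  Because JSON cannot
   carry raw bytes, the nonce is base64 encoded inside the token; that
   encoding is an assumption of this example rather than something the
   draft specifies.

```python
import base64
import json
import secrets


def generate(n):
    """Return n tokens, each a JSON object with a fresh 32-byte nonce."""
    tokens = []
    for _ in range(n):
        nonce = secrets.token_bytes(32)  # plays the role of SAMPLE()
        # Assumed encoding: the nonce is stored as a base64 string.
        encoded = base64.b64encode(nonce).decode("ascii")
        tokens.append(json.dumps({"nonce": encoded}))
    return tokens
```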
 857 | 
 858 |    o  blind(t) --> t'
 859 | 
 860 |    This function takes a token t as input and outputs a blinded token
 861 |    t'.  The function uses SAMPLE() to generate a 256-byte random "int_b"
 862 |    r from the full domain allowed by the RSA keys and then runs
 863 |    BLIND(r, t) --> t' and outputs t'.  After each use of blind(), the plugin
 864 |    should store a map between the blinded data and the blinding factor used,
 865 |    such as
 866 | 
 867 |                       Plugin.blindingFactors[t'] = r
 868 | 
 869 |    o  unblind(t', s', r) --> (t, s)
 870 | 
 871 |    Takes a blinded token t', a valid signature s' for t' and the
 872 |    blinding factor r as input and outputs a pair (t, s) where t is the
 873 |    unblinded token and s is a valid signature on t.  The function uses
 874 |    the algorithm UNBLIND(r, t', s') to retrieve (t, s).
 875 | 
 876 |    o  store(pubKey, tokens, sigs)
 877 | 
 878 |    This function does not return anything.  It simply sets
 879 | 
 880 |                       Plugin.tokens[pubKey] = tokens
 881 | 
 882 |    and
 883 | 
 884 |                   Plugin.signatures[tokens[i]] = sigs[i]
 885 | 
 886 |    o  encode(type, data)
 887 | 
 888 |    Takes a 'type' string and a base64 encoded string 'data' as input.
 889 |    The 'type' string corresponds to a JSON request (either "JSR" or
 890 |    "JRR").  The function creates a JSON object with the "type" field
 891 |    set appropriately and the "contents" field set equal to the 'data'
 892 |    input.
 893 | 
 894 |    o  mac(nonce, s)
 895 | 
 903 |    Takes 'nonce' in byte form and a string 's' as input.  The 'nonce'
 904 |    value is used as the key and 's' is the contents to be computed over.
 905 |    This function runs the algorithm MAC(nonce, s) on the two inputs and
 906 |    outputs whatever this algorithm outputs.
 907 | 
 908 |    o  send(msg)
 909 | 
 910 |    Provides no output.  Takes a string representation 'msg' as input,
 911 |    where 'msg' is either a JSR or a JRR.  This function reloads
 912 |    the current page in the browser and appends 'msg' in the HTTP request
 913 |    that is created (either in a header or in the body).
 914 | 
 915 |    o  pow(randNonce)
 916 | 
 917 |    Optional method for the plugin.  Takes the bytes of a random nonce
 918 |    'randNonce' as input and computes some proof-of-work computation that
 919 |    is specified by the edge.  The output is given as 'out' and is used
 920 |    by the client in the following bypass request that is made.
 921 | 
 922 | 5.1.  Pinned public keys
 923 | 
 924 |    The plugin has a list of pinned public keys stored as base64 strings
 925 |    in the string array "publicKeys".  Because browser plugins are signed
 926 |    and verifiable as part of a deterministic build process, this
 927 |    prevents a service from assigning unique public keys to each client
 928 |    as a way of linking requests and deanonymising users.  When an edge
 929 |    provides a certificate for a given public key, the plugin checks that
 930 |    the key contained in the certificate is one of its pinned keys before
 931 |    communicating further with the edge.
 932 | 
 933 | 6.  Token Acquisition Protocol
 934 | 
 935 |    The token acquisition protocol allows a client to acquire signatures
 936 |    on client-generated tokens that can be redeemed in the future to
 937 |    bypass challenge pages.  We analyse the protocol with respect to the
 938 |    stages that we defined in Figure 1.
 939 | 
 940 | 6.1.  [OriginRequest]
 941 | 
 942 |    The initiation of the protocol is triggered by the OriginRequest
 943 |    where the client attempts to access a webpage (for example over
 944 |    HTTP).  For the purposes of our protocol this webpage is edge-
 945 |    protected.
 946 | 
 959 | 6.2.  ChallengePage
 960 | 
 961 |    The edge deems the origin request to come from a client requiring the
 962 |    showing of a challenge in order to grant access to the protected
 963 |    website.  The challenge page displays some HTML conveying the
 964 |    explicit challenge to the client.
 965 | 
 966 |    To participate in accepting challenge bypass tokens, an edge must
 967 |    also append specific "<meta>" tags to the HTML of the page.  The tags
 968 |    that indicate participation are:
 969 | 
 970 |             <meta name="captcha-bypass" id="captcha-bypass" />
 971 |             <meta name="chl-cert" id="chl-cert" content="%s" />
 972 | 
 973 |    where '%s' is replaced with a valid certificate on some public key.
 974 | 
 975 | 6.3.  VerifyCertificate
 976 | 
 977 |    When a client is delivered such a page, the installed plugin will run
 978 |    the "parse()" function on the HTML and the meta tags above.  If this
 979 |    function returns true then the plugin inputs the certificate from
 980 |    '%s' into "verifyCert()" and checks that this also returns true.
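
   The detection step could be sketched as follows.  The draft describes
   "parse()" as a substring test; this example swaps in Python's
   standard html.parser module so that the certificate can be read out
   of the "content" attribute at the same time.  The class and function
   names are hypothetical.

```python
from html.parser import HTMLParser


class BypassMetaParser(HTMLParser):
    """Collects the two <meta> tags that signal participation."""

    def __init__(self):
        super().__init__()
        self.supports_bypass = False
        self.certificate = None

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "captcha-bypass":
            self.supports_bypass = True
        elif attributes.get("name") == "chl-cert":
            self.certificate = attributes.get("content")


def extract_certificate(html):
    """Return the certificate string, or None if the page does not participate."""
    parser = BypassMetaParser()
    parser.feed(html)
    if parser.supports_bypass and parser.certificate:
        return parser.certificate
    return None
```

   The returned string would then be handed to "verifyCert()"; nothing
   in this sketch validates the certificate itself.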
 981 | 
 982 | 6.4.  GenerateTokens + BlindTokens
 983 | 
 984 |    The plugin retrieves the public key 'pubKey' from the verified
 985 |    certificate and then checks if "Plugin.tokens[pubKey]" is empty or
 986 |    not.
 987 | 
 988 |    If there are no tokens stored for 'pubKey' the plugin runs
 989 |    "generate(N)" to get an array of N 'toks'.  It then runs "blind(t)"
 990 |    on each token and constructs an array of blinded tokens,
 991 |    'blindedTokens'.  The array 'toks' is stored in the 'tokens' map as
 992 | 
 993 |                            tokens[pubKey] = toks
 994 | 
 995 |    where 'pubKey' is the public key from the certificate.
 996 | 
 997 | 6.5.  SolveChallenge
 998 | 
 999 |    This step involves the client solving the presented challenge.  This
1000 |    step requires human intervention, for instance as in the way that
1001 |    CAPTCHAs are solved.
1002 | 
1015 | 6.6.  [SignRequest]
1016 | 
1017 |    The plugin encodes the array 'blindedTokens' as a string 'content'
1018 |    and runs "encode("JSR", content)" to get a JSR request containing
1019 |    this data.  When the challenge solution is sent to the edge by the
1020 |    client, the plugin base64 encodes the JSR and appends it to the HTTP
1021 |    request body using the syntax:
1022 | 
1023 |                    blinded-tokens=<base64 encoded JSR>
1024 | 
1025 | 6.6.1.  VerifyChallenge
1026 | 
1027 |    When the edge receives the request with a challenge solution and a
1028 |    JSR it first checks that the solution provided is correct with
1029 |    respect to the initial challenge that was sent.
1030 | 
1031 | 6.7.  SignTokens
1032 | 
1033 |    The edge receives the blinded tokens, checks that the challenge
1034 |    solution is valid and then runs SIGN() on each blinded token t'_i
1035 |    from the JSR using the private signing key "sign-priv-key" that it
1036 |    owns.  The edge constructs an array 'sigs' from the signatures that
1037 |    are produced by the SIGN() algorithm.
1038 | 
1039 | 6.8.  [Response]
1040 | 
1041 |    The edge responds to the client with an array containing the pairs of
1042 |    blinded tokens with their respective signatures from the array 'sigs'
1043 |    using the syntax:
1044 | 
1045 |                       signatures=[<s'_1>,...,<s'_N>]
1046 | 
1047 |    where each "<s'_i>" is a base64 encoded JWS object containing the
1048 |    blinded token that is signed as the payload.
1049 | 
1050 | 6.9.  VerifyingSignatures
1051 | 
1052 |    The client receives the comma-separated signatures from the edge.
1053 | 
1054 |    Firstly, the plugin runs "verifySig(pubKey, s'_i, t'_i)" for the ith
1055 |    received signature "s'_i", where "t'_i" is the blinded token in the
1056 |    payload and pubKey is the key from the original certificate.  If each
1057 |    invocation of "verifySig()" is successful then the plugin proceeds.
1058 | 
1071 | 6.10.  UnblindTokens
1072 | 
1073 |    Secondly the plugin runs "unblind(t'_i, s'_i, r_i)" where r_i is the
1074 |    ith blinding factor stored in "Plugin.blindingFactors".  This
1075 |    function outputs the pair "(t_i, s_i)".
1076 | 
1077 | 6.11.  StoreTokens
1078 | 
1079 |    Finally the plugin checks that:
1080 | 
1081 |                       Plugin.tokens[pubKey][i] == t_i
1082 | 
1083 |    If so, then the plugin runs "store(pubKey, t_i, s_i)" to store the
1084 |    token and signature for future use.
1085 | 
1086 | 7.  Challenge Bypass Protocol
1087 | 
1088 |    The challenge bypass protocol starts in the same way as the token
1089 |    acquisition protocol with the client attempting to visit an edge-
1090 |    protected origin.  The edge returns a challenge page as before and
1091 |    the client's browser verifies that the HTML "meta" tags sent by the
1092 |    edge indicate that bypassing a challenge page can happen.  The
1093 |    protocol deviates after the VerifyCertificate stage if the map
1094 |    "Plugin.tokens[pubKey]" is populated by one or more tokens (where
1095 |    'pubKey' is the certified public key as before).
1096 | 
1097 |    We now detail the steps that follow this stage, describing how a
1098 |    client can bypass the challenge.
1099 | 
1100 | 7.1.  ConstructTokenMessage
1101 | 
1102 |    When the client has tokens available for bypassing challenges the
1103 |    browser plugin does the following:
1104 | 
1105 |    o  picks the next available token and signature pair (t,sig) for
1106 |       'pubKey' where:
1107 | 
1108 |                            pubKey = sign-pub-key;
1109 | 
1110 |    o  encrypts t by computing
1111 | 
1112 |                      ENCRYPT(id-pub-key, t) --> t-enc;
1113 | 
1114 |    o  computes
1115 | 
1116 |                 MAC(t["nonce"], unique-request-data) --> hm
1117 | 
1127 |    where the MAC algorithm is keyed by the "nonce" field on the token
1128 |    and 'unique-request-data' is some data that is unique to a request
1129 |    containing this token;
1130 | 
1131 |    o  creates a concatenated string:
1132 | 
1133 |                             t-enc || sig || hm
1134 | 
1135 |    and base64 encodes it to form a string 'data'.
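
   The MAC and concatenation steps above might look as follows in
   Python.  Several details are assumptions of this sketch: HMAC is
   taken to be HMAC-SHA256, the token's nonce is assumed to be stored
   as a base64 string, the three parts are joined as raw bytes, and
   't_enc' and 'sig' are assumed to have been produced already by
   ENCRYPT() and the unblinding step.  The function name is
   hypothetical.

```python
import base64
import hashlib
import hmac
import json


def construct_token_message(token_json, t_enc, sig, unique_request_data):
    """Build the base64 'data' string t-enc || sig || hm for one redemption."""
    # The MAC key is the token's nonce (assumed base64 encoded in the token).
    nonce = base64.b64decode(json.loads(token_json)["nonce"])
    # hm binds the token to this particular request.
    hm = hmac.new(nonce, unique_request_data, hashlib.sha256).digest()
    return base64.b64encode(t_enc + sig + hm).decode("ascii")
```

   With fixed-length parts (for instance 256-byte RSA-2048 outputs and a
   32-byte HMAC-SHA256 tag) the edge can split the decoded string by
   offset; a variable-length encoding would need explicit framing.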
1136 | 
1137 | 7.2.  ProofOfWork
1138 | 
1139 |    This is an optional extension to the protocol that enables the edge
1140 |    to specify some proof-of-work (PoW) computation to the client.  This
1141 |    is to prevent any client from being able to construct many viable-
1142 |    looking, but invalid, tokens that force the edge into computing a
1143 |    number of public-key operations before throwing away the invalid
1144 |    token.  If done often enough this could lead to a potential DDoS
1145 |    vector on the edge.  Establishing a PoW step limits the client to
1146 |    only being able to redeem tokens when they can answer the PoW.
1148 | 
1149 |    If this step is to be used, the edge specifies an extra header in the
1150 |    initial response to the client with the attribute "bypass-proof-of-
1151 |    work" and a value "randNonce" that contains a random nonce that the
1152 |    client uses in answering the PoW.  The plugin then computes
1153 |    "pow(randNonce)" --> 'out' where 'out' represents the output of the
1154 |    computation.
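
   The draft leaves the choice of PoW computation to the edge.  Purely
   as an illustration, the sketch below assumes a hashcash-style search:
   the client looks for a counter whose SHA-256 hash, taken together
   with 'randNonce', starts with a given number of zero bits.  The
   difficulty parameter, the 8-byte counter and all function names are
   assumptions and not part of the protocol.

```python
import hashlib
import itertools


def leading_zero_bits(digest):
    """Count the zero bits at the start of a byte string."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def pow_solve(rand_nonce, difficulty_bits):
    """Search for a counter whose hash with rand_nonce has enough zero bits."""
    for counter in itertools.count():
        candidate = counter.to_bytes(8, "big")
        digest = hashlib.sha256(rand_nonce + candidate).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return candidate


def pow_verify(rand_nonce, out, difficulty_bits):
    """Edge-side check: one hash, however long the search took."""
    digest = hashlib.sha256(rand_nonce + out).digest()
    return leading_zero_bits(digest) >= difficulty_bits
```

   The asymmetry is the point of the design: solving costs roughly
   2^difficulty_bits hashes on average, while checking costs one hash
   and can be done before any public-key operation.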
1155 | 
1156 | 7.3.  SendToken
1157 | 
1158 |    o  The plugin runs encode("JRR", data) to get a JRR with the
1159 |       "contents" field set equal to 'data'.
1160 | 
1161 |    o  If ProofOfWork is done, then the plugin appends an extra field to
1162 |       the JRR object named "pow" where the value is equal to 'out'.
1163 | 
1164 |    The plugin then reloads the page and sends this JRR as the value of
1165 |    the header "challenge-bypass-token".
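
   A possible shape for this step in Python is shown below.  It follows
   the JRR format of Section 3.3.2.2 and base64 encodes the object
   before placing it in the header; the function name and the
   representation of 'out' as a string are assumptions.

```python
import base64
import json


def build_redemption_header(data, pow_out=None):
    """Return the header carrying a base64 encoded JRR."""
    jrr = {"type": "Redeem", "contents": data}
    if pow_out is not None:
        # Optional field, present only when the edge asked for a PoW.
        jrr["pow"] = pow_out
    value = base64.b64encode(json.dumps(jrr).encode("utf-8")).decode("ascii")
    return {"challenge-bypass-token": value}
```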
1166 | 
1167 | 7.4.  VerifyPoW
1168 | 
1169 |    When the edge receives the JRR message that was sent above, if a PoW
1170 |    was stipulated then the edge first checks that the value stored in
1171 |    the "pow" field is correct for the random nonce that was sent.
1172 | 
1173 |    If not, then the protocol is aborted at this point.
1174 | 
1183 | 7.5.  VerifyTokenMessage
1184 | 
1185 |    The edge decodes the "contents" field from the received JRR and
1186 |    does the following:
1187 | 
1188 |    o  sets a "success" bool to true;
1189 | 
1190 |    o  computes
1191 | 
1192 |                      DECRYPT(id-priv-key, t-enc) --> t
1193 | 
1194 |    and checks that t is a JSON object with a "nonce" field; if either
1195 |    check fails then it sets "success" to false;
1196 | 
1197 |    o  checks that t has not been redeemed before, otherwise it sets
1198 |       "success" to false;
1199 | 
1200 |    o  computes VERIFY(sign-pub-key, t, sig) --> b and, if 'b' is not
1201 |       true, sets "success" to false;
1202 | 
1203 |    o  retrieves 'edge-request-data' from the request it received,
1204 |       computes MAC(t["nonce"], edge-request-data) --> hm-edge and
1205 |       checks that hm == hm-edge; if not, it sets "success" to false.
1207 | 
1208 |    If "success" is still true, then the edge marks the bypass request as
1209 |    successful and continues.
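
   The checks in this section can be sketched in Python as below.  The
   'decrypt' and 'verify' arguments are caller-supplied callables
   standing in for DECRYPT(id-priv-key, .) and VERIFY(sign-pub-key, .,
   .), and 'spent' is a set of already-redeemed tokens; all three, as
   well as the use of HMAC-SHA256 and a base64 encoded nonce, are
   assumptions of this example.

```python
import base64
import hashlib
import hmac
import json


def verify_token_message(t_enc, sig, hm, edge_request_data,
                         decrypt, verify, spent):
    """Return True only if every check passes; record the token as spent."""
    success = True
    nonce = b""
    token = decrypt(t_enc)
    try:
        # The token must be a JSON object carrying a "nonce" field.
        nonce = base64.b64decode(json.loads(token)["nonce"])
    except (ValueError, KeyError, TypeError):
        success = False
    if token in spent:              # double-spend check
        success = False
    if not verify(token, sig):      # signature check on the unblinded token
        success = False
    hm_edge = hmac.new(nonce, edge_request_data, hashlib.sha256).digest()
    if not hmac.compare_digest(hm, hm_edge):  # constant-time comparison
        success = False
    if success:
        spent.add(token)
    return success
```

   Every check runs even after an earlier one has failed, mirroring the
   "success" flag used in the list above rather than returning early.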
1210 | 
1211 | 7.6.  GetOrigin + [Response]
1212 | 
1213 |    If the verification process was successful, the edge gets a response
1214 |    from the origin that corresponds to the original request in
1215 |    [OriginRequest] from the client.  The edge then sends this response
1216 |    directly back to the client.
1217 | 
1218 |    This allows the client to access the origin resource.
1219 | 
1220 | 8.  References
1221 | 
1222 | 8.1.  Normative References
1223 | 
1224 |    [ABHL03]   von Ahn, L., Blum, M., Hopper, N., and J. Langford,
1225 |               "CAPTCHA: Using Hard AI Problems For Security", 2003,
1226 |               <https://www.cs.cmu.edu/~mblum/research/pdf/captcha.pdf>.
1227 | 
1238 | 8.2.  Informative References
1239 | 
1240 |    [BNPS01]   Bellare, M., Namprempre, C., Pointcheval, D., and M.
1241 |               Semanko, "The One-More-RSA-Inversion Problems and the
1242 |               Security of Chaum’s Blind Signature Scheme", 2001,
1243 |               <https://eprint.iacr.org/2001/002.pdf>.
1244 | 
1245 |    [Cha83]    Chaum, D., "Blind Signatures For Untraceable Payments",
1246 |               1983, <http://sceweb.sce.uhcl.edu/yang/teaching/
1247 |               csci5234WebSecurityFall2011/Chaum-blind-signatures.PDF>.
1248 | 
1249 |    [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1250 |               Requirement Levels", BCP 14, RFC 2119,
1251 |               DOI 10.17487/RFC2119, March 1997,
1252 |               <http://www.rfc-editor.org/info/rfc2119>.
1253 | 
1254 |    [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
1255 |               Housley, R., and W. Polk, "Internet X.509 Public Key
1256 |               Infrastructure Certificate and Certificate Revocation List
1257 |               (CRL) Profile", RFC 5280, DOI 10.17487/RFC5280, May 2008,
1258 |               <http://www.rfc-editor.org/info/rfc5280>.
1259 | 
1260 |    [RFC7515]  Jones, M., Bradley, J., and N. Sakimura, "JSON Web
1261 |               Signature (JWS)", RFC 7515, DOI 10.17487/RFC7515, May
1262 |               2015, <http://www.rfc-editor.org/info/rfc7515>.
1263 | 
1264 | Authors' Addresses
1265 | 
1266 |    Alex Davidson
1267 |    Royal Holloway, University of London
1268 |    Egham Hill
1269 |    Egham  TW20 0EX
1270 | 
1271 |    Email: alex.davidson.2014@live.rhul.ac.uk
1272 | 
1273 | 
1274 |    Nick Sullivan
1275 |    Cloudflare
1276 |    101 Townsend St
1277 |    San Francisco  CA 94107
1278 | 
1279 |    Email: nick@cloudflare.com
1280 | 
1281 | 
1282 |    George Tankersley
1283 |    Cloudflare
1284 | 
1285 |    Email: george.tankersley@cloudflare.com
1286 | 
1287 | 
1299 |    Filippo Valsorda
1300 |    Cloudflare
1301 |    25 Lavington Street
1302 |    London  SE1 0NZ
1303 | 
1304 |    Email: filippo@cloudflare.com
1305 | 


--------------------------------------------------------------------------------