├── .gitignore
├── README.md
├── old-versions
│   └── captcha-bypass-formal-spec.txt
├── captcha-bypass-internet-draft.md
└── captcha-bypass-internet-draft.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | .refcache/
2 | *.xml
3 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # challenge-bypass-specification
2 |
3 | This repository contains the formal specification for a protocol that has been developed to allow bypassing challenge pages using signed tokens that guarantee anonymity to the user. The following files are of use:
4 |
5 | - captcha-bypass-formal-spec.txt: Original specification of the protocol, along with the algorithms required to instantiate it. The draft is heavily based on preceding work by George Tankersley and Filippo Valsorda, with alterations where the design has changed since its original conception.
6 | - captcha-bypass-internet-draft.txt: The most current draft of a document intended to be submitted as an IETF Internet-Draft. It focuses on defining the different phases of the protocol and how extensions can be built in. All changes should be made to the markdown document captcha-bypass-internet-draft.md and then compiled into the text document using mmark and xml2rfc.
7 |
8 | We encourage comments/feedback on the designs and the drafts themselves.
9 |
10 |
--------------------------------------------------------------------------------
/old-versions/captcha-bypass-formal-spec.txt:
--------------------------------------------------------------------------------
1 | Title: Cloudflare CAPTCHA bypass
2 | Authors: George Tankersley, Filippo Valsorda, Alex Davidson, Nick Sullivan
3 | Created: 19-Sep-2016
4 |
5 | Latest:
6 | - 2016-09-28
7 | - 2016-09-19
8 |
9 | History:
10 | (see repository commits)
11 |
12 | Preface:
13 | Large parts of this specification are taken from an original draft written by
14 | George Tankersley and Filippo Valsorda
15 | (https://github.com/gtank/captcha-draft/blob/master/captcha-plugin-draft.txt)
16 |
17 | Overview:
18 |
19 | In many IP reputation systems, Tor exits quickly become associated with
20 | abuse. Because of this, Tor Browser users quickly become familiar with
21 | various CAPTCHA challenges. Cloudflare uses an IP-reputation-based system to
22 | present pages that require users to complete a CAPTCHA successfully before
23 | the requested page can be accessed. While CAPTCHAs in themselves are supposed
24 | to be easily solvable for humans, Tor users are dealt a disproportionate
25 | number of these challenges because Tor exit nodes are regularly assigned
26 | poor IP reputations. This problem has been likened to an act of
27 | censorship against Tor users as these users are the most targeted by this
28 | protection mechanism. This problem also affects users of certain VPN providers
29 | and of I2P services.
30 |
31 | This document describes a solution to this problem. We propose a Tor Browser
32 | plugin that will store a finite set of unlinkable CAPTCHA bypass tokens and a
33 | server-side redemption system that will consume them, allowing users to
34 | forgo solving a CAPTCHA so long as they supply a valid token.
35 |
36 | Introduction:
37 |
38 | Cloudflare uses an IP reputation database to serve challenge pages to users
39 | who are deemed to be malicious. Specifically in the case of Tor, each user is
40 | identified by the IP address of the exit node that their connection traverses.
41 | Tor exit nodes are typically assigned poor IP reputation scores due to the
42 | amount of abuse that is detected through these IPs. The effect of this is that
43 | regular users are presented with a higher than typical number of CAPTCHA pages
44 | when attempting to visit websites protected by Cloudflare (this also affects
45 | users of I2P and VPN services).
46 |
47 | While CAPTCHAs are intended as a minor inconvenience, they are impenetrable for
48 | users with high privacy or security requirements. Google's ReCAPTCHA service
49 | de facto demands JavaScript execution - the challenges it produces without JS
50 | are even more cumbersome and require extra participation on the user's part.
51 | Worse, the challenge page sets a unique cookie to indicate that the user has
52 | been verified. Since Cloudflare controls the domains for all of the protected
53 | origins, it can potentially link CAPTCHA users across all >2 million
54 | Cloudflare sites without violating same-origin policy.
55 |
56 | This protocol solves both of these problems. First, it moves JavaScript
57 | execution into a consistent browser plugin (for use in TBB etc.) that can be
58 | more effectively audited than a piece of ephemerally injected JavaScript. We
59 | still require an initial CAPTCHA solution but future solutions can be avoided
60 | and are controlled only by the plugin. Second, it separates CAPTCHA solving from
61 | the request endpoint and eliminates linkability across domains with blind
62 | signatures. Furthermore, the solution intends to make the lives of such users
63 | much easier by drastically lowering the number of CAPTCHAs that are seen by
64 | honest users. Crucially, this is achieved with a level of abuse protection
65 | similar to that which Cloudflare currently provides to its customers.
66 |
67 | In essence, the protocol allows a user to solve a single CAPTCHA and in return
68 | receive a specified number of blindly signed tokens that can be redeemed in
69 | place of CAPTCHA challenges encountered in the future. For
70 | each request a client makes to a Cloudflare host that would otherwise demand
71 | a CAPTCHA solution, a browser plugin will automatically supply a bypass token.
72 | By issuing a number of tokens per CAPTCHA solution that is suitable for
73 | ordinary browsing but too low for attacks, we maintain similar protective
74 | guarantees to those of Cloudflare's current system. We also leave the door
75 | open to an elevated threat response that does not offer to accept bypass
76 | tokens.
77 |
78 | The solution we provide is general in the sense that it is agnostic of the
79 | edge server (in this case Cloudflare) - the protocol may be of use to similar
80 | edge providers who face similar issues. Furthermore while we explicitly talk
81 | of CAPTCHAs as a challenge mechanism, our design will be modular enough to
82 | incorporate the use of any particular challenge.
83 |
84 | 1 Participants:
85 |
86 | The "edge" proxies connections for a protected website and presents a
87 | challenge page if the request is deemed malicious. If the challenge is
88 | completed successfully then the edge issues tokens to the given user. The edge
89 | also consumes these tokens in order to allow bypassing of the challenge page
90 | and prevents double spending.
91 |
92 | The "plugin" runs in a user's browser and keeps track of their store of bypass
93 | tokens. It detects challenge pages and presents bypass tokens when needed. It
94 | also pins keys from challenge pages and validates tag signatures.
95 |
96 | We will also refer to the "user"/"client" who is responsible for making
97 | connection requests to protected websites.
98 |
99 | The original draft treats the "edge" and the "challenge" service that issues
100 | tokens upon completion of challenges as independent. We will assume that
101 | both of these roles are carried out by the "edge" alone.
102 |
103 | 2 Cryptographic preliminaries:
104 |
105 | 2.1 Schemes
106 |
107 | This specification uses the following cryptographic building blocks:
108 |
109 | * A public key encryption system PK_KEYGEN()->seckey, pubkey;
110 | PK_ENCRYPT(pubkey, msg)->ciphertext; and PK_DECRYPT(seckey,
111 | ciphertext)->msg; where secret keys are of length PK_SECKEY_LEN bytes,
112 | and public keys are of length PK_PUBKEY_LEN bytes.
113 |
114 | * A public key signature system SIGN_KEYGEN()->seckey, pubkey;
115 | SIGN_SIGN(seckey,msg)->sig; and SIGN_VERIFY(pubkey, sig, msg) -> { "OK",
116 | "BAD" }; where secret keys are of length SIGN_SECKEY_LEN bytes, public
117 | keys are of length SIGN_PUBKEY_LEN bytes, and signatures are of length
118 | SIGN_SIG_LEN bytes.
119 |
120 | This signature system must also support blind signing (see below).
121 |
122 | * A cryptographic hash function H(d), which should be pre-image and
123 | collision resistant. It produces hashes of length HASH_LEN bytes.
124 |
125 | * A cryptographic message authentication code MAC(key,msg) that produces
126 | outputs of length MAC_LEN bytes.
127 |
128 | We provisionally instantiate these with
129 |
130 | For both PK and SIGN: 2048-bit RSA using OAEP and PSS (respectively) for
131 | normal operations.
132 |
133 | For H: SHA256
134 |
135 | For MAC: HMAC-SHA256
136 |
137 | 2.2 Blind signing
138 |
139 | This specification uses the following cryptographic tools for blinding and
140 | unblinding tokens and corresponding signatures:
141 |
142 | * An algorithm BLIND_GEN(i) -> b_i where b_i is a secret blinding factor
143 | for blinding a single token t_i
144 |
145 | * An algorithm BLIND(t_i, b_i) -> t'_i that takes the original token t_i
146 | and a blinding factor b_i and outputs a blinded token t'_i
147 |
148 | * (Signing is carried out as above using the standard signing algorithm)
149 |
150 | * An algorithm UNBLIND(t'_i, b_i) -> t_i that takes the blinded token t'_i
151 | and the blinding factor b_i used and outputs the original unblinded token t_i
152 |
153 | * An algorithm UNBLIND_SIG(s'_i, b_i) -> s_i where s'_i is a signature on
154 | the blinded token t'_i, b_i is the blinding factor used to derive t'_i
155 | from t_i and s_i is a valid signature on t_i.
156 |
157 | We use the RSA modification that is described at
158 | https://en.wikipedia.org/wiki/Blind_signature#Blind_signature_schemes to
159 | instantiate the algorithms that we define above.
160 |
161 | NOTE: In the future we believe it may be preferable to switch to a
162 | pairing-based elliptic curve variant that allows for blind signing.
163 | For example, we may be able to use the variant of BLS signatures described
164 | here:
165 | http://crypto.stackexchange.com/questions/2424/which-blind-signature-schemes-exist-and-how-do-they-compare
166 | This would allow for much shorter signatures on tokens.
167 | Our construction maintains generality anyway and so switching the specific
168 | scheme should not affect the operation of the protocol.
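
As an illustration of how the algorithms above fit together, the following is a minimal Python sketch of textbook RSA blinding. The toy key (built from two Mersenne primes), the SHA-256 based encode helper and all function names are assumptions made for this example only. It omits padding, uses a key far too small for real use, and requires Python 3.8 or later for modular inverses via pow.

```python
import hashlib
import math
import secrets

# Toy RSA key for illustration only (assumption, NOT the 2048-bit key of the spec).
P = 2**127 - 1            # Mersenne prime
Q = 2**89 - 1             # Mersenne prime
N = P * Q                 # public modulus
E = 65537                 # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent

def encode(token: bytes) -> int:
    """Map a token to an integer modulo N (hypothetical encoding, no padding)."""
    return int.from_bytes(hashlib.sha256(token).digest(), "big") % N

def blind_gen() -> int:
    """BLIND_GEN: pick a random blinding factor that is invertible modulo N."""
    while True:
        b = secrets.randbelow(N - 2) + 2
        if math.gcd(b, N) == 1:
            return b

def blind(t: int, b: int) -> int:
    """BLIND: t' = t * b^E mod N."""
    return (t * pow(b, E, N)) % N

def sign(m: int) -> int:
    """Raw RSA signing by the edge: m^D mod N."""
    return pow(m, D, N)

def unblind(t_blind: int, b: int) -> int:
    """UNBLIND: recover t from t' by dividing out b^E."""
    return (t_blind * pow(pow(b, E, N), -1, N)) % N

def unblind_sig(s_blind: int, b: int) -> int:
    """UNBLIND_SIG: s = s' * b^-1 mod N, since s' = t^D * b mod N."""
    return (s_blind * pow(b, -1, N)) % N

def verify(t: int, s: int) -> bool:
    """Raw RSA verification: s^E mod N must equal t."""
    return pow(s, E, N) == t

if __name__ == "__main__":
    t = encode(b"example token")
    b = blind_gen()
    s = unblind_sig(sign(blind(t, b)), b)
    print("signature valid:", verify(t, s))
```

The signer only ever sees blind(t, b), so it cannot later link the unblinded pair (t, s) to the signing request that produced it.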
169 |
170 | 2.3 Keys
171 |
172 | This protocol involves several keys. For clarity, they are:
173 |
174 | * One secret/public keypair for ENCRYPT, used by the plugin to encrypt the
175 | token contents such that they can only be read by the edge.
176 |
177 | * One secret/public keypair for SIGN, used by the challenge service to sign
178 | bypass tokens. This key pair *must* be different to the key pair for
179 | ENCRYPT to avoid attacks
180 | (https://en.wikipedia.org/wiki/Blind_signature#Dangers_of_blind_signing).
181 |
182 | * One secret/public keypair for SIGN, used by the edge to sign tags
183 | on challenge pages. If the same operator controls both the edge and the
184 | challenge service, this MUST be different from the keypair used to sign
185 | bypass tokens because the client has access to a signing oracle under
186 | the token key (see "Obtaining signatures"). The plugin should be
187 | distributed with a pinned copy of this public key for each supported
188 | edge.
189 |
190 | * A symmetric MAC key derived from each nonce, used by the plugin to bind
191 | a redeemed nonce to its particular request.
192 |
193 | 3 Notation
194 |
195 | - We use the symbol || to denote concatenation
196 | - All tokens are assumed to be communicated in the hex representation of their
197 | byte arrays.
198 |
199 | 4 Token representation
200 |
201 | The edge issues blindly signed tokens to the user who completes a given
202 | challenge that was specified by the edge. The tokens are stored and controlled
203 | by a plugin that is present in the client's browser.
204 |
205 | For our purposes, each token will be a JSON object comprising a single "nonce"
206 | field. The "nonce" field will be made up of 30 cryptographically random bytes.
207 |
208 | Token JSON:
209 |
210 | {
211 | nonce: <30 random bytes>
212 | }
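
A token of this shape could be generated as in the short Python sketch below. The hex encoding of the nonce follows the notation in Section 3; the function names and the exact JSON serialization are assumptions made for illustration.

```python
import json
import secrets

NONCE_LEN = 30  # bytes, as stated in the text above

def new_token() -> str:
    """Create a token JSON object holding a fresh random nonce (hex-encoded)."""
    nonce = secrets.token_bytes(NONCE_LEN)
    return json.dumps({"nonce": nonce.hex()})

def token_nonce(token_json: str) -> bytes:
    """Recover the raw nonce bytes from a token JSON string."""
    return bytes.fromhex(json.loads(token_json)["nonce"])

if __name__ == "__main__":
    print(new_token())
```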
213 |
214 | 5 Client-Edge messages
215 |
216 | The client communicates with the edge using "JSON requests", which can currently
217 | be used either to submit tokens for signing or to redeem a single token
218 | and bypass a challenge page.
219 |
220 | A JSON signing request takes the form:
221 |
222 | {
223 | type: "Signing",
224 | contents: [t'_1, t'_2, ... t'_M]
225 | }
226 |
227 | where each t'_i is a blinded version of a token taking the form above.
228 | A JSON redeeming request takes the form:
229 |
230 | {
231 | type: "Redeem",
232 | contents: [<encrypted token>, <signature>, <HMAC>]
233 | }
234 |
235 | where the encrypted token is an unblinded token as above, the signature is
236 | unblinded and valid over this token and the HMAC is computed over the Host
237 | header and HTTP path of the request containing the token. The HMAC is also
238 | keyed with the nonce from the "nonce" field in the token JSON.
239 |
240 | The contents field on both of these requests is base-64 encoded before it is
241 | sent.
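
The two request types can be sketched in Python as follows. Several details are assumptions made for this example, since the text does not fix them: the contents list is serialized as a JSON array of hex strings before base-64 encoding, the HMAC input is the Host header concatenated with the path, and the nonce is used directly as the HMAC-SHA256 key. The encrypted token and signature are treated as opaque byte strings.

```python
import base64
import hashlib
import hmac
import json
from typing import List, Sequence

def encode_contents(items: Sequence[bytes]) -> str:
    """Base-64 encode a list of byte strings (assumed wire format: JSON array of hex)."""
    raw = json.dumps([item.hex() for item in items]).encode()
    return base64.b64encode(raw).decode()

def decode_contents(contents: str) -> List[bytes]:
    """Inverse of encode_contents."""
    return [bytes.fromhex(h) for h in json.loads(base64.b64decode(contents))]

def signing_request(blinded_tokens: Sequence[bytes]) -> str:
    """Build a "Signing" request carrying the blinded tokens t'_1 .. t'_M."""
    return json.dumps({"type": "Signing", "contents": encode_contents(blinded_tokens)})

def request_binding(nonce: bytes, host: str, path: str) -> bytes:
    """HMAC over Host || path, keyed with the token nonce (assumed concatenation)."""
    return hmac.new(nonce, (host + path).encode(), hashlib.sha256).digest()

def redeem_request(encrypted_token: bytes, signature: bytes,
                   nonce: bytes, host: str, path: str) -> str:
    """Build a "Redeem" request binding one token to a specific request."""
    mac = request_binding(nonce, host, path)
    return json.dumps({"type": "Redeem",
                       "contents": encode_contents([encrypted_token, signature, mac])})
```

Because the HMAC covers the Host header and path, a redemption captured for one request does not carry over to a different host or path.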
242 |
243 | 6 Plugin operation
244 |
245 | The browser that is operated by the user will contain an extension known as
246 | the "plugin". The plugin will contain a CA cert that it can use to validate
247 | the key pairs used by the edge for signing/encryption. The plugin will also
248 | have access to a certificate transparency (CT) log to check that the certs it
249 | has been issued are not unique to it (see Section 8.1). These keys and certificates
250 | will be updated periodically by the edge depending on internal security
251 | considerations (default: 6 months).
252 |
253 | When the edge issues signed tokens to the user, these will be stored against
254 | the public key that correctly validates the signatures on the tokens. When
255 | presented with a token redemption request, the plugin will choose a token that has
256 | not been used yet and send this to the edge along with the paired signature.
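
One possible shape for the plugin's token storage is sketched below in Python. The class and method names are hypothetical; the only behavior taken from the text is that signed tokens are stored against the public key that validates them and that each token is handed out at most once.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

class TokenStore:
    """Holds unspent (token, signature) pairs, grouped by the verifying public key."""

    def __init__(self) -> None:
        self._unspent: Dict[bytes, Deque[Tuple[bytes, bytes]]] = defaultdict(deque)

    def add(self, pubkey: bytes, token: bytes, signature: bytes) -> None:
        """Store a signed token under the public key that validates its signature."""
        self._unspent[pubkey].append((token, signature))

    def take(self, pubkey: bytes) -> Optional[Tuple[bytes, bytes]]:
        """Remove and return an unused token for this key, or None if none remain."""
        queue = self._unspent.get(pubkey)
        if not queue:
            return None
        return queue.popleft()

    def count(self, pubkey: bytes) -> int:
        """Number of unspent tokens held for this key."""
        return len(self._unspent.get(pubkey, ()))
```

Removing a token from the store as soon as it is selected means the plugin never presents the same token twice, which complements the double-spend check performed by the edge.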
257 |
258 | 7 Design
259 |
260 | 7.1 Generation/signing of tokens
261 |
262 | When a user issues an HTTP request for a protected website, the edge
263 | will respond with a challenge page, for instance containing a CAPTCHA.
264 | CAPTCHA responses are typically submitted to a