├── .gitignore
├── contacts-picker.png
├── NOTES.txt
└── api-design-privacy.html

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
docs

--------------------------------------------------------------------------------
/contacts-picker.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/darobin/api-design-privacy/gh-pages/contacts-picker.png

--------------------------------------------------------------------------------
/NOTES.txt:
--------------------------------------------------------------------------------
NEEDED DOCUMENTS:
✓ EFF panopticlick
✓ original note
✓ DAP stuff
✓ Gamepad API

--------------------------------------------------------------------------------
/api-design-privacy.html:
--------------------------------------------------------------------------------
This document provides some background on the threats to users' privacy that Javascript APIs help create on the Web, and describes some patterns that mitigate such threats at the API design level. Its primary audience is therefore people involved in defining such APIs and in implementing them inside user agents.
This document is a draft TAG finding, and as such has no official standing whatsoever.

This document is a work in progress from the W3C Technical Architecture Group (TAG). Additional TAG findings, both accepted and in draft state, may also be available. The TAG may incorporate this and other findings into future versions of the [[WEBARCH]]. Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).
User privacy is a core feature of the Web. As the Web continues its evolution into a powerful application platform, an increasing number of its additional abilities risk compromising user privacy unless they are specifically created to promote it. While potentially present in all aspects of the Web platform, this issue is particularly salient when it comes to Javascript APIs, since they constitute one of the most frequent and natural extensibility points for powerful features that can be misused and that are hard to mitigate.

This leads to a strong requirement on these APIs to take user privacy into account from the earliest steps of their conception. This, in turn, imposes design constraints that differ from those found in more traditional application programming, which means that experience with APIs designed for the latter seldom applies in full and needs to be revisited.
User privacy on the Web is a multifaceted topic which defies easy answers and often causes great disagreement and debate. This document does not seek to analyse all facets of Web privacy. However, it does recognise the fact that awareness of Web privacy issues is on the rise, both on the part of Web users and Web developers. Over the past three years, the W3C has run several workshops and launched two new groups on topics related to user privacy. The IETF has also strongly expanded its activities in the privacy space. Strong privacy laws are in force in many parts of the world and being discussed in others.

Privacy on the Web is a topic that crosses technical, social, regulatory, and emotional boundaries, and is therefore difficult to pin down when creating a technical specification. This document therefore chooses to focus on a deliberately limited subset of the problem space.
In this document we consider only those aspects of privacy and APIs that can be mitigated by API designers, such as providing more information than is necessary for a given operation, making it impossible for the user to control what information is being shared, and device fingerprinting. Conversely, we do not cover other privacy attacks such as tricking the user into providing information, or maliciously using collected information in ways that the user did not agree to. We have chosen this focus because APIs that are well designed from the privacy standpoint should provide a solid foundation for better user privacy in general, because they can help address the problem at the root, and because we feel that they form a coherent whole.

It is important, in reading this document, to bear in mind that its ambition is simply to capture known best current practices in API design in order to help spread them amongst groups, and to provide a common starting point on top of which further privacy-enhancing API design patterns can be built. Its content is therefore not expected to remain static for all of eternity, but rather to evolve so as to capture the community's knowledge of this domain.
Privacy is a very broad topic that covers most parts of the technological stack as well as its relationship to society. Rather than attempt to “boil the privacy ocean” and address the entirety of the issue at once, it is the firm opinion of this document's authors that users' privacy will be best served by addressing the problem separately at each layer that it touches, so as to avoid the architectural infelicities involved in crossing layer boundaries with a single solution. Such well-scoped changes can furthermore be deployed quickly, and benefit from the greater leverage of acceptance within their respective communities. There are therefore many privacy issues that this document does not address; we can only encourage others to attempt similar exercises across the board. Note that while the TAG's remit reaches well beyond API design, and thus leaves the door open for further TAG work on privacy, this specific domain was selected as an area of high priority due to the great number of APIs being designed concurrently at this time.

This section introduces some of the background thinking and history behind the development of this document.
In their 1975 paper The Protection of Information in Computer Systems, computer scientists Jerome Saltzer and Michael Schroeder articulated a principle of “least privilege”:

    Every program and every user of the system should operate using the least set of privileges necessary to complete the job. Primarily, this principle limits the damage that can result from an accident or error. It also reduces the number of potential interactions among privileged programs to the minimum for correct operation, so that unintentional, unwanted, or improper uses of privilege are less likely to occur.
Although written long before the Web came into use, and firmly from a security standpoint, Saltzer and Schroeder's definition could apply as easily to the distributed world of Web applications as it did to the time-sharing mainframe programming of the 1970s.

Today, client-side Web applications increasingly play the role of intermediaries for our personal, privileged information between the devices we carry and applications residing somewhere on the Internet.
In the “Terminology for Talking about Privacy by Data Minimization: Anonymity, Unlinkability, Undetectability, Unobservability, Pseudonymity, and Identity Management”, Andreas Pfitzmann, Marit Hansen, and Hannes Tschofenig succinctly define minimisation as a strategy towards implementing enhanced privacy in (personal) data collection and usage:

    Data minimization means that first of all, the possibility to collect personal data about others should be minimized. Next within the remaining possibilities, collecting personal data should be minimized. Finally, the time how long collected personal data is stored should be minimized.

    Data minimization is the only generic strategy to enable anonymity, since all correct personal data help to identify if we exclude providing misinformation (inaccurate or erroneous information, provided usually without conscious effort at misleading, deceiving, or persuading one way or another) or disinformation (deliberately false or distorted information given out in order to mislead or deceive).

    Furthermore, data minimization is the only generic strategy to enable unlinkability, since all correct personal data provides some linkability if we exclude providing misinformation or disinformation.
In attempting to apply these principles to the area of client-side Web APIs, the W3C Device APIs Working Group has refined this definition within their Device API Privacy Requirements.
- APIs MUST make it easy to request as little information as required for the intended usage. For instance, an API call should require specific parameters to be set to obtain more information, and should default to little or no information.

- APIs SHOULD make it possible for user agents to convey the breadth of information that the requester is asking for. For instance, if a developer only needs to access a specific field of a user address book, it should be possible to explicitly mark that field in the API call so that the user agent can inform the user that this single field of data will be shared.

- APIs SHOULD make it possible for user agents to let the user select, filter, and transform information before it is shared with the requester. The user agent can then act as a broker for trusted data, and will only transmit data to the requester that the user has explicitly allowed (see the sketch after this list).
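By way of illustration, the following sketch shows the shape of an address book request that honours these requirements. The navigator.contacts.find() signature is hypothetical, loosely modelled on the Device APIs Working Group's Contacts API drafts, and addToInviteList() is an imaginary application function; the point is only that the call names exactly the fields it needs, so that the user agent can convey that breadth to the user and broker the response.

    // Hypothetical, minimisation-friendly contacts request: the application
    // asks only for the two fields the task requires, and the user agent can
    // show the user exactly what is about to be shared.
    navigator.contacts.find(
        ["name", "emails"],           // explicit opt-in, field by field
        function (contacts) {         // receives only user-approved data
            for (var i = 0; i < contacts.length; i++) {
                addToInviteList(contacts[i].name, contacts[i].emails);
            }
        },
        function (error) {            // the user declined, or the lookup failed
            // proceed without contact data
        }
    );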
“Privacy by design” is a relatively loaded term which has taken on multiple different definitions depending on context. However, since it is the term of trade that has been in use in groups tasked with defining Javascript APIs over the past few years, we have chosen to keep it, while ensuring that a specific definition is provided here.

Within the scope of this document, privacy by design is a design approach that takes malicious practices for granted and endeavours to prevent and mitigate them from the ground up by applying specific patterns to the creation of Javascript APIs.

It is particularly important in privacy by design that users be exposed to as few direct privacy decisions as possible. Notably, it should never be assumed that users can be “educated” into making correct privacy decisions. The reason for this is that users typically interact with their user agent in order to accomplish a specific task. Anything that interrupts the flow of that task's realisation is likely to be given only the most cursory thought. Therefore, requiring users to make privacy decisions whilst in the middle of accomplishing a task is a recipe for poor decisions.

There are two primary issues that privacy by design seeks to address within the scope of Javascript APIs.

In order to accomplish a given operation, a Web application may legitimately require access to some of the user's private information. The issue here is that providing access to the required information can sometimes entail exposing a lot more information. For instance, when the user wishes to share one event in their calendar, the entire calendar becomes available; or when the user needs to share the names and email addresses of some of their contacts, the phone numbers, pictures, home addresses, etc. of these contacts are also returned.
Users are routinely tracked across the Web through the use of cookies and other such identification mechanisms. In many cases, these tracking methods can be successfully mitigated by the user agent, for instance by only returning cookies for the top-level window in the browsing context (as opposed to also providing them to iframes).
Fingerprinting circumvents this by looking at as many of the distinctive features of a user agent as it can in order to ascertain its uniqueness. For instance, it may look at the screen resolution, the availability of specific plugins, the list of fonts that are installed on the system, the user agent string, the timezone, and a wealth of other information that user agents tend to provide by default. Taken one by one, none of these data are sufficient to identify a single user, but put together they collect enough bits to narrow the identification down to just the one person — especially if one takes into account the population that visits a given site or set of sites.

A very good demonstration of fingerprinting, alongside a paper detailing the approach, is available from Panopticlick — How Unique and Traceable is Your Browser?
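For illustration only, the sketch below gathers a few of the attributes mentioned above into a crude fingerprint; real fingerprinting scripts combine many more signals, but the principle is the same: each value leaks a few bits, and the combination singles a browser out.

    // Crude fingerprinting sketch: every value below is available to any page
    // by default, and each contributes identifying bits to the combination.
    var fingerprint = [
        navigator.userAgent,                 // browser, version, OS
        screen.width + "x" + screen.height,  // screen resolution
        screen.colorDepth,                   // colour depth
        new Date().getTimezoneOffset(),      // timezone
        navigator.plugins.length             // number of installed plugins
    ].join("|");
    // The string (or a hash of it) then serves as a quasi-identifier.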
The Web application platform requires the availability of a vast array of functionality that can greatly vary in nature. As a result, not all APIs can look the same, and no single approach can be applied automatically across the board in order to take privacy into account during API design. This section therefore lists patterns that API designers are expected to adapt to the specific requirements of their work.
We seldom pause to think about it, but the manner in which mouse events are provided to Web applications is a good example of privacy by design. When a page is loaded, the application has no way of knowing whether a mouse is attached, what type of mouse it is (let alone which make and model), what kind of capabilities it exposes, how many are attached, and so on. Only when the user decides to use the mouse — presumably because it is required for interaction — does some of this information become available. And even then, only the strict minimum is exposed: one could not know whether it is a trackpad, for instance, and the fact that it may have a right button is only exposed if that button is used.

This is an efficient way to prevent fingerprinting: only the minimal amount of information is provided, and even that only when it is required. Contrast it with a design approach that is more typical of the first proposals one sees to expose new interaction modalities:
    var mice = navigator.getAllMice();
    for (var i = 0, n = mice.length; i < n; i++) {
        var mouse = mice[i];
        // discover all sorts of unnecessary information about each mouse
        // presumably do something to register event handlers on them
    }
The “Action-Based Availability” design pattern is applicable beyond mouse events. For instance, the Gamepad API makes use of it. It is impossible for a Web game to know if the user agent has access to gamepads, how many there are, what their capabilities are, etc. It is simply assumed that if the user wishes to interact with the game through the gamepad then she will know when to action it — and actioning it will provide the application with all the information that it needs to operate (but no more than that).

The way in which this pattern is supported for the mouse is simply by only providing information on the mouse's behaviour when certain events take place. The approach is therefore to expose event handling (e.g. triggering on click, move, button press) as the sole interface to the device.
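In code, this is nothing more exotic than the familiar event-listener idiom: nothing about the pointing device is knowable at load time, and information only surfaces, piecemeal, once the user acts.

    // No device enumeration, no capability probing: the existence of a
    // pointing device, its position, and which button was pressed are
    // revealed only through the user's own actions.
    document.addEventListener("mousedown", function (event) {
        console.log("Pointer at " + event.clientX + "," + event.clientY +
                    ", button " + event.button);
    });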
The Gamepad API supports this in a different, but equally good, fashion. The navigator.gamepads array is initially empty, and only becomes populated when a gamepad has been interacted with (and then, only with those of the connected gamepads that have been manipulated). Once they have been interacted with, these gamepads become available in the array for the lifetime of the document, and can be queried for a limited (but essential) set of information.
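A polling sketch of that behaviour follows, written against the early draft described above (later revisions of the specification renamed the accessor, so the navigator.gamepads attribute here should be read as illustrative):

    // The array is empty until the user actually uses a gamepad; polling it
    // therefore reveals nothing about hardware the user has not engaged.
    function pollGamepads() {
        var pads = navigator.gamepads || [];
        for (var i = 0; i < pads.length; i++) {
            if (pads[i]) {
                // Only gameplay-relevant data is exposed: id, buttons, axes.
                console.log(pads[i].id + ": " + pads[i].buttons.length + " buttons");
            }
        }
        requestAnimationFrame(pollGamepads);
    }
    requestAnimationFrame(pollGamepads);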
Graceful degradation is the principle according to which a system will continue to operate to the best of its capabilities despite the fact that a given piece of functionality may be missing. While commonly relied upon in Web technology, notably for document styling, it also has value as a tool that can minimise device fingerprinting.
A good example of this pattern in action is the Vibration API. Traditional APIs for vibration would make it possible for the developer to detect whether the device on which her application is being run features a vibrator. Additionally, attempting to trigger vibration on an unsupported device would likely cause an exception or other such code-detectable failure. This would produce at least one bit of additional fingerprinting information, possibly more if the device's vibrators could be further investigated.
The Vibration API approaches this problem differently: when a device does not support actual vibration, it does not surface this information. Calling navigator.vibrate(1000) will work just as well as if a vibrator had been present, and indistinguishably from that setup. In addition to making the code more robust, this approach therefore also contributes to fingerprinting reduction.
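The resulting application code is an unconditional call, with no capability check to leak information (a sketch; the pattern-array form is part of the same API):

    // Where vibration is supported this vibrates; where it is not, the call
    // completes just the same, so the page learns nothing about the hardware.
    navigator.vibrate(1000);             // vibrate for one second
    navigator.vibrate([200, 100, 200]);  // vibrate, pause, vibrate (milliseconds)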
By default, the user agent should provide Web applications with an environment that is privacy-safe for the user, in that it does not expose any of the user's information without her consent. But in order to be most useful, and properly constitute a user agent, it should be allowed to occasionally punch limited holes through this protection and access relevant data. This should never happen without the user's express consent, and such access therefore needs to be mediated by the user.

At first sight this appears to conflict with the requirement that users be asked to make direct privacy decisions as little as possible, but there is a subtle distinction at play. Direct privacy decisions prompt the user to provide access to information at the application's behest, usually through some form of permissions dialog. User-mediated access instead affords a control with which the user provides the requisite information, and does so in a manner that contextualises the request and places it in the flow of the user's intended action.

A good and well-established example of user-mediated access to private data is the file upload form control. If we imagine a Web application that wishes to obtain a picture of the user to use on her profile, the direct privacy decision approach would, as soon as the page is loaded, prompt the user with a permissions dialog asking whether she wishes to share her picture, without any context as to why or for what purpose. Conversely, using a file upload form control, the user will naturally go through the form, reach the picture field, activate the file picker dialog, and offer a picture of her choosing. The operative difference is that in the latter case the user need not specifically think about whether providing this picture is a good idea or not. All the context in which the picture is asked for is clearly available, and the decision to share it is an inherent part of the action of sharing it. Naturally, the application could still be malicious, but at least the user is in the driving seat and in a far better position to make the right call.
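Concretely, the user-mediated route involves no prompt at all; assuming the form contains <input type="file" id="avatar">, the page simply reads whatever the user chose to hand over (uploadProfilePicture() is an imaginary application function):

    // Nothing happens at load time. The application learns about exactly one
    // file, and only at the moment the user picks it from within the form.
    document.getElementById("avatar").addEventListener("change", function () {
        var picture = this.files[0];       // the single file the user chose
        if (picture) {
            uploadProfilePicture(picture); // hypothetical application logic
        }
    });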
A counter-example to this pattern can be seen at work in the Geolocation API [[GEOLOCATION-API]]. If a user visits a mapping application in order to obtain the route between two addresses that she enters, she should not be prompted to provide her location since it is useless to the operation at hand. Nevertheless, with the Geolocation API as currently designed she will typically be asked right at load to provide it. A better design would be to require a user-mediated action to expose the user's location.

This approach is applicable well beyond file access and geolocation. Web Intents are currently being designed as a generic mechanism for user-mediated access to user-specific data and services, and they are expected to apply to a broad range of features such as address book, calendar, messaging, sensors, and more.
Minimisation is a strategy that involves exposing as little information as is required for a given operation to complete. More specifically, it requires not providing access to more information than was apparent in the user-mediated access, and allowing the user some control over exactly which information is provided.

For instance, if the user has provided access to a given file, the object representing it should not make it possible to obtain information about that file's parent directory and its contents, as that is clearly not what is expected.

During user-mediated access, the user should also be in control of what is shared. For example, if the user is sharing a list of contacts from her address book, it should be clear which fields of these contacts are being requested (e.g. name and email), and she should be able to choose whether those fields are actually going to be returned or not. An example dialog for this may look as follows:
[Figure: an example contacts picker dialog (contacts-picker.png), showing selectable contacts and the requested fields: First Name, Salutations, and Email Addresses.]

In this case the application has clearly requested that the First Name, Salutations, and Email Addresses fields be returned. If it seems unnecessary to the user, she could, for instance, unselect First Name before providing the list of selected contacts, and the application would not be made aware of this choice. Such a decision is made in the flow of user action.

The above requires the API to be designed not only in such a manner that the user can initiate the mediation process of access to her data, but also in such a way that the application can specify which fields it claims to need.
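On the API side, a field-scoped request of the kind sketched earlier is what makes such a dialog possible, and the application must then tolerate the user's filtering. The shape below is hypothetical, again loosely modelled on the Contacts API drafts, with register() standing in for application logic:

    // The requested fields drive the user agent's picker dialog. Any of them
    // may come back absent if the user unselected it, so the application
    // copes with missing data rather than assuming it was granted.
    navigator.contacts.find(["givenName", "salutations", "emails"], function (contacts) {
        for (var i = 0; i < contacts.length; i++) {
            var c = contacts[i];
            var name = c.givenName || "(not shared)"; // user may have withheld it
            register(name, c.emails);                 // hypothetical helper
        }
    });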
In addition to the members of the TAG, the editors would like to thank the following individuals for their contributions to this document:

Ernesto Jiménez, Dominique Hazaël-Massieux, Frederick Hirsch, and Hannes Tschofenig.