├── w3c.json
├── README.md
├── POSE-AND-ENVIRONMENT.md
└── EXPLAINER.md

/w3c.json:
--------------------------------------------------------------------------------
{
  "group": 109735,
  "contacts": ["dontcallmedom", "himorin"],
  "repo-type": "others"
}
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Immersive Web Privacy and Security

Web applications using immersive APIs may have access to real-world understanding data (e.g. real-world geometry), and may also use the camera or other sensors during scene composition and rendering (e.g. augmented reality on mobile devices). This access (or perceived access) to data presents certain threat vectors to user privacy and security.

The purpose of this repo is to explore those threat vectors and possible mitigations that may form the basis of the Privacy and Security Considerations sections for APIs related to the immersive web, as well as informing normative requirements for those APIs.

This repository is intended to include several explainers, each capturing a dimension of data access (or perception of data access) and analysing potential threat vectors and mitigations.
--------------------------------------------------------------------------------
/POSE-AND-ENVIRONMENT.md:
--------------------------------------------------------------------------------
# Pose and Environment Data

## Background

Immersive Web APIs may provide both pose data (position/orientation) and data about the user's environment and context (data in a 'reference space') to site developers so that they can render 3D scenes for VR and AR. This document outlines potential threat vectors and mitigations for such data.

### Data Considered

_Floor Height:_ Through a reference space, a developer may access the height of the user's viewpoint from the floor. On some systems this may be an accurate representation of height; on others it may be a user-set value or (when not provided) a default value set by the client or user agent.

_Boundaries:_ Through a reference space, a developer may access the boundary of the user's allowed real-world walking space (for example, a safety boundary). This may be represented as an arbitrary geometric outline at real-world scale relative to an arbitrary origin.

_Pose:_ Depending upon which reference space type is used, a developer may access the orientation and possibly the position of a viewer, which represents either the user's head or device. This is often in real-world scale relative to an arbitrary origin. Some reference spaces may limit pose data, for example by restricting position to within a boundary, restricting viewing angle to less than a full 360 degrees, or locking position to a fixed point.

Some reference spaces may provide multiple poses. For example, an HMD may provide a pose for each eye (to enable stereoscopic rendering), while other devices may support multiple displays (such as a bank of monitors). In this case, even if no absolute position is provided, the relative position between the provided poses may vary and provide additional information about the user (e.g. interpupillary distance, or monitor locations).
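
For concreteness, the sketch below shows how these three data types surface through the WebXR Device API, one concrete instance of the kind of API this document discusses. The names (`requestReferenceSpace`, `boundsGeometry`, `getViewerPose`) are WebXR's, used here only as an illustration, not as a requirement of this document.

```ts
// Minimal sketch (WebXR Device API; illustrative, not normative). A bounded,
// floor-aligned reference space surfaces all three data types at once.
async function startSession(): Promise<void> {
  const xr = (navigator as any).xr; // WebXR entry point
  const session = await xr.requestSession("immersive-vr", {
    requiredFeatures: ["bounded-floor"],
  });
  const refSpace = await session.requestReferenceSpace("bounded-floor");

  // Boundaries: the safety boundary as a polygon of real-world-scale points
  // relative to an arbitrary origin.
  console.log(refSpace.boundsGeometry);

  session.requestAnimationFrame(function onFrame(_time: number, frame: any) {
    const pose = frame.getViewerPose(refSpace);
    if (pose) {
      // Pose: orientation and position of the viewer (head or device).
      const { position, orientation } = pose.transform;
      // Floor height: in a floor-aligned space, position.y approximates the
      // viewpoint's height above the floor.
      // Multiple poses: one view per eye; the offset between view positions
      // approximates the user's interpupillary distance.
      for (const view of pose.views) {
        // view.transform.position, view.transform.orientation
      }
    }
    session.requestAnimationFrame(onFrame);
  });
}
```
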
### Device Support

This document assumes that any device could support all types of data considered:

* Some smartphone devices today provide device orientation sensors and may be able to infer floor height through camera data using SLAM libraries such as ARCore and ARKit;
* Some laptops today include device orientation sensors and cameras, giving them similar capabilities to smartphones;
* VR headsets today may provide floor height, pose data, and immersive bounds;
* Fixed-position devices may use a user-facing camera to determine user location, and head and eye orientation relative to the screen.

While it is assumed that all devices could potentially obtain all types of data, only some devices are likely to obtain multiple simultaneous poses (e.g. an HMD may get a pose for each eye). This is assumed to depend upon the reference space type and the device form factor.

## Threat Vectors

### Gaze Tracking

_Exposed by: Orientation. Threat increases with full pose data (both orientation and position)._

Sites may be able to determine what the user is looking at through analysis of pose orientation. While pose data is not intended to provide precise eye tracking, in some contexts the direction of a viewer may provide an approximation of what the user is looking at.

For example, while using an immersive headset the viewer may adjust their head orientation as they look at different 2D parts of a web page, at objects outside the page, or at an inline 3D element on the web page (where the viewer may adjust their gaze to look at the 3D element from different directions). In such situations the web page may be able to track user interest in parts of the web page, or interest in their surroundings.

Note: Specific to browsing a 2D web page, gaze tracking may be limited to the contents of the site itself, as the site may have knowledge of its own contents. Outside the site content, gaze tracking may only be a threat if the site also has an understanding of what is outside the web page (e.g. access to a camera feed showing the real world, or access to real-world geometry).

### Reading Inputs (or "Input Sniffing")

_Exposed by: Orientation. Threat increases with full pose data (both orientation and position)._

Mobile device orientation may [be used](https://arxiv.org/pdf/1602.04115.pdf) to reliably determine a user's input actions (e.g. scrolling, tapping). Similar threat vectors may exist where pose is affected by device orientation, and where device orientation may be affected by user input. Examples of input devices that may affect device orientation include physical keyboards (e.g. on a laptop), touch-screen inputs (e.g. a mobile touch keyboard or PIN pad), or virtual keyboards (e.g. in an HMD).

Generally speaking, this threat vector may be a concern in situations where a site or origin has access to pose data while input is being entered elsewhere, and that input is not intended to be available to the site or origin. For example, when:

* Pose data is available to a website that is occluded. This includes the website being behind a virtual keyboard, behind a PIN pad, or in a background tab;
* The site is fully visible but not in focus, as in an open, non-focused desktop window;
* Pose data is available to an origin on a website while another origin on the same website is soliciting input.
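
To make the concern concrete, the sketch below shows how a page could quietly accumulate orientation samples at animation-frame rate for later analysis. `session` and `refSpace` are obtained as in the earlier sketch, and `analyzeForKeystrokes` is a hypothetical placeholder for the kind of classifier described in the paper linked above. A user agent applying the mitigations later in this document would simply stop delivering frames in these situations.

```ts
// Illustrative only: an occluded or unfocused page buffering viewer
// orientation while the user types elsewhere (e.g. into a PIN pad overlay).
// `session` and `refSpace` come from the earlier sketch;
// `analyzeForKeystrokes` is a hypothetical placeholder, not a real library.
declare const session: any;
declare const refSpace: any;

const samples: { t: number; orientation: DOMPointReadOnly }[] = [];

function onFrame(t: number, frame: any): void {
  const pose = frame.getViewerPose(refSpace);
  if (pose) {
    // Orientation alone is a useful signal: taps and key presses perturb the
    // device, and those perturbations show up in the quaternion time series.
    samples.push({ t, orientation: pose.transform.orientation });
  }
  session.requestAnimationFrame(onFrame);
}
session.requestAnimationFrame(onFrame);

// Later, offline or server-side (hypothetical):
// const guessedInput = analyzeForKeystrokes(samples);
```
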
### User Profiling

_Exposed by: Pose, Floor Height, or Boundaries._

Floor height or pose data may be used to profile user physical characteristics such as height or even facial characteristics. Pose data may be used to profile a user's activities - for example, whether they are walking, running, or stationary. The user's gait, as exposed through pose data, might even be used to profile disabilities or other characteristics. On devices where a pose is provided for each eye, the interpupillary distance (IPD) may be used to estimate a user's head size. For example, this may be used to determine whether or not the user is a child.

The boundaries of a user's session may be used to infer user characteristics or context. For example, a large boundary may suggest that the user is affluent and can afford a large room. Similarly, the specific shape of a boundary may be used to determine what room the user is in, or whether they are outside.

### User Fingerprinting

_Exposed by: Pose, Floor Height, or Boundaries._

User fingerprinting is the ability to establish that the user is the same between two sessions, based on unique characteristics of their configuration.

Session boundaries - particularly when established using the physical placement of sensors - may be sufficiently unique to allow user fingerprinting.

A user's physical characteristics may affect pose data and may allow fingerprinting (for example, by analyzing a user's gait). A user's floor height (whether set manually or algorithmically) may be unique. In sessions that provide multiple poses, the relative difference in position and orientation between those poses may allow fingerprinting (for example, on devices that allow configuration of interpupillary distance).

Device orientation sensors have the potential to enable fingerprinting, particularly if combined with other browser fingerprinting signals [[1](https://arxiv.org/pdf/1605.08763.pdf)] [[2](https://arxiv.org/pdf/1503.01874.pdf)]. If pose data is directly based upon such orientation sensors, then that pose data may enable fingerprinting. While the research was based on phone sensors, the same issues may apply to headsets and other form factors.

### User Location

_Exposed by: Pose data gathered over a sufficiently large region._

Particularly if the session is non-stationary and unbounded, a site may be able to infer user location by tracking the user's position with respect to an arbitrary, unknown origin and correlating it with known pathways or roads.

For example, on a university campus tour, the user's path may correlate strongly with the pathways of that campus, and thus the site may be able to infer (a) that the user is on the campus and (b) the user's exact location on that campus, even if the user does not provide additional input.

As another example, [research suggests](https://ieeexplore.ieee.org/document/8406600) that it may be possible to identify a user's location in major cities given less than 1000 m of trajectory data when driving on roads.

### In Combination with Real-World Geometry or Geolocation

_Exposed by: Pose data combined with real-world geometry understanding._

Pose data in combination with real-world geometry understanding or geolocation may compound certain threat vectors associated with both data types.
For example, if pose data is used to perform gaze tracking outside the visible area of a web site, then real-world geometry or geolocation may allow a site to determine what real-world objects a user may be looking at. As another example, pose data in combination with real-world geometry may allow a site to infer user height, leading to the same threat vectors as Floor Height above.

## Possible Mitigations

### Focused and Visible

At an application level, ensuring that pose data is only available to an application while that application is in focus and visible can mitigate a range of threat vectors. For example, such a limitation prevents a site from accessing pose data influenced by inputs that are entered outside of the application, such as when the site is behind a PIN pad, when a 2D site is viewed within a 3D HMD environment, or when a keyboard on a laptop is used to enter a password in a different application or outside the browser.

Similarly, within browsers that allow multiple web applications to be open at the same time, ensuring that pose data is only available to a single application/frame that is in focus and visible can help mitigate a range of threat vectors. For example, this approach could prevent a site in a background tab from accessing foreground tab inputs, or prevent a non-focused foreground tab site from accessing input data solicited by a focused foreground tab site in another window.

In general, pose data should only be provided when it has potential value for the user and when providing such data could reasonably be expected by the user. For example, an inline experience does not need to be updated while the phone is in the user's pocket, so there is no value in giving the site pose data, only privacy costs. Specifically, pose data should generally not be provided when a device is locked or the display is asleep. This may prevent some use cases, such as audio-based experiences or user-desired path tracking, but those are the exception and should be handled with more explicit intent and/or consent mechanisms.

### Same-Origin, or Single-Origin Only

To address cross-origin threats, a user agent may wish to restrict access to pose data to only a single origin at a time. For example, restricting pose data to only the origin in focus may mitigate threat vectors related to cross-origin input sniffing. This may help mitigate both cross-origin iframe threats, as well as threats from cross-origin DOM content that may be displayed in a fully immersive experience.

More generally, a user agent may wish to restrict access to pose data to only origins that are the same as the top-level document.

### Data Throttling

In theory, limiting pose data frequency may reduce a site's ability to infer gaze, input data, or location, or to perform user profiling. In practice, however, such throttling would certainly affect the user experience and is unlikely to provide additional security.

For example, even at 1Hz, the user's location and gaze information could likely be inferred. Similarly, while touch input data snooping has been demonstrated at frequencies as [low as 20Hz](https://arxiv.org/pdf/1602.04115.pdf), it is not known whether there is, in practice, a lower bound that prevents such detection. In either case, display frame rates are likely to be higher than 20Hz, and thus pose data at a higher frequency would be needed to maintain a reasonable user experience.
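
As a minimal sketch of what the "Focused and Visible" and "Same-Origin, or Single-Origin Only" checks above might look like inside a user agent: the `FrameContext` shape and its field names are invented for illustration and do not correspond to any particular implementation.

```ts
// Hypothetical user-agent-side gate (all names invented for illustration):
// pose data is delivered to a frame only while that frame is visible, focused,
// same-origin with the top-level document, and the device is awake.
interface FrameContext {
  visibilityState: "visible" | "hidden";
  hasFocus: boolean;
  origin: string;
  topLevelOrigin: string;
  deviceLockedOrAsleep: boolean;
}

function mayDeliverPose(frame: FrameContext): boolean {
  return (
    frame.visibilityState === "visible" &&
    frame.hasFocus &&
    frame.origin === frame.topLevelOrigin &&
    !frame.deviceLockedOrAsleep
  );
}

// Evaluated before each animation frame; frames that fail the check simply
// receive no pose, rather than a throttled or degraded one.
```
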
### Position Limiting

The user agent may wish to limit position information based upon which type of reference space the application has requested. For example, while a stationary reference space is intended to support applications that require limited or no movement, it still allows accurate positional data to be reported, when available, for user comfort. When using a stationary reference space, the user agent may wish to stop reporting pose data if the user moves beyond a limited radius from the origin. This may help address threat vectors associated with location tracking for these types of reference spaces.

Note: Because positional changes may be reported for "stationary" reference spaces, policies and mitigations for such reference spaces must not assume that the position is fixed.

Note: The intent of position limiting is to limit the data available to the application, but care should be taken to ensure that the user experience is not negatively affected; it is generally undesirable to have a 'hard stop' at the position limits, as this may cause a disruptive experience where the user is moving but the scene is not. The user agent may choose to transition to a different, non-application user interface when the viewer is outside the permitted position, and may choose to provide guides back to the permitted position.

### Rounding, Quantization or Fuzzing

Rounding, quantization, or fuzzing (the introduction of slight, random errors) of floor height and session bounds data may alleviate the threat of fingerprinting users based on that data. For example, while a user's exact physical VR sensor locations may be unique when measured with 0.1cm precision, they may not be unique when measured at 10cm precision. Care should be taken to ensure that user safety and experience are not compromised when using this approach.

Sufficiently low precision in pose data may prevent sites from performing input detection, fingerprinting, and gaze tracking. Some AR modes on some devices may not require precise pose data. For example, one [study](https://arxiv.org/pdf/1605.08763.pdf) suggests that quantization of orientation sensor data to 6-degree increments may alleviate the threat of fingerprinting without significantly affecting tilt-based handheld smartphone experiences. Such quantization may be suitable for a handheld smartphone user viewing a 3D object inline on a web page.

However, it is unclear whether the same restricted level of precision will allow a meaningful experience in an immersive mode, or when viewing inline content in an immersive headset; the suitability of low-precision data for delivering a useful experience will likely vary based upon the session mode and the device form factor.

### User Intent through Gaze

On systems where the user's gaze is known, making pose data available only to elements the user is actively looking at may help mitigate certain threat vectors. For example, a user agent may choose to only make pose data available for inline experiences when the user is actually looking at that inline element; this approach may prevent gaze tracking for elements outside the inline experience.

This approach alone may not mitigate threat vectors associated with user profiling.
It further may not mitigate threats related to reading inputs - for example, if a user is gazing towards a virtual keyboard and an (unfocused, background) site behind the keyboard, then the site may still be presented with pose data while the user enters data on the keyboard. Further, the act of limiting pose data to what the user is looking at may itself enable gaze tracking. For these reasons it is unclear why using the user's gaze to limit inputs would be preferable to the [Focused and Visible](#focused-and-visible) mitigation.

### Inline vs. Exclusive

A user agent may apply different privacy rules when the user is in an exclusive experience (e.g. fullscreen on mobile or an immersive mode on headsets) as opposed to an inline experience. For example, an exclusive experience may prevent gaze tracking for elements outside that exclusive experience by virtue of the fact that those elements are not visible. Further, if the user agent controls the exclusivity of the experience, then the user agent may prevent other applications or sites from soliciting input.

### Position and Orientation vs. Orientation Only

A user agent may choose to apply different policies based on whether the site will access position and orientation data (a full pose) as opposed to only orientation data with a fixed position. Such an approach may alleviate the threats of identifying location and some user profiling. However, [research suggests](http://web.cs.ucdavis.edu/~hchen/paper/hotsec2011.pdf) that orientation alone is sufficient to read inputs, and thus it is unlikely that this approach would alleviate the threat of reading inputs.

### Active User Engagement

A user agent may wish to prevent the availability of pose data unless the user has indicated that they are actively engaging with the session, page, and/or iframe that is requesting pose data.

### Visible Indicators

A user agent may wish to display a visual indicator to the user indicating that pose and/or other data is being shared with the site.

### Consent

The user agent may require user consent before allowing the site to access pose or reference space data.

### Data Usage Validation

There may be mechanisms by which the user agent can validate that the use of pose data is legitimate. For example:

* The user agent may wish to check GL matrices to ensure that they are tracking pose data, or ensure that there are visible changes that correlate with updates to pose data. Note that if the immersive API allows developers to change or ignore pose data, then this approach may not work.
* The user agent may wish to only provide pose data for visible, non-obscured regions of a specific size or percentage of the window. This approach may help ensure that the user knows what is going on, and prevent the use of invisible (e.g. 1x1) experiences.

### Generic Sensor Guidelines

Pose data, and in some cases reference space data, may be based upon sensors on the user's device.
For such data, API specifications and user agents should follow the Generic Sensor API's security and privacy considerations unless there is a direct user-benefiting reason not to, and appropriate mitigations are put in place to compensate:

* [Secure Context](https://www.w3.org/TR/generic-sensor/#secure-context): Only expose APIs and data on secure origins.
* [Focused Area](https://www.w3.org/TR/generic-sensor/#focused-area): Only provide poses to the focused frame.
* [Same Origin](https://www.w3.org/TR/generic-sensor/#concepts-can-expose-sensor-readings): Only provide poses for frames that are same-origin to the top-level document.
* [Visibility](https://www.w3.org/TR/generic-sensor/#visibility-state): Only provide poses on active documents whose visibility state is 'visible'.
* [Feature Policy](https://www.w3.org/TR/generic-sensor/#feature-policy): Pose and reference space data access should be aligned with feature policies for the underlying sensors that generate that data.
* [User Consent](https://www.w3.org/TR/generic-sensor/#permissions): Where appropriate, require user consent, either through the Permissions API or through other means, before allowing the site to access data.

--------------------------------------------------------------------------------
/EXPLAINER.md:
--------------------------------------------------------------------------------
# Explainer - AR Privacy & Security on the Web

## Scope

This explainer outlines security and privacy **considerations**, including specific threat vectors, and possible **mitigation** options for user agents that wish to implement augmented reality on the web. The purpose of this explainer is to inform the development of web specifications related to augmented reality.

It is not the purpose of this explainer to prescribe a specific solution. This document is not a specification, nor is the intent to specify requirements for specifications. All text in this explainer is intended as informative only, and should not be treated as normative.

## Background

Augmented reality systems typically use one or more sensors to infer information about the real world, and may then present information about the real world to the user. Such systems may use a wide range of input sensor types and generate a range of real-world data. Further, sensors (e.g. the camera) may be accessed and used to render information in context for the user.

## Accessing Real-World Data

A specification for augmented reality might allow the site to access information about the real-world environment around the user. There are a variety of types of such information, ranging from real-world geometry (e.g. planes or point clouds representing actual objects in the user's space) to the ability to identify and track specific objects in the user's space. Different data types have different considerations, potential threat vectors, and mitigations.

### Real-World Geometry

Geometry representing objects in the real world can be generated using one or more sensors, including an RGB camera or a depth camera. Such data today may be as imprecise as ~5cm, but it is expected to become more precise over time as sensor fidelity, algorithms, and the ability to combine sensor inputs evolve. However, such geometry data can already be generated today from camera data alone, at varying levels of permission.
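
To give a feel for the shape of this data, here is an illustrative TypeScript sketch of the kinds of structures a real-world-geometry API might expose. The type names are invented for this explainer; proposals such as WebXR plane detection and hit testing expose broadly similar shapes.

```ts
// Illustrative only: invented type names sketching the kind of real-world
// geometry data discussed in this section.
interface Vec3 { x: number; y: number; z: number; }

// A detected planar surface (floor, wall, tabletop), expressed at real-world
// scale relative to an arbitrary session origin.
interface DetectedPlane {
  orientation: "horizontal" | "vertical";
  polygon: Vec3[];        // boundary vertices, perhaps accurate to within ~5cm
  lastUpdatedTime: number;
}

// A coarser alternative: an oriented bounding box around a detected object.
interface BoundingBox {
  center: Vec3;
  halfExtents: Vec3;      // even box dimensions alone can reveal the room type
}
```
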
#### Threat Vectors

There is a wide range of threat vectors relating to site access to real-world geometry, for example:

###### User PII

* Users may be identified by learning the geometry of their face, or through gait analysis
* In cases where the user is not visible to the AR system, identifying a user's associates may allow sites to identify the user
* Credit card information may be extracted by analyzing embossed card geometry

###### Fingerprinting

* Room geometry may be used to identify when two sessions are occurring in the same space

###### User Location

* Real-world geometry may indicate a specific location. For example, the user might be near a recognizable object (e.g. the Eiffel Tower) or some place with unique geometry characteristics that the site can identify.

###### User Profiling

* Estimating the size of the user's house, to estimate user income.
* Inferring user ergonomics. For example, users will typically hold phones at the same height and in the same location, and user height may be inferred by identifying the ground plane and the device's position relative to it.
* Determining what businesses or apartments the user has visited, if enough real-world geometry is available to create a continuous map segment between the user's current location and other locations.
* In combination with other technology such as eye tracking, inferring what the user is interested in or attracted to.

###### User Physical Safety

* The occlusion of real-world objects poses potential threats to users. If a user cannot see a stop sign they might get into a traffic accident; if the user cannot see a chair on the floor they might trip over it. On devices where a site can directly obscure a user's view of the real world, access to real-world geometry may allow sites to identify and occlude objects related to user safety. For example, a site might use real-world geometry to identify the octagonal shape of a stop sign and render an object to occlude it.

##### Precision of Real-World Geometry

Even a small amount of real-world geometry data poses a potential security risk. For example, even a simple API such as hit-testing real-world objects with a ray can quickly be used to identify the floor plane and user height, and with enough data points can be used to build a point cloud in space from which real-world geometry can be inferred.

Further, even simplified geometry (such as bounding boxes) can provide information about the user. For example, bounding boxes of a certain size and placement can indicate that the user is in a kitchen, a bedroom, or outdoors.

##### Historical Real-World Geometry

An additional privacy concern with world geometry data applies to AR subsystems that continuously improve their map of world geometry during system use, so as not to require the user to manually rescan their environment each time they enter a new app or return to an existing app.

On such systems, some geometry data that is available from the subsystem may have been captured outside of the user's current user agent session, perhaps in previous device sessions. Processed world geometry can thus create novel privacy concerns that go beyond raw camera data, as it can pull in _historical_ perception of the environment.
This may be in conflict with user expectations for privacy and security. When a user enables their webcam, it is generally understood that the page can see the user's surroundings in the present, but any objects put away prior to the webcam session will stay private. In contrast, an AR subsystem that captures world geometry during general usage might still have information about objects or parts of the world that are not presently visible. It is unclear whether users will understand that such historical data exists and whether such data is available to the site.

Further, on such systems the geometry generated during a browsing session may persist within the subsystem beyond the browsing session, potentially allowing other applications or websites to access that data. This is in conflict with user expectations for privacy and security, where it is generally expected that data generated on a web page is not available to other sites or applications.

#### Possible Mitigations for Real-World Geometry Considerations

##### Throttling and Precision

Geometry precision is typically necessary for even basic AR tasks. For example, tracking an object's location in a room, or measuring an object, requires that the data be reasonably accurate.

Certain threat vectors can be mitigated by only providing a rough estimate of real-world geometry to a site. For example, while detailed geometry could identify credit card numbers or faces, this is unlikely if the site only has access to two or three polygons representing the card or the face.

However, it's unclear how much value lowering precision or introducing error would have. Scrambling or reducing the precision of geometry data could potentially be worked around by restarting the session several times; analysis of the differences in errors may allow inference of the actual 3D data.

Further, it's not clear what value data throttling would bring. Even a small amount of data can pose a security risk, and without a hard limit on data access, over time a site could gather a large amount of geometry data. For example, if geometry gathering were limited to a user action, sites could design experiences (e.g. games) which solicit frequent user actions to get as much geometry as possible.

In both cases, providing low-resolution or inaccurate data might trade off app quality for perceived safety, but given the small amount of data necessary to pose a risk it is possible that such an approach would hobble the API (degrade the user experience) without actually making it safer.

A user agent might choose to allow users to explicitly choose the data fidelity available to a site, possibly with a visual indication of what data resolution is being provided, so that the user can estimate both the threats to their privacy as well as the impact on their user experience.

_TODO: Provide examples of decimated, lower-resolution data sets so the reader can get a feel for what information is actually available at various geometry levels._

##### Filtering

On systems where historical geometry is available beyond the scope of the current session, user agents may wish to filter what data is available to sites to only include data present in the user's current browsing session.

For example, a user agent might limit the distance and location of data that is available, to only that _near_ the user.
This approach has limitations and might still expose data that is not visible to the user (for example, geometry for parts of the world directly behind a wall in the user's space). This approach may also limit AR use cases (for example, in a large room or outdoors, some geometry might be beyond the allowable distance).

Similarly, a user agent might limit the scope of data to only that _visible_ in the user's view, or data about geometry that has actually been seen by the user in the current session. For example, a hit testing API might limit results to those visible in the current view by only providing the closest hit-test result (thus not including anything that is occluded) and by restricting rays to those within the user's viewport, originating from the device pose. However, this approach may not guarantee that historical data is excluded. For example, if there is not yet any geometry information known about the visible parts of the user's space, then hit tests may return results from known, adjacent areas without a way of disambiguating whether those results are visible or not.

The approach that a user agent takes will likely depend upon the mechanisms offered by the underlying AR system. For example, a subsystem may have mechanisms for filtering real-world geometry to that seen within the current session, or may have its own context management system for managing access to previously mapped areas such as rooms.

##### Clearing Data

On systems where historical geometry is available between sessions, user agents may wish to clear or delete real-world geometry to avoid giving sites access to historical data, or giving other applications access to data generated during the browsing session.

For example, at session start the user agent might clear any historical data present on the system, only using data generated by the sensors in the current user's context. This approach might require the user to spend time scanning their environment before the AR experience can execute.

Similarly, the user agent might clear any real-world geometry data generated by the browsing session when the session ends. This may be complicated or impossible if the underlying AR subsystem has merged some of this geometry with its larger real-world understanding; for example, merging newly discovered floor planes into a larger plane that includes historical data.

The approach that a user agent takes will likely depend upon the mechanisms offered by the underlying AR system. For example, a subsystem may have mechanisms for sandboxing real-world geometry between app sessions.

### Object or Image Identification

Another type of real-world data access is the ability to detect objects or images in the scene. Such identification might include 3D object detection (e.g. identifying and tracking a 'dining room table') or 2D planar image detection (e.g. identifying and tracking a poster of [Rick Astley](https://www.youtube.com/watch?v=dQw4w9WgXcQ)).

#### Threat Vectors

Depending upon how the API is structured, the user may not know what objects the site is looking for. For example, if an API accepts a set of reference images and attempts to match them in the real world, the user likely would not know what those reference images are.
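
As an illustration, image-tracking proposals for the web have taken roughly the shape sketched below; the option names and the URL are illustrative, not a reference to any shipped API. Nothing in this flow surfaces the reference image to the user.

```ts
// Illustrative sketch of an image-tracking request (option names loosely
// follow WebXR image-tracking proposals at the time of writing; shown only to
// make the threat concrete). The user never sees what `referenceImage` is.
const referenceImage = await createImageBitmap(
  await (await fetch("https://example.com/private-reference.png")).blob()
);

const session = await (navigator as any).xr.requestSession("immersive-ar", {
  requiredFeatures: ["image-tracking"],
  trackedImages: [{ image: referenceImage, widthInMeters: 0.2 }],
});

// Each frame, the site learns whether - and where - the image was found in the
// user's surroundings, without the user knowing what was searched for.
```
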
Using such APIs, the site could gain information about what is present in the user's environment, such as:

* Detecting the presence of high-denomination currency
* Detecting valuable furniture or electronics
* Detecting publicly visible objects or images such as storefront signs

This capability exposes several threat vectors, including:

###### Location Identification

* Storefront signs, or similarly known images, might allow a site to determine a user's location
* Publicly visible QR codes or barcodes could allow a site to track user location
* A unique configuration of a set of known objects (e.g. picnic tables) might allow a site to recognize the user's location (by knowing the configuration of picnic tables in parks)

###### User Profiling

* Identifying expensive items, particularly in conjunction with location, could allow a bad actor to determine targets for real-world burglary
* Identifying expensive items or currency could allow demographic-based ad targeting

###### Compromising Information

* Identifying embarrassing imagery in the user's space might allow a site to embarrass the user

###### Unique Codes

* Detecting QR codes or barcodes in the scene could allow a site to determine the presence of specific objects, or the user's location

###### Obscuring Real-World Data

* Detecting and obscuring certain objects in the user's field of view could pose safety issues for the user. For example, a site might detect stop signs and obscure them.

#### Possible Mitigations for Object Detection

##### Throttling

Limiting the number of objects or images that the site can identify in a given session could help protect the user from broad-ranging attacks.

##### User-Visible Queries

If the user has the ability to visualize what the site is looking for (e.g. the ability to preview reference images), this could help the user ensure that only information relevant to their task is being searched for.

##### Limited Location or Tracking

A user agent may choose to prevent a site from accessing specific locations, or from tracking identified objects. Such limitations could mitigate certain threat vectors (such as identifying the exact configuration of objects). However, even a declarative method (e.g. "place this 3D model on a $100 bill") could allow the site to know that certain objects are in the scene (e.g. when the site knows the model has been placed).

##### Composition Rules

A user agent or underlying system could establish rules for how imagery is composited into the user's field of view, to prevent obscuring important objects or critical parts of the user's vision and to ensure user safety.

## Permissions and User Consent

### Perception of Camera Access

Augmented reality systems typically enhance some part of the user's field of view, and may either render the camera feed behind those enhancements (e.g. on a smartphone) or use the camera for visual realism (e.g. refraction). Even if the site cannot access camera data, the _perception_ that the site has access to camera data is a consideration.

#### Threat Vectors

User perception of camera access is an actionable threat vector, even if the site cannot access camera data. Consider the following scenario:

1. A malicious site could create an AR session that displays the camera feed (and nothing else);
1. A smartphone user could visit that site, and be presented with a front-camera view, potentially capturing a compromising situation;
1. The user would likely close the camera view, quickly;
1. The site could then say "Thank you for that picture! We'll add it to the public gallery. Please pay our $10 membership fee to control who can see the picture."

In this case, even if the site does not have access to camera data, it is likely that some percentage of users will be fooled into paying the membership fee. More generally, these threat vectors could be similar to malware which falsely claims that the user agent is compromised.

Such threat vectors could be a concern even on devices without cameras. Consider the following scenario:

1. The user is using an optically see-through device;
1. A malicious 2D web page - without access to camera data - recognizes that it is on an optically see-through device (e.g. detecting through CSS that it is on an additive display);
1. The page renders a camcorder viewfinder with a blinking REC indicator in the corner (e.g. on an additive display, the viewfinder could be filled with black pixels, and would thus show the real world). This would make it appear that the web page can record real-world imagery, even though it can't.

If not managed properly, the perception of unauthorized camera access could result in a negative perception of the user agent. For example, "Did this browser just give a site access to my camera without permission!?"

#### Possible Mitigations

If the site does not have access to the camera, ideally the user agent will communicate that fact explicitly to the user to mitigate the false perception. Other UX approaches might achieve the same outcome.

Given the threat vectors associated with the perception of camera access, user agents may wish to be explicit about whether a site does or does not have access to camera data (independent of other sensor data).

### Permissions

#### Considerations

It is likely that the user agent will wish to gain user consent, or at least notify the user about what data is being accessed, when entering a web-based augmented reality session. In such sessions, it is possible that the site will have access to one or more sensors and/or one or more types of real-world data.

Traditional web permissions could be used to solicit user consent. However, this approach has several considerations:

##### Over-prompting and fatigue

Users who are prompted for permission(s) every time they enter an AR experience could be trained to accept _all_ types of permissions without reading them. For AR in particular, a site may be able to train a user to accept an AR permission and then prompt for (and gain) permanent camera permission without the user reading the prompt or understanding that the camera permission is not related and/or restricted to the AR capabilities.

##### Ambiguity, Complexity

Web permissions may make it difficult to accurately describe what is happening. AR may expose a wide range of data to the site, and it's difficult to describe that data access in a manner that is clear to the user.
In some ways, _incomprehensible_ permissions are worse than no permission, and existing permission user experiences are often limited in how much text they can display.

Further, the user may not connect a web permission with the AR session. The current permissions user experience may confuse users, who might approve or deny it without realizing that it is a gatekeeper to a camera-based AR experience (i.e. they think it's just another permission).

##### Data access beyond the life of the session

There are [threat vectors](https://github.com/immersive-web/privacy-and-security/issues/6) associated with a site having access to resources after the AR session ends, particularly if those resources are general permissions (e.g. camera), but even the [perception of camera access is a threat vector](https://github.com/immersive-web/privacy-and-security/issues/3), meaning that a long-running AR permission is a potential danger. Web and native permission models have generally included some amount of persistence (either permanent or time-limited) to reduce friction and over-prompting.

Further, it is expected that augmented reality experiences will be desirable for users, and sites can use the incentive of an augmented reality experience to gain consent. Because of this, malicious sites could gain long-running AR permissions in one safe context, and then solicit those users to visit malicious experiences later (where the user may not realize that they've already given permission to the site).

##### Consent vs. Notification

Some augmented reality specifications may support AR sessions where no data is shared with the site. In those situations, user agents may still wish to notify the user of what is happening (e.g. with a declarative AR API where the site cannot access the [camera](https://github.com/immersive-web/privacy-and-security/issues/3), the user agent may want to be clear on that point). The existing web permissions model does not support this.

##### User Control

With augmented reality, a user may wish to configure what data is available to the site within a session. For example, the user may wish to allow some level of geometry access, but block the ability to access the camera or the ability to identify specific objects. This type of configuration within the scope of a session is not available with web permissions.

#### Possible Mitigations

There are several approaches that could mitigate the concerns with using web-based permissions:

##### Time-limited permissions

To address concerns about the long-running nature of web permissions, a user agent may wish to time-limit permissions. It is not clear that giving users explicit control would solve this problem, but permissions could be session-based by default. However, this approach may not adequately inform the user as to the connection between the AR session and the granted permission, and may additionally lead to over-prompting and permission fatigue.

##### Augmented Reality Mode

Instead of permissions, an alternative approach could be an explicit "AR Mode", which may ask for user consent on entry, with consent lasting only as long as the page is in this mode (i.e. for the duration of the session).

Such a mode would have the following advantages:

1. _Flexibility_. The user agent could either ask for consent, or notify the user, or both.
A user agent could further choose a UX for this mode that allows presentation of detailed information to the user without the constraints of the traditional permissions user experience. This approach also gives the most opportunity to customize the experience across a variety of form factors, some of which may have different data sharing or user interface requirements than others.
1. _Clear Scope_. Similar to Full Screen and VR presentation modes, such a mode may make it clearer to users than a permission that they are entering a specific mode, and there could be clear instructions on how to leave that mode. The origin that has data access during the mode can also be clearly indicated, and the user interface can make clear to the user that they're entering a specific experience, not just giving the site access to some data.
1. _Enforcement_. A specific AR mode could enforce data access directly connected to what the user was notified of, and what user consent was provided. This may be less clearly defined with a permission that could (in theory) be requested independently of the actual session creation.

--------------------------------------------------------------------------------